Feature: access already generated fields in custom generation

blagasz commented 2 years ago

It would be a nice feature to have access to the already generated field values in a customised generator scenario, e.g.

from typing import Any
from pydantic_factories import ModelFactory

class CustomFactory(ModelFactory[Any]):
    """Tweak the ModelFactory to add our custom mocks."""

    @classmethod
    def get_mock_value(cls, field_type: Any, previous_fields: dict[str, Any]) -> Any:
        """Add our custom mock value."""
        if str(field_type) == "my_dependant_field" and previous_fields["my_relying_on"] == 'special_value':
            return cls._get_faker().date_time_between()

        return super().get_mock_value(field_type)

I could even imagine some decorator or annotation based solution to the same problem, e.g.

class MyFactory(ModelFactory):
  __model__ = MyModel

  @depends('field_relying_on')
  def dependant_field_name(self, current_value: Any, other_value: Any):
    return 'special_value_generated_based_on_context'

Goldziher commented 2 years ago

Hi. Well, the issue here is really the ordering of generated fields - if you have a field A that is dependent on field B's value, this will not work out of the box currently. The library iterates the __fields__ property of the pydantic model, which has an order that derives from the order of declaration of the fields. Thus, we cannot know for certain that field B will be processed before field A, and so to implement this will require some thinking on how to do this in a none-hacky way.

So, before this enhancement can proceed we need to have a clear viable way of implementing this in a way that can be developed further. I invite you to add your ideas.

blagasz commented 2 years ago

Hi. I totally agree, I was also worried about the order. I think it is enough for all the realistic use cases I can think of that every field could only depend on a preceding one (it is also a good way to avoid circularities as an unwanted side effect). So I think it is completely satisfactory to somehow allow access for already generated fields and only those.

I am still unsure about the right syntax and need to familiarise myself more with the internal workings of the generation process, but I will try to make a v1 PR and we could iterate on that if you agree.

Goldziher commented 2 years ago

sure go ahead.

Also, would be happy for some further input @lindycoder

lindycoder commented 2 years ago

I do not have a strong opinion on this. I usually try to make data models that don't have interdependent fields because it's always a pain to generate data from.

If one field is generated from the others i would vote for a @property. If it's correlated with others and you're using pydantic i would make validators or root validators enforcing it.... but that doesn't solve the problem of the data generation.

In the case you're describing i would probably hide the factory.build() in a function that will control the input to the .build().

Regarding the feature/implementation: Pydantic field validators are processed in order and the values you have access while validating are the ones that have been described before. As @Goldziher pointed out, generation is done iterating over __fields__, probably in the same fashion pydantic does its validator. If we could accumulate a dict[str, Any] of generated data and pass this as a parameter to all generative methods that would be work like pydantic, which might make sense to pydantic users.

I think it would be cool if handle_factory_field passed that to Use() and callable field definition, but that would be a backward incompatible change. Unless we check if the function takes args before sending it... kinda like pytest checks for fixtures by arg name. For example, if the callable has a "values" (name of param in validators) param we could supply it.

blagasz commented 2 years ago

I completely agree that this use case is best to be avoided. On the other hand it could add a nice extra to the package. Adding this option to Use though seems straightforward I would not go with the arg inspection option to avoid backward incompatible change. I suggest something like:


class MyFactory(ModelFactory):
  __model__ = MyModel

  @generate('special_field')
  def type_specific_generator(self, v, values: dict[str, Any])
     # do the magic (??? provide faker instance to help generation)
    return generated value

It is a second way of doing the same thing which is not ideal, but this way backward incompatibility could be avoided. It should throw and error if the same field has a Use specification as well. Wdyt?

Goldziher commented 2 years ago

well, so far we didnt use decorators in this package. its not a strict no, but I would prefer keeping the API logic the same.

How about, instead of modifying Use, we add a new class like Use, Ignore, etc, e.g. something like After or Depends, and then pass the data to it? What we could do is basically run the generation for all fields, but after the values have been generated, pass the result to any field that has After and then take the resulting value as the final value.

blagasz commented 2 years ago

I like this, just to be sure, I see something like this in my head:

class MyFactory(ModelFactory):
  __model__ = MyModel

  dependant_field = Depends('field_name or list of names', callable_receiving_dependant_values, args, kwargs)

And the the value returned by the callable would be the final value for dependant_field.

lindycoder commented 2 years ago

Just throwing that here

class MyFactory(ModelFactory):
  __model__ = MyModel

  dependant_field = PostGenerated(callable_receiving_dependant_values, *args, **kwargs)

And all post generated fields would be triggered at the end of everything else, in no particular order.

If you introduce the direct dependency, we need to change the iteration of the __fields__ and switch to a dependency tree because then you would need to support depends on depends... seems like complexity i'm not sure we want to support

blagasz commented 2 years ago

Fair point, it could even become circular I guess

blagasz commented 2 years ago

@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.

lindycoder commented 2 years ago

Note that this feature could still be relevant for people that likes immutability and makes their classes frozen=True as you can't change them after build(). Not sure how much of the audience that applies to.

Goldziher commented 2 years ago

@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.

Ok, for me simple is better and this feature is borderline too complex, although I'm not opposed to it. Feel free to close this issue unless you intend to contribute it.

blagasz commented 2 years ago

I contributed a first version, any feedback is welcome!

Goldziher commented 2 years ago

I contributed a first version, any feedback is welcome!

very nice stuff. I reviewed

DiegoPiloni commented 2 years ago

Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.

Would this be a good case for SQL relationships ? For example in SQLModel:

class Hero(SQLModel, table=True):
    ...
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")
    team: Optional[Team] = Relationship(back_populates="heroes")

How would we make sure that team_id and team.id are both the same?

In factory-boy this could be accomplished with SubFactory and SelfAttribute, something like this:

class HeroFactory(SQLAlchemyModelFactory):
   ...
   team_id = FuzzyInteger()
   team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")

Does this issue intends to support the same use cases? Is there a feature like SubFactory in pydantic_factories, where we can override certain fields of nested models ?

Goldziher commented 2 years ago

Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.

Would this be a good case for SQL relationships ? For example in SQLModel:
class Hero(SQLModel, table=True):
    ...
    team_id: Optional[int] = Field(default=None, foreign_key="team.id")
    team: Optional[Team] = Relationship(back_populates="heroes")
How would we make sure that team_id and team.id are both the same?

In factory-boy this could be accomplished with SubFactory and SelfAttribute, something like this:
class HeroFactory(SQLAlchemyModelFactory):
   ...
   team_id = FuzzyInteger()
   team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")
Does this issue intends to support the same use cases? Is there a feature like SubFactory in pydantic_factories, where we can override certain fields of nested models ?

Hi there,

You could certainly use it like so I think. Please open a separate issue after this feature is released if it does not satisfy your requirements.

Goldziher commented 2 years ago

This feature has been released in v1.3.0 @blagasz. Thanks for your contribution!

litestar-org / polyfactory

Feature: access already generated fields in custom generation #41