Closed blagasz closed 2 years ago
Hi. Well, the issue here is really the ordering of generated fields - if you have a field A that is dependent on field B's value, this will not work out of the box currently. The library iterates the __fields__
property of the pydantic model, which has an order that derives from the order of declaration of the fields. Thus, we cannot know for certain that field B will be processed before field A, and so to implement this will require some thinking on how to do this in a none-hacky way.
So, before this enhancement can proceed we need to have a clear viable way of implementing this in a way that can be developed further. I invite you to add your ideas.
Hi. I totally agree, I was also worried about the order. I think it is enough for all the realistic use cases I can think of that every field could only depend on a preceding one (it is also a good way to avoid circularities as an unwanted side effect). So I think it is completely satisfactory to somehow allow access for already generated fields and only those.
I am still unsure about the right syntax and need to familiarise myself more with the internal workings of the generation process, but I will try to make a v1 PR and we could iterate on that if you agree.
sure go ahead.
Also, would be happy for some further input @lindycoder
I do not have a strong opinion on this. I usually try to make data models that don't have interdependent fields because it's always a pain to generate data from.
If one field is generated from the others i would vote for a @property
. If it's correlated with others and you're using pydantic i would make validators or root validators enforcing it.... but that doesn't solve the problem of the data generation.
In the case you're describing i would probably hide the factory.build() in a function that will control the input to the .build().
Regarding the feature/implementation:
Pydantic field validators are processed in order and the values you have access while validating are the ones that have been described before. As @Goldziher pointed out, generation is done iterating over __fields__
, probably in the same fashion pydantic does its validator. If we could accumulate a dict[str, Any]
of generated data and pass this as a parameter to all generative methods that would be work like pydantic, which might make sense to pydantic users.
I think it would be cool if handle_factory_field
passed that to Use() and callable field definition, but that would be a backward incompatible change. Unless we check if the function takes args before sending it... kinda like pytest checks for fixtures by arg name. For example, if the callable has a "values" (name of param in validators) param we could supply it.
I completely agree that this use case is best to be avoided. On the other hand it could add a nice extra to the package. Adding this option to Use though seems straightforward I would not go with the arg inspection option to avoid backward incompatible change. I suggest something like:
class MyFactory(ModelFactory):
__model__ = MyModel
@generate('special_field')
def type_specific_generator(self, v, values: dict[str, Any])
# do the magic (??? provide faker instance to help generation)
return generated value
It is a second way of doing the same thing which is not ideal, but this way backward incompatibility could be avoided. It should throw and error if the same field has a Use
specification as well. Wdyt?
well, so far we didnt use decorators in this package. its not a strict no, but I would prefer keeping the API logic the same.
How about, instead of modifying Use
, we add a new class like Use
, Ignore
, etc, e.g. something like After
or Depends
, and then pass the data to it? What we could do is basically run the generation for all fields, but after the values have been generated, pass the result to any field that has After
and then take the resulting value as the final value.
I like this, just to be sure, I see something like this in my head:
class MyFactory(ModelFactory):
__model__ = MyModel
dependant_field = Depends('field_name or list of names', callable_receiving_dependant_values, args, kwargs)
And the the value returned by the callable would be the final value for dependant_field
.
Just throwing that here
class MyFactory(ModelFactory):
__model__ = MyModel
dependant_field = PostGenerated(callable_receiving_dependant_values, *args, **kwargs)
And all post generated fields would be triggered at the end of everything else, in no particular order.
If you introduce the direct dependency, we need to change the iteration of the __fields__
and switch to a dependency tree because then you would need to support depends on depends... seems like complexity i'm not sure we want to support
Fair point, it could even become circular I guess
@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.
Note that this feature could still be relevant for people that likes immutability and makes their classes frozen=True
as you can't change them after build()
. Not sure how much of the audience that applies to.
@Goldziher, @lindycoder, I was away trying to accomplish what triggered this idea in the first place and I realised that it is almost inevitable to do custom post processing over the model factories if I want to consistently generate a complex database with meaningful cross connections in the data. It could be easily accomplished without the suggested feature by iterating through the generated models and directly modifying attributes before persisting them. So I suppose to drop this feature as unnecessary.
Ok, for me simple is better and this feature is borderline too complex, although I'm not opposed to it. Feel free to close this issue unless you intend to contribute it.
I contributed a first version, any feedback is welcome!
I contributed a first version, any feedback is welcome!
very nice stuff. I reviewed
Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.
Would this be a good case for SQL relationships ? For example in SQLModel:
class Hero(SQLModel, table=True):
...
team_id: Optional[int] = Field(default=None, foreign_key="team.id")
team: Optional[Team] = Relationship(back_populates="heroes")
How would we make sure that team_id
and team.id
are both the same?
In factory-boy this could be accomplished with SubFactory
and SelfAttribute
, something like this:
class HeroFactory(SQLAlchemyModelFactory):
...
team_id = FuzzyInteger()
team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")
Does this issue intends to support the same use cases? Is there a feature like SubFactory
in pydantic_factories
, where we can override certain fields of nested models ?
Hi, thanks for this library! In my team we are currently using factory-boy and considering to start using this library.
Would this be a good case for SQL relationships ? For example in SQLModel:
class Hero(SQLModel, table=True): ... team_id: Optional[int] = Field(default=None, foreign_key="team.id") team: Optional[Team] = Relationship(back_populates="heroes")
How would we make sure that
team_id
andteam.id
are both the same?In factory-boy this could be accomplished with
SubFactory
andSelfAttribute
, something like this:class HeroFactory(SQLAlchemyModelFactory): ... team_id = FuzzyInteger() team = SubFactory(TeamFactory, team_id=SelfAttribute("..team_id")
Does this issue intends to support the same use cases? Is there a feature like
SubFactory
inpydantic_factories
, where we can override certain fields of nested models ?
Hi there,
You could certainly use it like so I think. Please open a separate issue after this feature is released if it does not satisfy your requirements.
This feature has been released in v1.3.0 @blagasz. Thanks for your contribution!
It would be a nice feature to have access to the already generated field values in a customised generator scenario, e.g.
I could even imagine some decorator or annotation based solution to the same problem, e.g.