Closed sydney-runkle closed 2 weeks ago
Comparing union-val-fix
(ed8ddb9) with main
(46379ac)
✅ 155
untouched benchmarks
This happens to fix https://github.com/pydantic/pydantic/issues/8690, but really this belongs in the "want to fix" category below, as we need to account for defaults more carefully.
Interestingly, after writing this PR, I noticed that a similar suggestion for a fix was made in https://github.com/pydantic/pydantic/issues/9095. We don't exactly use model_fields_set
because that hasn't been initialized at that time, but that idea is the same.
There's still more to be done here, specifically, we can fix the below issues if we address correctness of a match based on model fields set, and not defaults used.
I do think that issues with default handling should also be addressed in this PR, because without that, existing test cases are failing with the current naive addition.
It would be exciting to close 9 issues with one PR 😂. This speaks to both our lack of ability to close duplicates, but also is helpful in terms of inspiration for a variety of test cases!
Thus, I plan tomorrow to experiment with the __pydantic_fields_set__
data, and experiment with how to extract that info for dataclasses and typeddicts as well :). Right now we've handled the square, but need to address the rectangle.
Another follow up question I have: would it make sense to externally manage the 'scoring' associated with various union matches, instead of updating the best match with every new attempt? I don't think the performance will be much different, but we should consider readability.
Want to fix:
Update, I believe I have now fixed the following:
That being said, I'd definitely be open to reworking this design in general. Specifically, I'm wondering:
num_fields
method to the various validators in question? I'm unsure of a better way to approach this given that the validator
property is private, but I'm guessing there's a better way to do this lookup on a validator instance.__pydantic_fields_set__
attribute value in pydantic-core
, but we fallback to num_fields()
in the case of this getattr
failing, so maybe it's ok?Things I still definitely need to do here:
It's really exciting to see progress in this area!
🎈
So I will be honest and say that I'm quite scared about all possible solutions in this space and how they generalise to meet every users' expectations.
Especially the more complex we make this logic, the more we risk that users will find edge cases which don't work as they hope and we have to add even more complexity to this process to meet their needs. Where does the line get drawn, and how do we document this process to users? At the moment smart mode unions are very opaque and I think we've permitted ourselves to adjust their behaviour in future releases as long as we believe the new behaviour is better, but it's a fine line.
Yep, this all makes sense to me. I think that because we have some flexibility with what "smart" means, we can make this change. Additionally, users who have specialized cases that behave differently with this change can get the old behavior by using left-to-right
mode. I've also tried to preserve some left-to-right
behavior by only overriding the current union member match if the fields metric of a new match is greater than that of the current.
Using this heuristic solves a lot of the current reports so it seems like a win and we should move forward; let's just tread carefully.
👍
It doesn't need implementing right away, but it'd also be good for us to have an idea of what we could offer as a Python API for users to customise this heuristic. Maybe that's a can of worms, but it'd also finally give us a way to deal with a load of the user cases here.
I like this idea. At least for now, custom validators provide some sort of shortcut for this.
For my own notes, wanted to keep track of test cases covering the issues this fixes:
I'll note, https://github.com/pydantic/pydantic/issues/9101 seems like a different issue.
Now using ValidationState
to keep track of the fields_set_count
, as we don't really want to use schema information here, but rather validation time info.
I still need to add support for this to dataclasses
, and then add some tests for TypedDict
and dataclass
As discussed with @dmontagu offline, I made some changes so that fields_set_count
bubbles up for each union, that way we count fields set on a subclass as contributing to the success of a match. For example, this case now returns a ModelB
:
from pydantic import BaseModel, TypeAdapter
class SubModelA(BaseModel):
x: int = 1
class SubModelB(BaseModel):
y: int = 2
class ModelA(BaseModel):
sub: SubModelA
class ModelB(BaseModel):
sub: SubModelB
ta = TypeAdapter(ModelA | ModelB)
print(repr(ta.validate_python({'sub': {'y': 3}})))
#> Before: ModelA(sub=SubModelA(x=1))
#> After: ModelB(sub=SubModelB(y=3))
I've also added a bunch of new test cases that verify we're getting the desired behavior associated with this change.
I'll note, I'll accompany this with a docs change on the pydantic
end to update our explanation of smart
mode and when to use left_to_right
.
This PR attempts to augment the current
union
validation logic by adding in tie-breaker logic for union members that tie in terms ofexactness
.The current naive approach is such that when validating a model type data (including dataclasses and typed dicts), we count the number of fields attached to said validator. Matches with more fields beat out those with fewer to help us deal with cases of subclasses, etc.
Fix https://github.com/pydantic/pydantic/issues/9094 Fix https://github.com/pydantic/pydantic/issues/8690 Fix https://github.com/pydantic/pydantic/issues/7223 Fix https://github.com/pydantic/pydantic/issues/9224 Fix https://github.com/pydantic/pydantic/issues/7085 Fix https://github.com/pydantic/pydantic/issues/6637 Fix https://github.com/pydantic/pydantic/issues/9664
Almost fixes the problems in https://github.com/pydantic/pydantic/issues/9095, but not entirely.