Make better union validation decisions based on `extra` behavior

pydantic / pydantic-core

Core validation logic for pydantic written in rust

MIT License

1.4k stars 232 forks source link

Make better union validation decisions based on `extra` behavior #1335

Open sydney-runkle opened 3 months ago

sydney-runkle commented 3 months ago

For example:

class ModelA(BaseModel):
    pass

class ModelB(BaseModel, extra='allow'):
    pass

TypeAdapter(ModelA | ModelB).validate_python({'x': 1})
#> ModelA()

Is the current behavior, but that doesn't seem like the best decision.

See https://github.com/pydantic/pydantic-core/pull/1332 and https://github.com/pydantic/pydantic-core/pull/1334 for recent context on union validation decision improvements.

davidhewitt commented 3 months ago

It seems to me like the solution here would be to have another counter extra_fields_set 🙈

sydney-runkle commented 3 months ago

Indeed. Honestly a good first issue for someone looking to start in rust and pydantic-core, given that we just set the model for this in the PRs linked above.

sydney-runkle commented 3 months ago

@erohmensing you might be interested in this!

erohmensing commented 3 months ago

Thanks for the tag @sydney-runkle! This makes sense, would be happy to take a look.

@davidhewitt is there any reason why you'd think of a counter over a bool for extra_fields_set? From my understanding the models either allow extras or don't, but maybe there's some more complicated custom config logic where you can customize the shape of allowed extras that I'm not aware of?

sydney-runkle commented 3 months ago

@erohmensing,

I think a bool would be fine for the reasons above that you mentioned. In some ways, a bool is better, because I don't think an arbitrary count of extra fields set is indicative of a better match if two models have the same extra config setting.

Also, consider:

from pydantic import BaseModel, TypeAdapter

class ModelA(BaseModel):
    a: int
    b: int

class ModelB(ModelA, extra='allow'):
    pass

print(repr(TypeAdapter(ModelA | ModelB).validate_python({'a': 1, 'b': 1, 'x': 1})))
#> ModelA(a=1, b=1)

This is the current behavior - I think the desired behavior would be ModelB. This is a case where we'd have fields_set_count information (a tie), and could default to the extra metric as a tiebreaker. That being said, maybe exactness should take priority over our new metric?

erohmensing commented 3 months ago

@sydney-runkle

This is the current behavior - I think the desired behavior would be ModelB

I agree!

That being said, maybe exactness should take priority over our new metric?

do you mean in the case of like

from pydantic import BaseModel, TypeAdapter

class ModelA(BaseModel):
    a: int
    b: int

class ModelB(ModelA, extra='allow'):
    pass

print(repr(TypeAdapter(ModelA | ModelB).validate_python({'a': 1, 'b': 1)))

should be Model A? If so, agreed - that seems to be the case and I think it would make sense for it to stay like that

sydney-runkle commented 1 month ago

@erohmensing, are you still interested in contributing here?