pydantic / pydantic-core

Core validation logic for pydantic written in rust
MIT License
1.43k stars 242 forks source link

new-class validator discussion #354

Closed samuelcolvin closed 2 months ago

samuelcolvin commented 1 year ago

@samuelcolvin It seems to me that the approach you are describing (especially given point 2) is along the lines of what I'm thinking of as the "schema-based" approach (i.e., where a serializer is built for the Outer model at the time it is created, which I think is the right way to go). I am in favor of this approach, for the exact reasons you described.


There are a few points I'd like to bring up in response to your last comment, though admittedly they are somewhat tangential; hopefully they don't derail this thread, and I'm happy to discuss elsewhere if more appropriate.


At any rate, I'll be interested to dig into your approach to Unions once it's ready!

Originally posted by @dmontagu in https://github.com/pydantic/pydantic-core/issues/327#issuecomment-1376293733

samuelcolvin commented 1 year ago

Correct.

I think you might be right about strict mode on this context, even with something as apparently obvious as "strict mode" there are edge cases where we do do some coersion.

Options are:

But like I said, I'm not really a consumer of "strict mode"

Well, no one is right now as it's not released, hence why observations like this are so helpful.

I think your argument about type checking is a valid one.

However there's one difficult edge case - strict mode, as well as being configurable by users, is also used on the first pass for Unions, I suspect that's why I have it the way I do right now.

Example, let's say you have something like this:

class ParentType(BaseModel):
    x: int

class ChildType(ParentType):
    y: int

ParentOrChild = ParentType | ChildType

The union validator would do roughly the following:

In a scenario like this this, I can imagine ChildType(x=1, y=2) being strictly valid as a ParentType might loose information.

But I'm not sure about this, it's some time since I wrote this part, might need some trial and error to check.

dmontagu commented 1 year ago

Thanks for creating this issue.

I think if (raw) data is valid for both ParentType and ChildType, it's fine to choose how to parse ParentType | ChildType just by order (i.e., with this ordering, never parse raw data as ChildType lol); the main issue I have is if you were to have ParentType, ChildType, and SubChildType, and from a python init call you pass an instance of SubChildType to a field of type ParentType | ChildType, ideally it would be treated as being of type ChildType even if it could be parsed as ParentType. ... I think...

In general, I think there's a strong argument to be made for strict mode allowing subclasses, if we can work through the other implications.

(I'm still in favor of doing validation/serialization as a function of the schema of the model being parsed/serialized though, rather than supporting inheritance-based overrides for fields; as far as I can tell we are in agreement about this.)

samuelcolvin commented 1 year ago

I think we're in agreement on both, just need to work out if it's going to cause ugly edge cases with union validation.

sydney-runkle commented 2 months ago

Marking this as closed for now - we've resolved most of these questions by designing the logic for smart mode union validation, as outlined here: https://docs.pydantic.dev/latest/concepts/unions/#smart-mode