new-class validator discussion

samuelcolvin commented 1 year ago

@samuelcolvin It seems to me that the approach you are describing (especially given point 2) is along the lines of what I'm thinking of as the "schema-based" approach (i.e., where a serializer is built for the Outer model at the time it is created, which I think is the right way to go). I am in favor of this approach, for the exact reasons you described.

There are a few points I'd like to bring up in response to your last comment, though admittedly they are somewhat tangential; hopefully they don't derail this thread, and I'm happy to discuss elsewhere if more appropriate.

In the Rust code you referenced, I noticed that it appears that in strict mode you can't pass subclasses of types into the appropriate fields (since it checks for exact type equality).
- Is this right?
- If so, this feels somewhat surprising/non-pythonic to me; my intuition would be that strict mode should disallow, e.g., conversions from str to int, but should allow conversion from IntEnum to int; in my mind the same thing would go for models, though I can imagine some challenges coming from the fact that different models e.g. have different __init__ signatures, etc..
  - At the end of the day, given the benefits and how it resolves certain challenges especially with pydantic models, I'm fine with it either way (I generally don't worry about using strict mode in my own code since the mypy plugin forces it, with the exception of the subclasses point.)
  - The biggest argument I see against this approach is that, as far as I can tell, even in principle, type-checkers cannot provide guarantees about exact type equality. For example, if you have a function that is documented as returning Inner, you can't know that it isn't in fact returning a SubclassOfInner. This would mean that using strict mode would be able to introduce (what are functionally) type errors not detectable by a type checker (bad), whereas non-strict mode would result in there being "correct" code that fails a type checker (less bad).
  - But like I said, I'm not really a consumer of "strict mode", so it's not a major concern to me either way, and perhaps I shouldn't be weighing in 😅.
Because validators would be re-executed when passing a subclass into a field, we may want to re-emphasize the potential pitfalls of using non-idempotent validators (maybe this is already sufficiently emphasized in the docs, just pointing it out).
Since under this logic data gets "coerced" (through re-validation) to the annotated class during validation, it seems to me that there may be a bigger risk of issues being introduced by (unvalidated) assignment to an attribute. I haven't thought super hard about the implications; maybe it's not important, and either way I think it's something we can investigate and/or resolve once the appropriate functionality is in place.

At any rate, I'll be interested to dig into your approach to Unions once it's ready!

Originally posted by @dmontagu in https://github.com/pydantic/pydantic-core/issues/327#issuecomment-1376293733

samuelcolvin commented 1 year ago

Correct.

I think you might be right about strict mode on this context, even with something as apparently obvious as "strict mode" there are edge cases where we do do some coersion.

Options are:

leave it as is
switch to allowing subclasses with strict=True - I guess this is my preference (except for the below problem)
add another flag for this situation to allow or disallow - I'd rather wait and only do this if it becomes necessary

But like I said, I'm not really a consumer of "strict mode"

Well, no one is right now as it's not released, hence why observations like this are so helpful.

I think your argument about type checking is a valid one.

However there's one difficult edge case - strict mode, as well as being configurable by users, is also used on the first pass for Unions, I suspect that's why I have it the way I do right now.

Example, let's say you have something like this:

class ParentType(BaseModel):
    x: int

class ChildType(ParentType):
    y: int

ParentOrChild = ParentType | ChildType

The union validator would do roughly the following:

is the input valid as ParentType in strict mode? if so return it
is the input valid as ChildType in strict mode? if so return it
is the input valid as ParentType in non-strict mode? if so return it
is the input valid as ChildType in non-strict mode? if so return it

In a scenario like this this, I can imagine ChildType(x=1, y=2) being strictly valid as a ParentType might loose information.

But I'm not sure about this, it's some time since I wrote this part, might need some trial and error to check.

dmontagu commented 1 year ago

Thanks for creating this issue.

I think if (raw) data is valid for both ParentType and ChildType, it's fine to choose how to parse ParentType | ChildType just by order (i.e., with this ordering, never parse raw data as ChildType lol); the main issue I have is if you were to have ParentType, ChildType, and SubChildType, and from a python init call you pass an instance of SubChildType to a field of type ParentType | ChildType, ideally it would be treated as being of type ChildType even if it could be parsed as ParentType. ... I think...

In general, I think there's a strong argument to be made for strict mode allowing subclasses, if we can work through the other implications.

(I'm still in favor of doing validation/serialization as a function of the schema of the model being parsed/serialized though, rather than supporting inheritance-based overrides for fields; as far as I can tell we are in agreement about this.)

samuelcolvin commented 1 year ago

I think we're in agreement on both, just need to work out if it's going to cause ugly edge cases with union validation.

sydney-runkle commented 2 months ago

Marking this as closed for now - we've resolved most of these questions by designing the logic for smart mode union validation, as outlined here: https://docs.pydantic.dev/latest/concepts/unions/#smart-mode

pydantic / pydantic-core

new-class validator discussion #354