jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.45k stars 76 forks source link

Add support for specifying constraints on unions #447

Closed provinzkraut closed 1 year ago

provinzkraut commented 1 year ago

Description

Trying to set a constraint like min_length or max_length for a field that's a union of types supporting this constraint results in a TypeError:

from typing import Annotated

import msgspec

class Foo(msgspec.Struct):
    bar: Annotated[str | bytes, msgspec.Meta(min_length=2)]

msgspec.json.decode('{"bar": "abc"}', type=Foo)
# TypeError: Can only set `min_length` on a str, bytes, or collection type - type `typing.Annotated[str | bytes, msgspec.Meta(min_length=2)]` is invalid

The validation logic here should probably check if the type supports the constraint, and if it is a union, that all types of the union support the constraint.

There is also the case where a type is optional (so a union with None), which is currently not supported:

from typing import Annotated

import msgspec

class Foo(msgspec.Struct):
    bar: Annotated[str | None, msgspec.Meta(min_length=2)]

msgspec.json.decode('{"bar": "abc"}', type=Foo)
# TypeError: Can only set `min_length` on a str, bytes, or collection type - type `typing.Annotated[str | None, msgspec.Meta(min_length=2)]` is invalid

Some special casing here would be great, as this is a fairly common use case.

jcrist commented 1 year ago

The validation logic here should probably check if the type supports the constraint, and if it is a union, that all types of the union support the constraint.

I can see how this would be nice to have and makes sense, but refactoring the internals to support this may take some time. The C datastructures used for parsing type annotations would have to be changed to support a model where the annotations can apply to more than one type.

Is this a blocker or just a nice-to-have? I'll get to it eventually; unless it's a blocker I'd prefer to prioritize some other work right now.


There is also the case where a type is optional (so a union with None), which is currently not supported:

I'm less convinced about this case, since min_length can't apply to None. I'd write this as:

Annotated[str, Meta(min_length=2)] | None

I can see the desire to special case None, but personally I'd value consistency in the type parsing logic. A rule like "Any constraint applied in an Annotated must be valid for all types inside the Annotated".

Is there anything particular about this case where not supporting it makes things harder for litestar's use case?

provinzkraut commented 1 year ago

Is this a blocker or just a nice-to-have? I'll get to it eventually; unless it's a blocker I'd prefer to prioritize some other work right now.

For our use case it could be a blocker depending on the path we follow with the msgspec integration. We are currently re-evaluating if it's possible to swap out all our internal modelling/validation logic for a pure msgspec approach. If this is our goal then it would be a blocker, since this functionality is currently supported in both attrs and Pydantic, so it would be a breaking change.

However, I wouldn't say that it's especially time sensitive. If you already have this on your map, we can easily work around this for now. Do you have a rough estimate when you might get to this?


I'm less convinced about this case, since min_length can't apply to None. I'd write this as:

Annotated[str, Meta(min_length=2)] | None

Is there anything particular about this case where not supporting it makes things harder for litestar's use case?

I was not aware that msgspec supported this. We can definitely work with this though, so it's not a necessity for our integration at all.

Thanks a lot @jcrist!

jcrist commented 1 year ago

If this is our goal then it would be a blocker, since this functionality is currently supported in both attrs and Pydantic, so it would be a breaking change.

I'm not sure I follow, since this is using msgspec.Meta annotations, how would this be "currently supported in both attrs and Pydantic"?

The constraints you're asking for above do work, you just need to apply them directly to the type, not to the union:

Annotated[str, Meta(max_length=2)] | Annotated[bytes, Meta(max_length=2)]

Note that in this case you'll run into our union restriction, but the point in general is that constraints can be applied to types in unions, just not currently to the union itself.

Do you have a rough estimate when you might get to this?

Estimates are tricky. If it's important for y'all I'd guess sometime in the next 3-4 weeks, longer if it's less important.

provinzkraut commented 1 year ago

I'm not sure I follow, since this is using msgspec.Meta annotations, how would this be "currently supported in both attrs and Pydantic"?

The case here is that there might be a user-supplied type, such as str | list[str], to which we apply a constraint. Seeing your answer now though, it seems like this too could be worked around by unwrapping the unions and then applying the constraints to the unwrapped types.

I'll check if this is a viable path for us and then get back to you. My gut feeling though is that it should be doable, so support for this pattern wouldn't be a high priority for us.

provinzkraut commented 1 year ago

I'll check if this is a viable path for us and then get back to you. My gut feeling though is that it should be doable, so support for this pattern wouldn't be a high priority for us.

It was indeed easy to support this, so no need for this to be anything more than a "nice to have".

jcrist commented 1 year ago

Thinking more on this, I'm now skeptical of this proposed change.

Currently the constraints only apply to the immediately wrapped type. If we support applying constraints to a union type (applying the constraints to all enclosed types), it's not clear what that behavior means for how they apply to other nested types like list and whether that behavior is consistent and intuitive.

For example:

from msgspec import Meta
from typing import Annotated

# A str type with max length 3
Annotated[str, Meta(max_length=3)]

# A str type with max length 3, or a bytes type with max length 3
Annotated[str, Meta(max_length=3)] | Annotated[bytes, Meta(max_length=3)]

# With this feature we _could_ spell the above by wrapping the union with Annotated.
# The user might then view constraints as applying to any nested type wrapped in the
# annotated, not just the immediately wrapped type.
Annotated[str | bytes, Meta(max_length=3)]

# This model may be confusing though if you consider nested types like `list`.
# Currently this means a list of at most 3 items, where the items are all strings.
# With this proposed change, would it mean that? Or would it mean a list of 
# at most 3 elements, where each element is a string with max length 3?
Annotated[list[str], Meta(max_length=3)]

Right now I'm leaning towards not supporting this, but I'd be curious to hear your thoughts. Perhaps I'm overthinking this.

jcrist commented 1 year ago

Closing for the reasons provided above. If this becomes a sticking point for a few other users then I may reconsider, but for now I think the current behavior is easier to understand (and also implement).