jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.38k stars 72 forks source link

Support for attrs `field(validators=...)` #535

Closed provinzkraut closed 1 year ago

provinzkraut commented 1 year ago

Description

It would be great if msgspec supported field(validators=...) for attrs classes.

This is adjacent to #494.

An issue I'm currently facing is that, since msgspec supports attrs classes natively, there is no way to get those validators to work; Msgspec doesn't call them and I also can't make sure they're called via a dec_hook.

A way to get this to work would be quite nice (=

jcrist commented 1 year ago

Yes, attrs validators aren't currently called. I think supporting them might make sense, but I'm a bit torn. Many of the builtin attrs validators are already supported in msgspec via other mechanisms, and will be both slower and duplicate work. For example:

In [2]: from attrs import define, field

In [3]: from attrs.validators import lt, gt, instance_of

In [4]: @define
   ...: class Ex:
   ...:     x: int = field(validator=[instance_of(int), lt(100), gt(0)])

would be better written in msgspec as

In [5]: import msgspec

In [6]: from typing import Annotated

In [7]: @define
   ...: class Ex2:
   ...:     x: Annotated[int, msgspec.Meta(lt=100, gt=0)]
   ...: 

In [8]: msgspec.json.decode(b'{"x": 1}', type=Ex2)
Out[8]: Ex2(x=1)

In [9]: msgspec.json.decode(b'{"x": -1}', type=Ex2)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[9], line 1
----> 1 msgspec.json.decode(b'{"x": -1}', type=Ex2)

ValidationError: Expected `int` >= 1 - at `$.x`

The instance_of validator is unnecessary, since msgspec will always check that it's an int. And the bounds-checks are better represented using a Meta annotation, where msgspec can validate them much faster than attrs can.

All that said - we can support attrs validators (no promises I'll get to it quickly). I just worry that supporting them will lead to inefficient usage that could be avoided by users making better use of msgspec native features.


A way to get this to work would be quite nice (=

If you absolutely need to get this to work now, you can do this by calling attrs.validate on self in a __attrs_post_init__ call. This is bad since it will run the validators twice when constructed outside of msgspec (once during normal field init, then again during post-init), but since msgspec does call __attrs_post_init__ this will force msgspec to call them.

In [34]: @define
    ...: class Ex:
    ...:     x: int = field(validator=[instance_of(int), lt(100), gt(0)])
    ...:     def __attrs_post_init__(self):
    ...:         attrs.validate(self)
    ...: 

In [35]: x = Ex(1)  # validators are called *twice* here

In [36]: msgspec.json.decode(b'{"x": 1}', type=Ex)  # validators are called *once* here
Out[36]: Ex(x=1)

In [37]: msgspec.json.decode(b'{"x": -1}', type=Ex)  # validators are called *once* here
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[34], line 5, in Ex.__attrs_post_init__(self)
      4 def __attrs_post_init__(self):
----> 5     attrs.validate(self)

File ~/miniconda3/envs/msgspec/lib/python3.9/site-packages/attr/_make.py:1964, in validate(inst)
   1963 if v is not None:
-> 1964     v(inst, a, getattr(inst, a.name))

File ~/miniconda3/envs/msgspec/lib/python3.9/site-packages/attr/_make.py:2905, in _AndValidator.__call__(self, inst, attr, value)
   2904 for v in self._validators:
-> 2905     v(inst, attr, value)

File ~/miniconda3/envs/msgspec/lib/python3.9/site-packages/attr/validators.py:474, in _NumberValidator.__call__(self, inst, attr, value)
    473 if not self.compare_func(value, self.bound):
--> 474     raise ValueError(
    475         "'{name}' must be {op} {bound}: {value}".format(
    476             name=attr.name,
    477             op=self.compare_op,
    478             bound=self.bound,
    479             value=value,
    480         )
    481     )

ValueError: 'x' must be > 0: -1

The above exception was the direct cause of the following exception:

ValidationError                           Traceback (most recent call last)
Cell In[37], line 1
----> 1 msgspec.json.decode(b'{"x": -1}', type=Ex)

ValidationError: 'x' must be > 0: -1
provinzkraut commented 1 year ago

Many of the builtin attrs validators are already supported in msgspec via other mechanisms, and will be both slower and duplicate work

No argument there. This definitely makes sense that if you're already using msgspec. However, my use case is - as usual - Litestar, where we swapped to msgspec for our internal parsing and validation needs. This includes handling of attrs classes a user has passed in.

The concrete issue there is that we currently have no way to validate those correctly, whereas prior to msgspec adding support for attrs we could via an enc_hook. So essentially the issue here for us is that msgspec only has partial support for decoding/converting into attrs, but no way to override its attrs handling.

I'm not sure how far along you are in the decision making process of finding a "better way" to handle the hooks (i.e. the one that would allow to override natively supported types), but I feel like that would quite often be a solution to the issues that arise due to different use cases.

FHU-yezi commented 1 year ago

We do need a way to add custom validator to Struct class.

I'm building a 3rd-party SDK for a platform, in my case, we need to validate if the provided url is valid or not, by sending an API call to an endpoint.

class User(msgspec.Struct):
    url: str  # How can I do custom validate?

In Pydantic, I can do this:

class User(Model):
    url: str

    @field_validator("url")
    @classmethod
    def validate_url(cls, value: str) -> str:
        if not is_user_url_valid(value):
            raise ValueError("User URL invalid")
        return v
jcrist commented 1 year ago

We do need a way to add custom validator to Struct class.

This is unrelated to this issue. #483 is the issue tracking that feature.

in my case, we need to validate if the provided url is valid or not, by sending an API call to an endpoint.

Are you saying you make an API call during validation? If so, I wouldn't handle that there (even when using pydantic) - making an API call everytime a User instance is created seems wasteful, and having side effects like an API call hidden inside a constructor may make it easy to miss that the API call is happening. It also won't compose well if working with async pythong.

Instead, I'd handle this separately outside of the JSON decoding, so it's explicit when the API request is being made. Something like:

user = msgspec.json.decode(msg, type=User)
user.validate_url()  # explicit when the URL validating request happens
await user.validate_url()  # could even make it async if needed