Open Tinche opened 1 year ago
Hi! Thanks for opening this. A few thoughts inline:
From what I understand, you don't support generics and PyPy (maybe other constraints I'm not aware of, not sure if you support dict classes?), which would limit my personal use of the library.
Generic structs (#193) are currently unsupported, but are at the top of my list to implement. This is a solvable problem, and is something I'd expect to be resolved in the next 3 months (hopefully sooner).
By "dict classes" I assume you mean Structs with a __dict__
defined? Those are unplanned and unlikely to happen without a strong use case. Do you have a use case for those over relying on __slots__
?
PyPy support is also not planned. It's doable, but it's a lot of work and isn't something I think is worth it.
Also I'm attached to the way cattrs does customization of de/serialization.
I need to write up an issue for this, but before 1.0 I plan to redo our extension mechanism. A single hook for encoding and decoding (enc_hook
/dec_hook
) doesn't provide enough granularity, and is limiting support for fancier types. This won't exactly match how cattrs does things, but will be closer than what we have now. Would you be open to giving some feedback on the design there once I write things up?
if your class isn't generic, and you don't use PyPy, you'd choose to use msgspec, otherwise you'd use the default attrs backend and have an attrs class. The resulting class would contain
__attrs_attrs__
, and so would essentially be an attrs class.
IIUC you're saying that you'd use msgspec
to accelerate cattrs
(or attrs?) for cases where msgspec
would provide drop-in functionality? I think this may be doable, but I'm not sure what the benefit would be here? If you're not using msgspec
to encode the entire message (which you wouldn't if you want to use cattrs
throughout) then the benefit of msgspec
would be less noticeable. Faster __init__
/__eq__
for ordered types, but that may be it. That said, depending on what's needed here we may be able to make something work. I'd be against adding __attrs_attrs__
to Struct
types internally, but if we can provide the proper hooks that you can plumb things together then sure!
Another approach would be adding native support to msgspec
for encoding/decoding attrs types, similar to what we have for dataclasses
. I was able to get serialization working in about 10 minutes of hacking tonight (deserialization is always harder), but this would also be doable.
In [1]: import msgspec
In [2]: import attrs
In [3]: @attrs.define
...: class Example:
...: x: int
...: y: list[str]
...: z: dict[str, int]
...:
In [4]: msg = Example(1, ["two", "three"], {"four": 5})
In [5]: msgspec.json.encode(msg)
Out[5]: b'{"x":1,"y":["two","three"],"z":{"four":5}}'
If we went this route I'd definitely want advice from you on how this should be handled - you are after all the expert on serializing attrs things :).
Anyway, I guess in summary:
msgspec
and attrs
could integrate in some waymsgspec.Struct
types look like attrs types within msgspec, but if we can make this possible for you to do then I'd be happy helpattrs
types, similar to how we support dataclasses
nowMy initial thoughts were not focused on serialization, although that's a part of it. Even if serialization wasn't part of the story, I also wouldn't discount the benefits of faster __init__
/__eq__
so quickly ;) (Also I assume the __repr__
can also be accelerated? For example, we repr()
for our logs on every request.)
Let's talk vision first.
Let me describe my personal work situation to give you a better idea of where I'm coming from. Our stack is heavily based on attrs:
Due to the nature of attrs, all of these work together (a class can be shared between the databases and all the interfaces).
Just to be clear, this isn't about solving my problems at work. (If it was about that, I'd just write a msgspec-attrs adapter internally and use it. Also I would absolutely love open sourcing all of this stuff, so that's one of my long-term goals.) This is about joining efforts for the benefit of both open-source ecosystems.
For some of these use cases, I'm doing work in the broader open source community to take it a step further. For example, my next step for the Mypy plugin will be to implement support for attrs.fields(cl)
to allow statically-checked database projections (don't want to segue too much here). Once certain other features land in Mypy, I'll pick up work on uapi (think FastAPI but cleaner and for attrs). Other people are building their own stuff on attrs.
So what I'd like is this: we create a situation where I can drop msgspec into this stack and it just works, and is faster where that's possible. It's definitely not just about cattrs. It's a shame if we make users pick between super fast JSON serialization and using the model in a database too.
If we figure out a way to integrate:
To answer some of your comments:
By "dict classes" I assume you mean Structs with a dict defined?
Yeah, there are some use cases, for example @cached_property
and Exceptions (it's loosely-assumed arbitrary attributes can be attached to Exceptions in certain context, I remember this coming up when I was adding ExceptionGroups to cattrs and the __note__
PEP was being discussed).
Would you be open to giving some feedback on the design there once I write things up?
Sure, of course.
Another approach would be adding native support to msgspec for encoding/decoding attrs types
This would be sweet. It'd let me integrate msgspec into cattrs fairly easily, for example.
To finish up this overly long post, some thoughts on integration. This section is brainstormy, just me spitballing ideas.
You wouldn't need to change how msgspec works by default.
It would be awesome if it was possible to just do:
from msgspec import define, Factory
@define
class Test:
a: int = 5
b: list[int] = Factory(list)
Because it would just be a drop-in replacement for vanilla attrs. This might not be possible because msgspec.field
is already something different than attrs.field
so it might confuse typecheckers, I don't know, maybe this stuff could live in msgspec.attrs
instead. This interface doesn't have to support all of the attrs features, although I think you have most of them anyway. (The Mypy plugin would have to be changed to take this into effect.) Also you wouldn't need to depend on attrs, these could be conditionally created if attrs is already present.
These classes, being both attrs and msgspec classes, would work with both cattrs and msgspec serialization. The step after that would be me integrating msgspec into cattrs by implementing msgspec converters for json and msgpack.
This would be sweet. It'd let me integrate msgspec into cattrs fairly easily, for example.
I spent some time hacking together full attrs
support tonight. If you could take a look at/try out #323 that would be much appreciated.
I promise I'll do a longer writeup response to your comment above sometime tomorrow, it's already late here :(.
Also I assume the repr can also be accelerated? For example, we repr() for our logs on every request.)
That's true, a quick benchmark shows ~2x faster for a simple message (from external conversations it sounds like you also might like #322, which should also be faster).
Yeah, there are some use cases, for example
@cached_property
and Exceptions (it's loosely-assumed arbitrary attributes can be attached to Exceptions in certain context, I remember this coming up when I was adding ExceptionGroups to cattrs and the__note__
PEP was being discussed).
Wait, are you saying you use attrs
when defining custom exception subclasses? That's a use case I'd never thought of.
In case there's any confusion, let me be clear on what msgspec supports here:
msgspec.Struct
types are always __slots__
only. I have little intention of changing this for storing annotated attributes. There's an open issue for supporting cached_property
(#204), which won't work on classes without a __dict__
. We might solve this by adding an opt-in lazily created __dict__
(dict=True
or something). This would only be used for adding extra hidden state on an instance, and not for storing any annotated attributes.dataclasses
or attrs
as of #323) attributes may be stored in either __dict__
or __slots__
.It would be awesome if it was possible to just do: ...
I don't know, maybe this stuff could live in msgspec.attrs instead. This interface doesn't have to support all of the attrs features, although I think you have most of them anyway.
I was picturing attrs
(or cattrs
) optionally using msgspec
behind the scenes here, not msgspec
providing an attrs
-like interface. We could do that, but I'd worry that any inconsistencies in implementation would lead to subtle issues on the user's end. What requirements are there for the output of define
being an attrs object? Just matching the define
kwargs and behaviors (for things we support)? Or do you also need the generated __attrs_attrs__
attribute? And if so, how often does that interface change?
Also note that msgspec.Struct
types work by subclassing; any magic define
decorator made here would have to inject the Struct
base class into the class hierarchy to work properly.
The Mypy plugin would have to be changed to take this into effect
I've been hoping that we'll never have to write a mypy plugin (#19) with the new typing.dataclass_transform
work, but mypy
support for this is still lagging. If you're ever feeling motivated to write a new mypy plugin, I'd love some help here :).
Stepping back to look places where attrs
might interact with msgspec
:
Accelerated encoding/decoding of attrs
objects. This is handled by #323.
Accelerated generated methods for attrs
objects (__init__
/__eq__
/__copy__
/__repr__
, ...). I'm not 100% convinced this matters for real world code, but could be doable with either a msgspec.attrs
wrapper (or attrs
optionally relying on msgspec
). msgspec.Struct
types have more restrictions that attrs
types though, so many features of attrs
wouldn't be supportable out-of-the-box. I'm unlikely to spend effort loosening these restrictions without a motivating use case.
Are there other interaction points missing here?
Hi, I caught a cold so my replies here will be somewhat delayed, sorry ;)
No worries, hope you feel better soon!
I'm hoping to cut another release in the next week or two - do you think you'll have time to review #323 before then? We can always extend our attrs support, but changing existing behavior after a release is harder.
Hope you're feeling better @Tinche - any chance you could provide your thoughts on #323 sometime in the coming week? If possible I'd like to include a first pass at attrs support in the next release.
Yessir, putting this on my shortlist
@Tinche, do you think you'll have time to review #323 soon (as well as respond to my comment above)? If you don't have time to continue with this issue in the next week I'm going to close this as stale (we can always pick it up again later).
Hello,
I've kinda had to declare open source bankruptcy recently since I'm way behind on everything, hence all the delays.
So let me try answering some of your questions.
I think it's perfectly ok if the hypothetical msgspec.define
isn't 100% compatible with attrs.define
since it's an opt-in thing, users will know what they're getting into. So slot classes only perfectly fine.
We'd need the __attrs_attrs__
attribute, since tooling requires it. I'd say the attribute is half of the point here. __attrs_attrs__
is an instance of a tuple subclass (it's not a NamedTuple, since those have restrictions on their attribute naming that are too severe for this use case), with properties that return attrs.Attribute
objects. attrs.Attribute
changes infrequently but it does change sometimes. We've never had other users of it though. I'm guessing we set up a communication channel so we let you know of upcoming changes (maybe I can contribute PRs when it happens?).
Also note that msgspec.Struct types work by subclassing; any magic define decorator made here would have to inject the Struct base class into the class hierarchy to work properly.
I don't think this is a problem, attrs.define
already does crimes to add __slots__
to classes that don't have them. So you'd recreate the class with msgspec.Struct
in the MRO too. There are some tricky that need to be done with rewriting closure cells in methods so things like bare super()
works that we can help you with (dataclasses have the same problem, but we try harder than them to solve it).
I think your summary hits the nail on the head. But like I mentioned, I think speed (in __init__
, __eq__
, __repr__
is very valuable for some use cases (my use cases). If I can get significantly faster class construction by switching my classes to msgspec.attrs.define
while the rest of my tooling continues just working, I'm going to do it, and I won't be the only one probably. I think you'd just need to provide a good subset, not the entire subset.
I've kinda had to declare open source bankruptcy recently since I'm way behind on everything, hence all the delays.
No worries, I've definitely been there. Thanks for the reply!
I think I have enough info to hack something together given your answers above. No promises on when I might get around to this, but an attrs-compatible shim seems potentially useful, and at worst would be a fun thing to hack on.
Are there any conclusions out of this issue?
Description
Hello! I think this is a cool library, and I think you should consider integrating with the attrs ecosystem.
A small disclaimer first: I'm a major contributor to attrs (behind @hynek), the author of cattrs, and at this point I'm slowly getting comfortable starting to say I'm one of the mypy attrs plugin maintainers.
I haven't had the chance to use msgspec for real yet. You've done some very impressive things efficiency-wise, but I feel there's a lot of overlap between some work here and some work in attrs/cattrs. So now I'm thinking of ways to integrate.
Strategically speaking, I think msgspec is cool but limited in a number of ways. From what I understand, you don't support generics and PyPy (maybe other constraints I'm not aware of, not sure if you support dict classes?), which would limit my personal use of the library. Also I'm attached to the way cattrs does customization of de/serialization.
However, here's the thing: if we integrate the way I'm thinking, there would be no need for you to actually do these things (you could if you wanted to, or you could focus on other aspects). We could integrate msgspec classes (er, structs) into the attrs system and msgspec de/serialization into cattrs.
A simple idea would be having msgspec as an attrs backend that would be opt-in - so if your class isn't generic, and you don't use PyPy, you'd choose to use msgspec, otherwise you'd use the default attrs backend and have an attrs class. The resulting class would contain
__attrs_attrs__
, and so would essentially be an attrs class.What I mean by the attrs ecosystem: for example, Rich supports attrs classes, although I know you support Rich in msgspec. Another example is an OpenAPI generator I'm working on for attrs classes, which would just work for msgspec classes too. A third example is work I'm doing for better typing of
attrs.fields
for use in ORMs/ODMs; this would also just work.The first and most important question, however, is if you are even open to doing this? If yes, we can bounce ideas.