Open jcrist opened 1 year ago
Yes please! I would like to cache some (expensive) computed values in my objects, but not serialize them.
Thanks for the motivating use case (and apologies for the delayed response)! I hope to tackle this issue in the next few weeks before issuing a new release.
Since you commented here - as a user, would you want omitted attributes to also be excluded from __init__
? Giving a concrete example:
class Point(Struct):
x: int
y: int
_private_1: Omit[int]
_private_2: Omit[int] = 0
Should __init__
include _private_1
and _private_2
arguments? Or should they be excluded from __init__
, and only default values filled in? If they're excluded, that would mean that _private_1
would raise an AttributeError
until it's explicitly set, and _private_2
would have an initial value of 0
after __init__
.
Honestly I haven't been using the constructors much (decoding from a web request, encoding to a disk cache, decoding from the disk cache) so I don't have a strong opinion. I guess given that the object is valid without those values, the constructor shouldn't require them, but it would be nice if you could include them.
Also I would lean towards excluding attributes with leading underscores by default (without the Omit type), but I could see it being more annoying to have to include them if some API you're using has names like that.
Ok, we finally have the internals setup to make this fairly straightforward to implement.
First, I plan to call these "private" attributes, since that's effectively what they are. A private attribute won't be encoded or decoded, but will have space allocated on its respective struct.
There are some open questions though.
There are two places that field configuration like this can go:
In the type annotation
class Example(Struct):
x: int
y: Private[int]
z: Private[float] = 0.0
An option in msgspec.field
class Example(Struct):
x: int
y: int = msgspec.field(private=True)
z: int = msgspec.field(default=0.0, private=True)
The former I find a bit easier to read, but it adds some complications:
ClassVar
. This is doable, but to remain efficient can't handle all cases in the case of postponed annotations.Private
type annotation by its nature must be applied to a field - a type list[Private[int]]
is invalid, but python would happily let the user spell such a type. Using a default value with field
ties this annotation to struct fields alone, avoiding this issue.dataclass_transform
, treating Private
fields differently from public ones.Right now I'm leaning towards the msgspec.field
approach. Many users will already be familiar with dataclasses.field
, which includes support for omitting fields from the generated __init__
via dataclasses.field(init=False)
. This is similar enough to private=True
that it shouldn't be that wild of a concept.
Since we're providing a way to set whether a field is public/private explicitly, users can always override any default behavior chosen here. Since fields with leading underscores are usually treated as private in python by convention, I'm tempted to do so here as well. So the following would be equivalent:
class Example1(Struct):
_field: int = 0 # private, since the leading underscore
class Example2(Struct):
_field: int = field(default=0, private=True) # explicitly private
On the other hand, I'd expect most users to never need to define private fields, so maybe inferring private/public status from the field name itself is overreaching.
__init__
?Right now I'm planning on excluding private
attributes from basically everything in msgspec
. They'll have space allocated in the __slots__
of the generated Struct
type (and corresponding attributes defined), but will be excluded from:
__init__
__copy__
__reduce__
(pickling)__repr__
__eq__
and other comparison methods__hash__
__match_args__
encode
decode
This makes them most useful for including extra state during runtime that is initialized after the Struct instance is already created. cached_property
usage is related here (#204), since you'd need a slot
allocated to store the cached value.
If you have a use case for including private
attributes in __init__
, please let comment below. If needed, worst case we can add another init
parameter for including/excluding a field from __init__
. private=True
will imply init=False
by default, but you can manually re-include it as field(private=True, init=True)
.
Right now all msgspec.Struct
types require all fields be initialized as part of the generated __init__
. The only way to get an AttributeError
for a field on a struct is if you manually del
the attribute off an existing struct
. I think this makes sense. mypy
/pyright
type annotations are designed to handle types (or unions of types), but not the presence/absence of an attribute.
Keeping with this design (and assuming private fields are excluded from __init__
), it may make sense to enforce that all private fields have a default value (since there's no other way a value could be set).
class Example(Struct):
private1: int | None = field(private=True, default=None) # this is ok, there's a default value
private2: int | None = field(private=True) # this is invalid, there's no default value
In summary, my current leanings on these questions are:
msgspec.field
for configuring if a field is private.__init__
. We can add a way to override this later if needed.I don't plan on adding this before the next release (hopefully out next week), but after that whatever choices made here should (hopefully) be fairly straightforward to implement :crossed_fingers: .
This would be of great use to my current project --- like IcyMidnight, we've got a cache that would be really pointless to encode. The current plan is to clear the cache before encoding, which will work as long as we remember to do it.
- Use msgspec.field for configuring if a field is private.
- Make fields with a leading underscore private by default.
What happens if you do _something = field(private=False)
? I think this should be allowed: a private field (in the Python convention sense) that's not private (in the msgspec
sense).
Additionally, automatically treating fields with a leading underscore as private would break backwards compatibility. I found a few examples of Structs with _private
fields which makes me think that making private fields an explicit, opt-in choice is a better idea.
Really looking forward to this feature, I hope you have time to work on it soon.
Adding a private=True
parameter to field()
seems fine to me.
Private omitted fields are the only way to have self-referential Structs.
e.g. a nested and occasionally recursive data structure with access to the _root
node.
We can also use it to compute and store the path in that data structure because Structs aren't aware of their parent nodes.
Hi. Any update on this? I'm confused whether the library has this attribute.
Right now I'm leaning towards the msgspec.field approach. Many users will already be familiar with dataclasses.field, which includes support for omitting fields from the generated init via dataclasses.field(init=False). This is similar enough to private=True that it shouldn't be that wild of a concept.
If this is still up for debate, I concur that this is the right approach.
If you have a use case for including private attributes in init, please let comment below. If needed, worst case we can add another init parameter for including/excluding a field from init. private=True will imply init=False by default, but you can manually re-include it as field(private=True, init=True).
class Primitive(Struct):
iid: int = field(private=True) # internal ID, think auto-incrementing sequence meant to be private
eid: str = field(private=False) # external ID, some sort of public-exposed string identifier
In this use case, the object when fetched from the DB would have both IDs so the application layer has them available. But, when serialized as an HTTP response, we strip out the private iid
.
Use msgspec.field for configuring if a field is private.
Agreed.
Make fields with a leading underscore private by default.
This is a nice API :sunglasses:
Exclude private fields from init. We can add a way to override this later if needed.
See my comment above. I am not sold on this one. The two paramater approach makes more sense to me as they are different use cases. I am fine if private=True
implies init=False
. But, I want to have the flexibility for private=True, init=True
.
Require all private fields set a default value.
I guess I am okay with this, as long as I can still require it as an __init__
. But once that is allowed, it seems to me this requirement falls off.
Would private fields also be excluded from __struct_fields__
?
I came here looking for this functionality so I can store some extra metadata on a Struct that I don't want to be serialized or part of the __init__
.
I'm envisaging an api where I can have an alternative constructer which can take optional (metadata) kwargs and store them on the struct without having them be included in the serialized representation - e.g.
class MyStruct(msgspec.Struct, tag=True):
metadata: dict = field(default={}, private=True)
@classmethod
def create(cls, **kwargs) -> MyStruct:
instance = MyStruct()
instance.metadata.update(**kwargs)
return instance
>>> obj = MyStruct.create(key='value')
>>> obj
MyStruct(metadata={'key': 'value'})
>>> msgspec.json.encode(obj)
b'{"type":"MyStruct"}'
>>> msgspec.json.decode(msgspec.json.encode(obj), type=MyStruct)
MyStruct(metadata={})
My actual use-case is a bit more complex but it boils down to wanting metadata associated with my objects but only for their lifetime in the current interpreter.
Another example, of something I'd like to do is to attach an fsspec.AbstractFileSystem
to the class which would then be used in a save
method to serialize itself and write to disk - e.g.
class MyStruct(msgspec.Struct, tag=True):
metadata: dict = field(default={})
fs: fsspec.AbstractFileSystem = field(private=True)
@classmethod
def create(cls, fs: fsspec.AbstractFileSystem, **kwargs) -> MyStruct:
instance = MyStruct()
instance.metadata.update(**kwargs)
instance.fs = fs
return instance
def save(self, filepath: str) -> None:
self.fs.write_bytes(
path=filepath,
value=msgspec.msgpack.encode(self),
)
msgspec
doesn't know how to serialize an AbstractFileSystem
, and I wouldn't want it to be serialized along with my actual data, however it would be convenient to be able to store it on the instances.
Any updates on this?
Since
Struct
types force users to declare allStruct
attributes, there's currently no way to hold extra state on aStruct
without having it also be encoded/decoded.Two ways to handle this:
omit=True
toMeta
annotations. This could also be packaged up as a parametrized type, so users could writejson
doesn't encode/decode private struct attributes by default.