Closed HansBrende closed 5 months ago
Thanks for opening this! We were originally being a bit stricter than necessary here. The real limitation is that types with gc=False must be __slots__ classes, so any mixin type (like Generic) must also define __slots__ = (). With #635 you should be able to set gc=False on generic structs as well.
In [1]: from typing import Generic, TypeVar
In [2]: from msgspec import Struct
In [3]: P = TypeVar("P")
In [4]: class Demo(Struct, Generic[P], gc=False):
   ...:     x: P
   ...:     y: P
   ...:
In [5]: d = Demo(1, 1)
In [6]: import gc
In [7]: gc.is_tracked(d)
Out[7]: False
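The __slots__ requirement on mixins can be illustrated in plain Python, without msgspec. This is only a sketch of the general language behaviour, not of msgspec's internal check: PlainMixin, SlottedMixin and the With* classes are hypothetical names invented for the example. A base class without __slots__ gives every subclass instance a __dict__, while a base that declares __slots__ = () (as Generic effectively does) keeps the subclass fully slotted.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class PlainMixin:
    """Hypothetical mixin with no __slots__; instances get a __dict__."""


class SlottedMixin:
    """Hypothetical mixin that declares empty slots and adds no fields."""

    __slots__ = ()


class WithPlain(PlainMixin):
    __slots__ = ("x",)


class WithSlotted(SlottedMixin):
    __slots__ = ("x",)


class WithGeneric(Generic[T]):
    __slots__ = ("x",)


def has_instance_dict(cls) -> bool:
    """Return True if instances of cls carry a per-instance __dict__."""
    return hasattr(cls(), "__dict__")


for cls in (WithPlain, WithSlotted, WithGeneric):
    print(cls.__name__, has_instance_dict(cls))
```

Only WithPlain ends up with a per-instance __dict__; the other two stay pure __slots__ classes, which is the property the gc=False restriction is described as needing.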
Standard note - messing with the gc kwarg is considered "advanced usage", I trust you've read all the warnings in the docs before using it :).
@jcrist thanks for the fix!
I have read the documentation on that, however, I'm confused on one point:
Why would any struct that participates in deserialization not be a good candidate for gc=False? As we know, when deserializing JSON to a normal dict, it is impossible for that JSON to be self-referencing. I.e., you can't have a thing inside itself simply because that is impossible to represent as JSON! So any reference cycles for any of these objects participating in deserialization would by nature have to be created manually in __post_init__ or in subsequent stages. So as long as I am not "adding a thing to itself" post-init, and these structs originate from JSON, I should be totally safe with gc=False.
Or am I missing something?
No, that's accurate. Custom types supported by dec_hook could result in cyclic behavior, but in general it's unlikely for the result of a decode call to have any cycles. But code constructing these objects outside of decode could still result in a cycle. The warnings are mostly to let users know "here be dragons" and to deter them from mucking with the gc unless a benchmark shows it matters. That you can properly reason about cyclic object structures and Python's GC implementation means you are probably capable of judging whether disabling it on these has consequences for your code :).
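The two points above (JSON cannot express a cycle, but later code can create one) can be sketched with the standard library alone. Node is a hypothetical stand-in for a decoded struct, and json.loads stands in for a decode call; nothing here uses msgspec itself. The sketch pauses the collector so that only reference counting is at work, then shows that a self-referencing object survives until gc.collect() runs.

```python
import gc
import json
import weakref


class Node:
    """Hypothetical container type, standing in for a decoded struct."""

    def __init__(self, payload):
        self.payload = payload
        self.parent = None


def json_rejects_cycles() -> bool:
    """A self-referencing list cannot be serialized to JSON."""
    items = []
    items.append(items)
    try:
        json.dumps(items)
    except ValueError:
        return True
    return False


def cycle_survives_refcounting() -> tuple:
    """Return (alive before gc.collect(), alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting only
    try:
        node = Node(json.loads('{"id": 1}'))
        node.parent = node  # cycle created after decoding
        ref = weakref.ref(node)
        del node
        alive_before = ref() is not None
        gc.collect()  # the cycle collector frees the tracked object
        alive_after = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


print(json_rejects_cycles())
print(cycle_survives_refcounting())
```

An object that is never tracked by the collector would stay in the "alive" state for good once such a cycle exists, which is the leak the warnings are about.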
@jcrist awesome! One thing I did notice during my benchmarks is that gc=False is somewhat undermined by the presence of UUID fields... for some reason python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives. I tried to find something on how to "untrack" UUIDs... but was unsuccessful... so I ended up just disabling garbage collection altogether until all my objects are destroyed anyways by refcounting.
python thinks it should track UUIDs even though they are immutable and only contain a couple underlying primitives
In CPython any type implemented in pure Python is a GC type. Since uuid.UUID objects aren't extension types (i.e. they're implemented in Python), they're automatically GC types. If uuid.UUID were implemented as an extension type then you're correct, it wouldn't need to be a GC type.
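For anyone who wants to check this on their own interpreter, a small sketch using gc.is_tracked from the standard library is below. The UUID value is arbitrary. No expected output is given for the UUID entry, since that depends on the CPython build in use; the thread reports that UUID instances are tracked, whereas str and int are atomic types that the collector never tracks.

```python
import gc
import uuid

# Arbitrary example value; any UUID would do.
value = uuid.UUID("12345678-1234-5678-1234-567812345678")

tracked = {
    "uuid.UUID": gc.is_tracked(value),   # pure-Python class instance
    "str": gc.is_tracked(str(value)),    # atomic built-in type
    "int": gc.is_tracked(value.int),     # atomic built-in type
}

for name, is_tracked in tracked.items():
    print(f"{name}: tracked={is_tracked}")
```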
One option if you'd rather disable GC on the type instead of globally: if you don't ever manipulate the UUIDs as uuids, you might try annotating those fields as str instead (possibly with a pattern regex for matching uuids if you're concerned about invalid uuids getting in). Strings are immutable non-GC types. That said, for large payloads the overhead of turning on/off the gc per decode call should be minimal.
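Turning the collector off around a single decode call could look roughly like the sketch below. gc_paused and decode_without_gc are hypothetical helper names, and json.loads is used only as a stand-in for whatever decode call is actually in play. The previous collector state is restored afterwards, so code that had already disabled the GC is left alone.

```python
import gc
import json
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic collector for the duration of the block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on when we started.
        if was_enabled:
            gc.enable()


def decode_without_gc(raw: bytes):
    """Decode a payload with the collector paused (json.loads as stand-in)."""
    with gc_paused():
        return json.loads(raw)


result = decode_without_gc(b'{"ids": ["a", "b"]}')
print(result)
```

Reference counting still frees acyclic objects while the collector is paused, so this only defers the collection of cycles.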
Question
If I do
I get the following error:
I suppose I can work around this by dynamically redefining the struct for each possible P type, to avoid using Generic, but is this expected? It would be easier if Generic were excluded from the above restriction.