Closed beanaroo closed 2 years ago
The optimization for dataclasses without __slots__
is to serialize __dict__
. This use case could be solved by skipping serialization of attributes where the key has a leading underscore. How does that impact your library?
Interesting. Thank you for the insight.
The above is merely a sample. In my case, I'm actually storing non-protected InitVar fields for usage in external methods too. I'll see if __slots__
are an option as the model is quite static.
Dataclasses have __dataclass_fields__
. Could this perhaps improve the portability of orjson in general?
I think with the distinction that dataclass are objects rather than dict
it makes sense to not serialize attributes that being with a leading underscore--those are "private" in Python. But do you need more than that for "storing non-protected InitVar fields"--are those not with an underscore and you can't modify it?
It does currently read __dataclass_fields__
to duck type a dataclass. But iterating through that requires extra libpython function calls per attribute and so is much slower than using __dict__
.
Ah, that makes sense. The performance hit is unfortunate.
From the comparisons in the documentation, I anticipated orjson to be compatible with asdict() and opened this issue as I thought it might be a bug.
But if the resulting output is different by design, please feel free to close. :slightly_smiling_face:
My specific case is due to the target schema.
Where the InitVar is used as a key for the relevant object instead of being an attribute as well as a part of a computed property reference. i.e.:
@dataclass
class SecurityGroup:
resource_name: InitVar[str]
to_port: int
from_port: int
web_sg = SecurityGroup('webserver_ssl', from_port=443, to_port=443)
web_srv = Instance('webserver', security_group=web_sg.id)
{
"resources": {
"security_group": {
"webserver_ssl": {
"from_port": 443,
"to_port": 443
},
"database_pgsql": {
"from_port": 5432,
"to_port": 5432
}
},
"instance": {
"webserver": {
"name": "Backend",
"security_group": "security_group.webserver_ssl.id"
},
"database": {
"name": "PostgreSQL",
"security_group": "security_group.database_pgsql.id"
}
}
}
}
For now, I am calling asdict() and passing the result to orjson, this is obviously not as performant as using orjson directly, but still has significant benefit.
I'm commenting here because I'm facing the same issue as @beanaroo
I've indeed a (young) library of json (de)serialization with dataclasses, and I use a hidden attribute to track fields set in __init__
(in order to not serialize their default value) ; this attribute being a set
, it makes orjson raise an error.
I think with the distinction that dataclass are objects rather than dict it makes sense to not serialize attributes that being with a leading underscore--those are "private" in Python.
@ijl I agree with your sentence and I'm all in favor of this; if you want, I can do a PR.
@wyfo ok, it does that in 3.0.2.
I just got hit by this "fix". I was very confused when one of my keys got swallowed with no further comment.
orjson.dumps() does not serialize dataclasses.dataclass attributes that begin with a leading underscore, e.g., _attr. This is because of the Python idiom that a leading underscores marks an attribute as "private."
The thing is, I have underscored properties in my dataclasses which I want to get serialized. I'll describe my use case below but here are some general arguments first:
dataclasses.asdict()
representation. This is not explicitly stated by the README but the comparison for benchmarking purpose kind of implies it.orjson.dumps(dict(_key=1)) == b'{"_key":1}'
).My use case is message passing between different processes (including different languages). I'm using a _type
property to identify the message type so it can be deserialized to the correct type on the other end. Here's the relevant part of my base class:
@dataclass(frozen=True)
class Message(ABC):
_type: str = field(init=False, repr=False, compare=False)
types = dict()
def __post_init__(self):
super().__setattr__('_type', self.__class__.__name__)
def __init_subclass__(cls):
Message.types[cls.__name__] = cls
@staticmethod
def from_dict(data):
return Message.types[data.pop("_type")](**data)
Note, that Message.from_dict
does the inverse of dataclasses.asdict
assuming the _type
.
In short, please serialize underscore keys again or at least make it configurable (without having to overwrite the default serializer).
Whatever your decision, thanks for this tool and the work you're doing!
edit: If you'd prefer a separate issue for this just let me know. ;)
Thank you for the amazing project. I'm working on a dataclasses based library that processes very large and nested json output and I have seen amazing results.
While working on porting, I've noticed an inconsistency, in comparison to using
asdict
, where non-field attributes are being serialized.Is this expected behaviour? It makes using dataclass fields for schema validation quite a bit harder.