jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Is encoding with `msgspec` faster than `json.dumps(dataclass.asdict)`? #589

Closed by remiconnesson 10 months ago

remiconnesson commented 10 months ago

Question

Hello, and sorry if this is a dumb question: json.dumps(dataclass.asdict) does not perform any validation, whereas msgspec seems to.

But I'm still wondering whether msgspec would be faster at encoding even without validation, since you may be using a different mechanism to encode to JSON.

Thank you very much

CamDavidsonPilon commented 10 months ago

Hi @remiconnesson, msgspec doesn't validate on encode (the equivalent of dumps), only on decode (loads): https://jcristharif.com/msgspec/structs.html#type-validation

remiconnesson commented 10 months ago

Thank you very much for the clarification, @CamDavidsonPilon, and for pointing me to this page.

From the initial paragraph:

Structs are the preferred way of defining structured data types in msgspec. They’re written in C and are quite speedy and lightweight (measurably faster to create/compare/encode/decode than similar options like dataclasses, attrs, or pydantic). They’re great for representing structured data both for serialization and for use in an application.

I take that as a yes to "Is encoding with msgspec faster than json.dumps(dataclass.asdict)?" :)

Thank you

CamDavidsonPilon commented 10 months ago

Yes, you can have a look at the benchmarks to get a good idea: https://jcristharif.com/msgspec/benchmarks.html#benchmark-encoding-decoding

You can also run some perf tests locally to see for yourself!

jcrist commented 10 months ago

As mentioned above, encoding dataclasses with msgspec is much faster than doing json.dumps(dataclasses.asdict(obj)). When you have questions like this, I recommend running benchmarks locally to measure things yourself. Here's a quick benchmark to get you started:

In [1]: import msgspec, dataclasses, json

In [2]: @dataclasses.dataclass
   ...: class Point:
   ...:     x: int
   ...:     y: int
   ...:     z: int
   ...: 

In [3]: p = Point(1, 2, 3)

In [4]: %timeit msgspec.json.encode(p)
144 ns ± 0.321 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: %timeit json.dumps(dataclasses.asdict(p))
4.58 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: print(f"msgspec encoded dataclasses to JSON {4.58 / .144:.1f}x faster")
msgspec encoded dataclasses to JSON 31.8x faster