Open ZhaiMo15 opened 4 months ago
Hi,
Sorry for the late reply; I'm just finishing a vacation and was mostly away from the computer.
That makes a good bit of sense: the serde mapping code is quite complex and therefore expensive to execute, irrespective of the decoder (serde-json, simd-json, serde-yaml, etc.).
By contrast, the value API is kept as simple as possible, so its overhead is lower. In some cases the extra cost of allowing arbitrary values is less than the extra cost of serde.
There is a third API, the simd-json-derive API, which deserializes directly into structs without going through the value API. That will be faster still, but it is less flexible.
Last but not least, I want to point out that this isn't serde being bad or poorly written. serde is made to allow nearly arbitrary format translations and is extremely powerful that way. It does this extremely well, but such power usually comes at a cost. The simd-json-derive macros are extremely specific, so they can take a lot of shortcuts that serde simply can't, and that's where the performance benefit comes from.
I implemented the `ValueBuilder` trait for `serde_json_borrow::Value`.
In this benchmark it is slightly slower than the `simd_json` serde variant on most datasets.
I think it should be slightly faster than `BorrowedValue`, as the data structures are simpler. I didn't profile, but it could be related to missing inlines, since this is cross-crate.
There's also a dependency on `halfbrown::Hashmap` in the `simd_json::value::deserialize` API. It would be good to re-export it or switch to something more generic, like an `Iterator`.
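
To illustrate the iterator idea, here is a minimal sketch using only the standard library. It is not simd-json's actual `ValueBuilder` trait: `FromEntries`, `VecValue`, `MapValue`, and `decode_flat` are hypothetical names made up for this example. The point is only that a decoder which hands over object entries as an iterator does not need to name any concrete map type.

```rust
use std::collections::BTreeMap;

/// Hypothetical builder trait (NOT simd-json's real `ValueBuilder`):
/// the decoder passes object entries as an iterator, and each target
/// type chooses its own container.
trait FromEntries<'a>: Sized + From<f64> {
    fn object_from_iter<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, Self)>;
}

/// Target 1: keeps entries in a plain Vec, preserving input order.
#[derive(Debug, PartialEq)]
enum VecValue<'a> {
    Number(f64),
    Object(Vec<(&'a str, VecValue<'a>)>),
}

impl<'a> From<f64> for VecValue<'a> {
    fn from(n: f64) -> Self {
        VecValue::Number(n)
    }
}

impl<'a> FromEntries<'a> for VecValue<'a> {
    fn object_from_iter<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, Self)>,
    {
        VecValue::Object(entries.into_iter().collect())
    }
}

/// Target 2: stores entries in a sorted map instead.
#[derive(Debug, PartialEq)]
enum MapValue<'a> {
    Number(f64),
    Object(BTreeMap<&'a str, MapValue<'a>>),
}

impl<'a> From<f64> for MapValue<'a> {
    fn from(n: f64) -> Self {
        MapValue::Number(n)
    }
}

impl<'a> FromEntries<'a> for MapValue<'a> {
    fn object_from_iter<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, Self)>,
    {
        MapValue::Object(entries.into_iter().collect())
    }
}

/// Stand-in for a decoder: it only knows the trait, not the container.
fn decode_flat<'a, V: FromEntries<'a>>(pairs: &[(&'a str, f64)]) -> V {
    V::object_from_iter(pairs.iter().map(|(key, n)| (*key, V::from(*n))))
}

fn main() {
    let input = [("b", 2.0), ("a", 1.0)];
    let as_vec: VecValue<'_> = decode_flat(&input);
    let as_map: MapValue<'_> = decode_flat(&input);
    println!("{as_vec:?}");
    println!("{as_map:?}");
}
```

With this shape, neither target crate needs to depend on the decoder's map type; whether the extra indirection costs anything in practice would have to be measured.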
The branch is here: https://github.com/PSeitz/serde_json_borrow/tree/simd_json_value_builder
Library | Dataset | Avg Speed |
---|---|---|
serde_json | flat_json | 137.96 MiB/s |
serde_json_borrow | flat_json | 208.32 MiB/s |
simd_serde_json_borrow | flat_json | 122.62 MiB/s |
simd_serde_json_borrow_value_builder | flat_json | 118.30 MiB/s |
simd_json_BorrowedValue | flat_json | 134.90 MiB/s |
serde_json | hdfs | 287.32 MiB/s |
serde_json_borrow | hdfs | 389.00 MiB/s |
simd_serde_json_borrow | hdfs | 263.32 MiB/s |
simd_serde_json_borrow_value_builder | hdfs | 253.54 MiB/s |
simd_json_BorrowedValue | hdfs | 288.18 MiB/s |
serde_json | hdfs_with_array | 200.19 MiB/s |
serde_json_borrow | hdfs_with_array | 283.97 MiB/s |
simd_serde_json_borrow | hdfs_with_array | 163.14 MiB/s |
simd_serde_json_borrow_value_builder | hdfs_with_array | 192.23 MiB/s |
simd_json_BorrowedValue | hdfs_with_array | 209.96 MiB/s |
serde_json | wiki | 446.33 MiB/s |
serde_json_borrow | wiki | 488.47 MiB/s |
simd_serde_json_borrow | wiki | 555.05 MiB/s |
simd_serde_json_borrow_value_builder | wiki | 544.57 MiB/s |
simd_json_BorrowedValue | wiki | 582.90 MiB/s |
serde_json | gh-archive | 175.92 MiB/s |
serde_json_borrow | gh-archive | 362.67 MiB/s |
simd_serde_json_borrow | gh-archive | 343.65 MiB/s |
simd_serde_json_borrow_value_builder | gh-archive | 328.93 MiB/s |
simd_json_BorrowedValue | gh-archive | 397.70 MiB/s |
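
To make the table easier to compare across datasets, each figure can be normalised against the `serde_json` baseline for the same dataset. The following is a small sketch in plain Rust (standard library only). The numbers are copied from the table above; the helper names `throughput` and `relative_to_serde_json` are made up for this example.

```rust
/// (library, dataset, average throughput in MiB/s), copied from the table.
const RESULTS: &[(&str, &str, f64)] = &[
    ("serde_json", "flat_json", 137.96),
    ("serde_json_borrow", "flat_json", 208.32),
    ("simd_serde_json_borrow", "flat_json", 122.62),
    ("simd_serde_json_borrow_value_builder", "flat_json", 118.30),
    ("simd_json_BorrowedValue", "flat_json", 134.90),
    ("serde_json", "hdfs", 287.32),
    ("serde_json_borrow", "hdfs", 389.00),
    ("simd_serde_json_borrow", "hdfs", 263.32),
    ("simd_serde_json_borrow_value_builder", "hdfs", 253.54),
    ("simd_json_BorrowedValue", "hdfs", 288.18),
    ("serde_json", "hdfs_with_array", 200.19),
    ("serde_json_borrow", "hdfs_with_array", 283.97),
    ("simd_serde_json_borrow", "hdfs_with_array", 163.14),
    ("simd_serde_json_borrow_value_builder", "hdfs_with_array", 192.23),
    ("simd_json_BorrowedValue", "hdfs_with_array", 209.96),
    ("serde_json", "wiki", 446.33),
    ("serde_json_borrow", "wiki", 488.47),
    ("simd_serde_json_borrow", "wiki", 555.05),
    ("simd_serde_json_borrow_value_builder", "wiki", 544.57),
    ("simd_json_BorrowedValue", "wiki", 582.90),
    ("serde_json", "gh-archive", 175.92),
    ("serde_json_borrow", "gh-archive", 362.67),
    ("simd_serde_json_borrow", "gh-archive", 343.65),
    ("simd_serde_json_borrow_value_builder", "gh-archive", 328.93),
    ("simd_json_BorrowedValue", "gh-archive", 397.70),
];

/// Looks up the throughput of `library` on `dataset`, if it was measured.
fn throughput(library: &str, dataset: &str) -> Option<f64> {
    RESULTS
        .iter()
        .find(|(l, d, _)| *l == library && *d == dataset)
        .map(|(_, _, speed)| *speed)
}

/// Throughput of `library` divided by the serde_json throughput on the
/// same dataset; values above 1.0 mean faster than serde_json.
fn relative_to_serde_json(library: &str, dataset: &str) -> Option<f64> {
    let baseline = throughput("serde_json", dataset)?;
    Some(throughput(library, dataset)? / baseline)
}

fn main() {
    for (library, dataset, speed) in RESULTS {
        let ratio = relative_to_serde_json(library, dataset).unwrap();
        println!("{library:<38} {dataset:<16} {speed:>7.2} MiB/s  {ratio:.2}x");
    }
}
```

Read this way, the simd-json variants land below the `serde_json` baseline on `flat_json` and above it on `wiki` and `gh-archive`, which matches the raw figures.
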
This is really awesome, @PSeitz!
I noticed that simd-json offers two main entry points: the 'Values API' and the 'Serde Compatible API'. I ran benches/parse.rs to test the performance, and added the code below to test simd_json::serde::from_slice:
The 'Values API' is mostly better than serde, except on canada (146.48 vs 152.69). However, the performance of the 'Serde Compatible API' is not as good, and the canada result is not acceptable (114.10 vs 152.69). I'd like to use simd-json to improve performance, so is it better to use the 'Values API'? And if my data is similar to canada, is it better not to use simd-json at all?