simd-lite / simd-json

Rust port of simdjson
Apache License 2.0

Performance between 'Values API' and 'Serde Compatible API'. #370

Open ZhaiMo15 opened 4 months ago

ZhaiMo15 commented 4 months ago

I noticed that simd-json offers two main entry points: the 'Values API' and the 'Serde Compatible API'. I ran benches/parse.rs to compare their performance, and added the code below to benchmark simd_json::serde::from_slice:

fn simd_from_slice(data: &mut [u8]) {
    let _: serde_json::Value = simd_json::serde::from_slice(data).unwrap();
}

group.bench_with_input("simd_json::serde::from_slice", &vec, |b, data| {
    b.iter_batched(
        || data.clone(),
        |mut bytes| simd_from_slice(&mut bytes),
        BatchSize::SmallInput,
    )
});
Here's the result (throughput in MiB/s):

| Dataset | simd_json::to_borrowed_value | simd_json::to_borrowed_value_with_buffers | simd_json::to_owned_value | simd_json::serde::from_slice | serde_json::from_slice |
| --- | --- | --- | --- | --- | --- |
| apache_builds | 378.61 | 323.86 | 170.99 | 164.81 | 140.08 |
| event_stacktrace_10kb | 983.51 | 1070.9 | 762.83 | 709.25 | 501.10 |
| github_events | 455.06 | 442.80 | 226.45 | 178.68 | 141.91 |
| canada | 142.38 | 164.17 | 146.48 | 114.10 | 152.69 |
| citm_catalog | 312.69 | 340.59 | 234.10 | 237.55 | 225.40 |
| log | 403.55 | 467.40 | 196.92 | 155.88 | 119.01 |
| twitter | 367.24 | 372.41 | 207.64 | 156.72 | 120.17 |

The 'Values API' is mostly faster than serde_json, except on canada (simd_json::to_owned_value at 146.48 vs serde_json::from_slice at 152.69). The 'Serde Compatible API', however, does not perform as well, and its result on canada is not acceptable (114.10 vs 152.69). I'd like to use simd-json to improve performance, so is it better to use the 'Values API'? And if my data is similar to canada, is it better not to use simd-json at all?
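
To make the comparison in the question easier to read, the throughput figures can be normalized against serde_json::from_slice. The following is a minimal sketch in plain Rust that does only this arithmetic on the numbers posted above; the `Row` struct and the `rows` and `relative` helpers are hypothetical names introduced for illustration, and the to_borrowed_value_with_buffers column is left out for brevity.

```rust
/// One row of the posted throughput table, in MiB/s.
/// (Hypothetical helper type; the with_buffers column is omitted.)
struct Row {
    dataset: &'static str,
    to_borrowed_value: f64,
    to_owned_value: f64,
    simd_serde_from_slice: f64,
    serde_json_from_slice: f64,
}

/// Throughput of `candidate` relative to `baseline` (1.0 means equal speed).
fn relative(candidate: f64, baseline: f64) -> f64 {
    candidate / baseline
}

/// The figures copied from the benchmark results in this issue.
fn rows() -> Vec<Row> {
    let raw = [
        ("apache_builds", 378.61, 170.99, 164.81, 140.08),
        ("event_stacktrace_10kb", 983.51, 762.83, 709.25, 501.10),
        ("github_events", 455.06, 226.45, 178.68, 141.91),
        ("canada", 142.38, 146.48, 114.10, 152.69),
        ("citm_catalog", 312.69, 234.10, 237.55, 225.40),
        ("log", 403.55, 196.92, 155.88, 119.01),
        ("twitter", 367.24, 207.64, 156.72, 120.17),
    ];
    raw.iter()
        .map(|&(dataset, borrowed, owned, simd_serde, serde_json)| Row {
            dataset,
            to_borrowed_value: borrowed,
            to_owned_value: owned,
            simd_serde_from_slice: simd_serde,
            serde_json_from_slice: serde_json,
        })
        .collect()
}

fn main() {
    // Print each simd_json entry point as a multiple of serde_json's throughput.
    for r in rows() {
        println!(
            "{:<22} borrowed {:.2}x  owned {:.2}x  simd serde {:.2}x",
            r.dataset,
            relative(r.to_borrowed_value, r.serde_json_from_slice),
            relative(r.to_owned_value, r.serde_json_from_slice),
            relative(r.simd_serde_from_slice, r.serde_json_from_slice),
        );
    }
}
```

On canada the three ratios come out to roughly 0.93x, 0.96x, and 0.75x, so all three of these entry points trail serde_json on that dataset, while on the other six datasets every ratio is above 1.0.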

Licenser commented 4 months ago

Hi,

Sorry for the late reply. I'm just finishing a vacation and was mostly away from the computer.

That makes a good bit of sense: the serde mapping code is quite complex and therefore expensive to execute, irrespective of the decoder (serde-json, simd-json, serde-yaml, etc.).

By contrast, the value API is kept as simple as possible, so its overhead is lower. In some cases the extra cost of allowing arbitrary values is less than the extra cost of serde.

There is a third API, simd-json-derive, which lets you deserialize directly into structs without going through the value API. That will be faster still, but it is less flexible.

Last but not least, I want to point out that this isn't serde being bad or poorly written. serde is made to allow nearly arbitrary format translations and is extremely powerful that way. It does this extremely well, but such power usually comes at a cost. The simd-json-derive macros are extremely specific, so they can take a lot of shortcuts that serde simply can't; that's where the performance benefit comes from.

PSeitz commented 1 month ago

I implemented the ValueBuilder trait for serde_json_borrow::Value. In this bench it is slightly slower than the simd_json serde variant on most datasets. I think it should be slightly faster than BorrowedValue, as the data structures are simpler. I didn't profile, but it could be related to missing inlines, as this is cross-crate.

There's also a dependency on halfbrown::HashMap in the simd_json::value::deserialize API. It would be good to re-export it or switch to something more generic, like an Iterator.
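
As a rough illustration of the "something more generic" idea, here is a minimal sketch of a builder hook that accepts any iterator of key/value pairs instead of one concrete map type. Everything in it is hypothetical: `FromObjectMembers` and `SketchValue` are names invented for this example and are not part of simd_json, halfbrown, or serde_json_borrow, and the Vec-backed object is just one possible representation.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait (NOT simd_json's actual `ValueBuilder`): a value type
/// that can build a JSON object from any iterator of (key, value) members.
trait FromObjectMembers<'a>: Sized {
    fn from_members<I>(members: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, Self)>;
}

/// Hypothetical borrowed value type that stores object members in a Vec.
#[derive(Debug, PartialEq)]
enum SketchValue<'a> {
    Num(f64),
    Str(&'a str),
    Object(Vec<(&'a str, SketchValue<'a>)>),
}

impl<'a> FromObjectMembers<'a> for SketchValue<'a> {
    fn from_members<I>(members: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, Self)>,
    {
        // The implementor picks its own container; the caller's container
        // type never appears in the trait signature.
        SketchValue::Object(members.into_iter().collect())
    }
}

fn main() {
    // A caller that happens to hold its members in a map...
    let mut parsed = BTreeMap::new();
    parsed.insert("id", SketchValue::Num(1.0));
    parsed.insert("name", SketchValue::Str("simd"));
    let from_map = SketchValue::from_members(parsed);

    // ...and a caller that holds them in a Vec use the same entry point.
    let from_vec = SketchValue::from_members(vec![
        ("id", SketchValue::Num(1.0)),
        ("name", SketchValue::Str("simd")),
    ]);

    assert_eq!(from_map, from_vec);
    println!("{:?}", from_map);
}
```

With a signature like this, a downstream crate would not need to name the parser's internal map type at all, at the cost of giving up any map-specific operations inside the builder.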

The branch is here: https://github.com/PSeitz/serde_json_borrow/tree/simd_json_value_builder

| Library | Dataset | Avg Speed |
| --- | --- | --- |
| serde_json | flat_json | 137.96 MiB/s |
| serde_json_borrow | flat_json | 208.32 MiB/s |
| simd_serde_json_borrow | flat_json | 122.62 MiB/s |
| simd_serde_json_borrow_value_builder | flat_json | 118.30 MiB/s |
| simd_json_BorrowedValue | flat_json | 134.90 MiB/s |
| serde_json | hdfs | 287.32 MiB/s |
| serde_json_borrow | hdfs | 389.00 MiB/s |
| simd_serde_json_borrow | hdfs | 263.32 MiB/s |
| simd_serde_json_borrow_value_builder | hdfs | 253.54 MiB/s |
| simd_json_BorrowedValue | hdfs | 288.18 MiB/s |
| serde_json | hdfs_with_array | 200.19 MiB/s |
| serde_json_borrow | hdfs_with_array | 283.97 MiB/s |
| simd_serde_json_borrow | hdfs_with_array | 163.14 MiB/s |
| simd_serde_json_borrow_value_builder | hdfs_with_array | 192.23 MiB/s |
| simd_json_BorrowedValue | hdfs_with_array | 209.96 MiB/s |
| serde_json | wiki | 446.33 MiB/s |
| serde_json_borrow | wiki | 488.47 MiB/s |
| simd_serde_json_borrow | wiki | 555.05 MiB/s |
| simd_serde_json_borrow_value_builder | wiki | 544.57 MiB/s |
| simd_json_BorrowedValue | wiki | 582.90 MiB/s |
| serde_json | gh-archive | 175.92 MiB/s |
| serde_json_borrow | gh-archive | 362.67 MiB/s |
| simd_serde_json_borrow | gh-archive | 343.65 MiB/s |
| simd_serde_json_borrow_value_builder | gh-archive | 328.93 MiB/s |
| simd_json_BorrowedValue | gh-archive | 397.70 MiB/s |

Licenser commented 1 month ago

This is really awesome, @PSeitz!