tafia / quick-xml

Rust high performance xml reader and writer
MIT License
1.2k stars 236 forks

`memchr` vs `stringzilla` performance comparison #718

Open RoloEdits opened 8 months ago

RoloEdits commented 8 months ago

Came across a benchmarking comparison between the two.

Notably the results:

| | ASCII ⏩ | ASCII ⏪ | UTF8 ⏩ | UTF8 ⏪ |
|---|---|---|---|---|
| Intel: | | | | |
| memchr | 5.89 GB/s | 1.08 GB/s | 8.73 GB/s | 3.35 GB/s |
| stringzilla | 8.37 GB/s | 8.21 GB/s | 11.21 GB/s | 11.20 GB/s |
| Arm: | | | | |
| memchr | 6.38 GB/s | 1.12 GB/s | 13.20 GB/s | 3.56 GB/s |
| stringzilla | 6.56 GB/s | 5.56 GB/s | 9.41 GB/s | 8.17 GB/s |
| Average | 1.2x faster | 6.2x faster | - | 2.8x faster |
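
The Average row can be roughly reproduced from the numbers above. The sketch below assumes it is the arithmetic mean of the per-platform speedups (stringzilla throughput divided by memchr throughput). That is a guess at the method, not something the benchmark states, and the constant and function names are made up for illustration.

```rust
// Throughput in GB/s, copied from the table above.
// Column order: ASCII forward, ASCII reverse, UTF-8 forward, UTF-8 reverse.
const MEMCHR_INTEL: [f64; 4] = [5.89, 1.08, 8.73, 3.35];
const STRINGZILLA_INTEL: [f64; 4] = [8.37, 8.21, 11.21, 11.20];
const MEMCHR_ARM: [f64; 4] = [6.38, 1.12, 13.20, 3.56];
const STRINGZILLA_ARM: [f64; 4] = [6.56, 5.56, 9.41, 8.17];

/// Mean of the Intel and Arm speedups (stringzilla / memchr) per column.
/// Assumption: the "Average" row is an arithmetic mean of these ratios.
pub fn average_speedups() -> [f64; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        let intel = STRINGZILLA_INTEL[i] / MEMCHR_INTEL[i];
        let arm = STRINGZILLA_ARM[i] / MEMCHR_ARM[i];
        out[i] = (intel + arm) / 2.0;
    }
    out
}
```

Under that assumption the UTF8 ⏩ column comes out very close to 1.0, which would explain the `-` in the table: stringzilla is ahead on Intel and behind on Arm, and the two roughly cancel.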

It's noted that the Rust crate doesn't cover the full C++ API yet, but that this is planned eventually. In the interest of performance, I thought I would share the benchmark results so that an informed exploration can be done if desired, in case the potential gains turn out to be a win for one crate or the other.

Mingun commented 8 months ago

Very interesting! If I understand correctly, these are benchmark results for the crates themselves, and you didn't integrate stringzilla into quick-xml, right? I'm always open to performance improvements, so if you'd like, you can create a PR with the replacement so everyone can experiment with the change. Note, however, that these results probably come from searching for small patterns in long strings. XML usually has a different access pattern: many searches for small patterns in small strings. As you can see from quick-xml's own benchmarks, maybe_xml is even faster than quick-xml in most cases, although it does not use any SIMD libraries. quick-xml wins only on very long XML documents (several megabytes), which I think is a rare case.
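
To make the access-pattern point concrete, here is a minimal sketch that alternates between searching for `<` and `>` and records how many bytes each search had to examine. It is not quick-xml's actual parsing loop. `find_byte` and `search_lengths` are hypothetical names, and `find_byte` is a std-only stand-in for a SIMD search such as `memchr::memchr`.

```rust
/// Std-only stand-in for a SIMD byte search such as `memchr::memchr`.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Alternate between looking for '<' and '>' and return the number of
/// bytes examined by each individual search (including the matching byte).
pub fn search_lengths(xml: &[u8]) -> Vec<usize> {
    let mut lengths = Vec::new();
    let mut pos = 0;
    let mut needle = b'<';
    while pos < xml.len() {
        match find_byte(needle, &xml[pos..]) {
            Some(offset) => {
                lengths.push(offset + 1);
                pos += offset + 1;
                // Switch to the other delimiter for the next search.
                needle = if needle == b'<' { b'>' } else { b'<' };
            }
            // No further delimiter: the rest of the input is trailing text.
            None => break,
        }
    }
    lengths
}
```

For the input `<a><b>text</b></a>` this performs eight searches over 18 bytes in total, none longer than five bytes. With haystacks that short, per-call overhead can matter more than peak throughput, which is the regime the GB/s figures do not measure.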

dralley commented 8 months ago

BurntSushi provided a response on Reddit. It seems the benchmarks are a bit misleading: there are some circumstances in which StringZilla is faster, but on average it seems to be slower.

https://www.reddit.com/r/rust/comments/1ayngf6/memchr_vs_stringzilla_benchmarks_up_to_7x/