Closed Validark closed 1 year ago
Just to add some context: this is a fix for a mistranslation of the C++ code. The original is here and here.
From the discussion on Discord:
The old is_ascii only returned true for chunks of all 0's. The C++ code it was translated from does:
_mm_movemask_epi8(*this) == 0;
_mm_movemask_epi8 does this: "Create mask from the most significant bit of each 8-bit element in a, and store the result in dst." So the mask is 0 exactly when no byte has its high bit set, i.e. when every byte is ASCII.
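To make the difference concrete, here is a minimal portable sketch of the two behaviors. It is a scalar model, not the SIMD code from either project: `Chunk`, `movemask_model`, `is_ascii_fixed`, `is_ascii_buggy`, and `make_chunk` are hypothetical names made up for illustration, and the "buggy" variant is only assumed to match the old behavior as described above (true for all-zero chunks only).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 16-byte chunk, standing in for one SSE register (illustrative type).
using Chunk = std::array<std::uint8_t, 16>;

// Scalar model of _mm_movemask_epi8: bit i of the result is the most
// significant bit of byte i.
inline std::uint32_t movemask_model(const Chunk& chunk) {
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        mask |= static_cast<std::uint32_t>(chunk[i] >> 7) << i;
    }
    return mask;
}

// Intended behavior: ASCII means no byte has its high bit set.
inline bool is_ascii_fixed(const Chunk& chunk) {
    return movemask_model(chunk) == 0;
}

// Assumed model of the old behavior: true only when every byte is zero.
inline bool is_ascii_buggy(const Chunk& chunk) {
    for (std::uint8_t byte : chunk) {
        if (byte != 0) return false;
    }
    return true;
}

// Copies up to 16 bytes of a C string into a zero-padded chunk.
inline Chunk make_chunk(const char* text) {
    Chunk chunk{};
    for (std::size_t i = 0; i < chunk.size() && text[i] != '\0'; ++i) {
        chunk[i] = static_cast<std::uint8_t>(text[i]);
    }
    return chunk;
}
```

With this model, `make_chunk("hello")` is ASCII under `is_ascii_fixed` but not under `is_ascii_buggy`, which is why the old check almost never took the ASCII fast path on real input.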
Thanks for finding and fixing this :+1:. Just going to give it a quick spin on my machine. Planning to merge afterward.
Ran a quick benchmark on test/twitter.json, which contains a lot of Unicode. It looks like a slight speedup. I ran it a few times, and some runs showed a 5% slowdown, but this still seems like a solid win: it improves correctness and either doesn't hurt performance or improves it slightly.
$ sudo ../poop/zig-out/bin/poop -d 10000 "zig-out/bin/simdjzonOld $testfile" "zig-out/bin/simdjzon $testfile"
Benchmark 1 (7455 runs): zig-out/bin/simdjzonOld test/twitter.json
measurement mean ± σ min … max outliers delta
wall_time 1.30ms ± 354us 788us … 2.70ms 13 ( 0%) 0%
peak_rss 5.18MB ± 1.67MB 3.03MB … 7.22MB 0 ( 0%) 0%
cpu_cycles 1.52M ± 71.0K 1.45M … 2.46M 970 (13%) 0%
instructions 4.79M ± 23.0 4.79M … 4.79M 1685 (23%) 0%
cache_references 78.1K ± 2.11K 74.0K … 144K 353 ( 5%) 0%
cache_misses 2.10K ± 124 1.74K … 3.36K 36 ( 0%) 0%
branch_misses 3.47K ± 249 2.98K … 5.26K 448 ( 6%) 0%
Benchmark 2 (7659 runs): zig-out/bin/simdjzon test/twitter.json
measurement mean ± σ min … max outliers delta
wall_time 1.27ms ± 362us 803us … 2.59ms 3 ( 0%) ⚡- 2.5% ± 0.9%
peak_rss 5.17MB ± 1.66MB 3.03MB … 7.22MB 0 ( 0%) - 0.2% ± 1.0%
cpu_cycles 1.49M ± 58.2K 1.42M … 2.31M 831 (11%) ⚡- 2.2% ± 0.1%
instructions 4.40M ± 22.9 4.40M … 4.41M 1723 (22%) ⚡- 8.0% ± 0.0%
cache_references 78.1K ± 1.41K 74.1K … 129K 301 ( 4%) - 0.0% ± 0.1%
cache_misses 2.15K ± 127 1.72K … 3.08K 40 ( 1%) 💩+ 2.1% ± 0.2%
branch_misses 3.70K ± 259 3.17K … 5.44K 388 ( 5%) 💩+ 6.8% ± 0.2%
Thanks again! :heart:
One thing you could try is removing the is_ascii check and unconditionally validating the entire document as UTF-8. That might get you a speedup when there are a lot of non-ASCII characters. The speedup from this change only shows up when the is_ascii check is almost always true.
I tried a couple of methods, and found this to be the fastest version.
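For anyone wanting to check which case a given input falls into, a rough way is to count how many 16-byte chunks are pure ASCII. The sketch below is only a diagnostic, not part of the parser: `ascii_chunk_ratio` is a hypothetical helper, and the 16-byte chunk size is an assumption that may not match the chunk width the parser actually uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic: fraction of full 16-byte chunks in `doc` that
// contain no byte with the high bit set. Trailing bytes that do not fill
// a whole chunk are ignored. Returns 1.0 when there are no full chunks.
inline double ascii_chunk_ratio(const std::string& doc) {
    constexpr std::size_t kChunk = 16;  // assumed chunk width
    const std::size_t total = doc.size() / kChunk;
    if (total == 0) return 1.0;

    std::size_t ascii = 0;
    for (std::size_t c = 0; c < total; ++c) {
        unsigned char combined = 0;
        // OR all bytes together: the high bit survives if any byte has it.
        for (std::size_t i = 0; i < kChunk; ++i) {
            combined |= static_cast<unsigned char>(doc[c * kChunk + i]);
        }
        if ((combined & 0x80) == 0) ++ascii;
    }
    return static_cast<double>(ascii) / static_cast<double>(total);
}
```

A ratio close to 1.0 suggests the fast path is taken almost always, which is the case where keeping the check should pay off. A low ratio is the case where validating unconditionally might be worth measuring.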