trifectatechfoundation / zlib-rs

A safer zlib
zlib License

unroll `copy_chunk_unchecked` #185

Closed: folkertdev closed this 2 months ago

folkertdev commented 2 months ago

Unroll `copy_chunk_unchecked` by one iteration for better branch prediction.

Benchmark 1 (152 runs): ./uncompress-baseline rs-chunked 15 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          32.9ms ±  359us    32.3ms … 34.7ms          2 ( 1%)        0%
  peak_rss           24.1MB ± 60.8KB    24.0MB … 24.1MB          0 ( 0%)        0%
  cpu_cycles         98.2M  ±  406K     97.9M  …  101M           7 ( 5%)        0%
  instructions        272M  ±  279       272M  …  272M           1 ( 1%)        0%
  cache_references   2.11M  ± 43.4K     2.03M  … 2.23M           0 ( 0%)        0%
  cache_misses       56.6K  ± 3.13K     50.5K  … 75.6K           2 ( 1%)        0%
  branch_misses      1.20M  ±  764      1.19M  … 1.20M           6 ( 4%)        0%
Benchmark 2 (155 runs): target/release/examples/blogpost-uncompress rs-chunked 15 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          32.4ms ±  905us    31.4ms … 40.7ms          7 ( 5%)        ⚡-  1.8% ±  0.5%
  peak_rss           24.1MB ± 60.1KB    24.0MB … 24.1MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles         94.3M  ± 2.38M     93.5M  …  122M          12 ( 8%)        ⚡-  4.0% ±  0.4%
  instructions        263M  ±  346       263M  …  263M           3 ( 2%)        ⚡-  3.3% ±  0.0%
  cache_references   2.33M  ±  109K     2.22M  … 3.53M           3 ( 2%)        💩+ 10.2% ±  0.9%
  cache_misses       44.7K  ± 9.01K     32.3K  …  130K           9 ( 6%)        ⚡- 21.1% ±  2.7%
  branch_misses      1.20M  ± 2.23K     1.20M  … 1.22M           4 ( 3%)          +  0.3% ±  0.0%
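
For readers unfamiliar with the technique, here is a minimal sketch of what unrolling a chunked copy loop by one iteration can look like. It is not the zlib-rs code: the function name `copy_chunks_peeled`, the constant `CHUNK`, and the use of safe slices are all assumptions made for illustration, and the real `copy_chunk_unchecked` presumably skips the bounds checks shown here. The idea is that the first chunk is copied unconditionally, so short copies (one chunk or less) never enter the loop.

```rust
/// Hypothetical chunk width in bytes; chosen only for this illustration.
const CHUNK: usize = 16;

/// Copies `len` bytes from `src` to `dst` in whole `CHUNK`-sized steps,
/// with the first iteration peeled out of the loop.
///
/// Like many chunked copies, this may write past `len`, up to the next
/// multiple of `CHUNK`, so both slices must have room for that.
/// Returns the number of bytes actually written.
fn copy_chunks_peeled(dst: &mut [u8], src: &[u8], len: usize) -> usize {
    assert!(len > 0, "sketch assumes at least one byte to copy");
    let rounded = (len + CHUNK - 1) / CHUNK * CHUNK;
    assert!(src.len() >= rounded && dst.len() >= rounded);

    // Peeled first iteration: always executed, no loop condition to predict.
    dst[..CHUNK].copy_from_slice(&src[..CHUNK]);
    let mut pos = CHUNK;

    // Remaining iterations: only entered when more than one chunk is needed.
    while pos < len {
        dst[pos..pos + CHUNK].copy_from_slice(&src[pos..pos + CHUNK]);
        pos += CHUNK;
    }

    pos
}
```

With `CHUNK = 16`, a 5-byte copy writes one chunk and never evaluates the loop body, while a 20-byte copy writes the peeled chunk plus one loop iteration. Whether this helps in practice depends on the distribution of copy lengths, which is what the benchmark above measures.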