trifectatechfoundation / zlib-rs

A safer zlib
zlib License

use wider loads/stores in `Writer` on aarch64 #196

Closed folkertdev closed 2 months ago

folkertdev commented 2 months ago

At least on the Raspberry Pi, this brings our performance on par with zlib-ng:

Benchmark 1 (48 runs): ./target/release/examples/blogpost-uncompress ng silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105ms ± 6.80ms     103ms …  151ms          7 (15%)        0%
  peak_rss           23.9MB ± 47.5KB    23.8MB … 23.9MB          0 ( 0%)        0%
  cpu_cycles          121M  ± 1.54M      119M  …  124M          10 (21%)        0%
  instructions        148M  ±  359       148M  …  148M           0 ( 0%)        0%
  cache_references   33.9M  ± 9.34K     33.9M  … 33.9M           8 (17%)        0%
  cache_misses        864K  ±  109K      787K  … 1.10M           9 (19%)        0%
  branch_misses      1.14M  ± 1.05K     1.14M  … 1.15M           1 ( 2%)        0%
Benchmark 2 (47 runs): ./target/release/examples/blogpost-uncompress rs silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           107ms ± 5.01ms     105ms …  140ms          1 ( 2%)          +  1.9% ±  2.3%
  peak_rss           23.9MB ± 44.2KB    23.8MB … 23.9MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          124M  ± 1.62M      123M  …  128M           0 ( 0%)        💩+  3.3% ±  0.5%
  instructions        190M  ±  349       190M  …  190M           0 ( 0%)        💩+ 28.8% ±  0.0%
  cache_references   31.7M  ± 4.20K     31.7M  … 31.7M           0 ( 0%)        ⚡-  6.4% ±  0.0%
  cache_misses        913K  ±  119K      822K  … 1.16M           0 ( 0%)          +  5.7% ±  5.4%
  branch_misses      1.17M  ± 1.72K     1.17M  … 1.18M           1 ( 2%)        💩+  2.8% ±  0.1%
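
For readers who want a picture of what "wider loads/stores" means here, below is a minimal sketch, not the code from this PR. It assumes the idea is to move output bytes in fixed 16-byte blocks instead of one byte at a time, so the compiler can lower each block to a single 128-bit load and store (for example a NEON `ldr q`/`str q` pair on aarch64). The name `copy_wide` and the 16-byte chunk size are hypothetical and chosen only for illustration.

```rust
/// Hypothetical helper (not part of the zlib-rs API): copy `src` into the
/// front of `dst` using fixed 16-byte blocks, then finish the tail bytewise.
fn copy_wide(dst: &mut [u8], src: &[u8]) {
    // Assumed chunk width: 16 bytes, i.e. one 128-bit vector register.
    const CHUNK: usize = 16;

    assert!(dst.len() >= src.len(), "destination too small");

    let mut src_chunks = src.chunks_exact(CHUNK);
    let mut dst_chunks = dst[..src.len()].chunks_exact_mut(CHUNK);

    for (d, s) in (&mut dst_chunks).zip(&mut src_chunks) {
        // Going through a fixed-size array makes the length a compile-time
        // constant, which lets the optimizer emit one wide load and store.
        let mut block = [0u8; CHUNK];
        block.copy_from_slice(s);
        d.copy_from_slice(&block);
    }

    // Fewer than CHUNK bytes remain; copy them without touching the rest
    // of `dst`.
    let tail = src_chunks.remainder();
    dst_chunks.into_remainder()[..tail.len()].copy_from_slice(tail);
}
```

Whether this actually produces wide instructions depends on the target and optimization level, so it is worth checking the generated assembly rather than assuming it.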