memorysafety / zlib-rs

A safer zlib
zlib License

fix deflate performance #18

Open folkertdev opened 7 months ago

folkertdev commented 7 months ago

We are consistently ~10% slower than zlib-ng on deflate. The gap fluctuates across compression levels, but currently none of them are on par with zlib-ng.

There is so far no obvious reason for this slowdown, so it's likely a "death by a thousand papercuts" sort of thing.

folkertdev commented 4 months ago

As a data point, here are the numbers at commit 80481805d6a0bfffb4b9e29bb4ec6a363ec6a41d:

Benchmark 1 (29 runs): cargo run --release --example compress 1 ng silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           175ms ± 8.58ms     168ms …  217ms          1 ( 3%)        0%
  peak_rss           26.8MB ± 87.3KB    26.6MB … 27.0MB          0 ( 0%)        0%
  cpu_cycles          515M  ± 6.88M      505M  …  529M           0 ( 0%)        0%
  instructions        744M  ± 35.3K      744M  …  744M           1 ( 3%)        0%
  cache_references   11.3M  ± 1.18M     9.08M  … 14.1M           0 ( 0%)        0%
  cache_misses       2.25M  ± 66.4K     2.12M  … 2.40M           1 ( 3%)        0%
  branch_misses      4.13M  ± 15.1K     4.10M  … 4.17M           0 ( 0%)        0%
Benchmark 2 (27 runs): cargo run --release --example compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           191ms ± 4.14ms     187ms …  207ms          1 ( 4%)        💩+  9.2% ±  2.1%
  peak_rss           26.7MB ± 93.5KB    26.6MB … 26.9MB          0 ( 0%)          -  0.3% ±  0.2%
  cpu_cycles          580M  ± 10.7M      570M  …  619M           1 ( 4%)        💩+ 12.7% ±  0.9%
  instructions        873M  ± 31.4K      873M  …  873M           0 ( 0%)        💩+ 17.3% ±  0.0%
  cache_references   11.4M  ± 1.16M     9.31M  … 13.6M           0 ( 0%)          +  0.9% ±  5.6%
  cache_misses       2.33M  ± 74.4K     2.22M  … 2.49M           0 ( 0%)        💩+  3.6% ±  1.7%
  branch_misses      4.39M  ± 42.1K     4.34M  … 4.52M           1 ( 4%)        💩+  6.2% ±  0.4%
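
To make the delta column easier to sanity-check, here is a minimal sketch that recomputes the relative change and the instructions-per-cycle (IPC) ratio from the rounded means in the output above. The function names are hypothetical helpers, not part of zlib-rs or of the benchmark tool. Because the inputs are the rounded means as printed, the results can differ from the tool's own delta column in the last digit.

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
/// (Hypothetical helper for illustration; not part of zlib-rs.)
fn relative_delta_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

/// Instructions retired per CPU cycle.
/// (Hypothetical helper for illustration; not part of zlib-rs.)
fn instructions_per_cycle(instructions: f64, cycles: f64) -> f64 {
    instructions / cycles
}

fn main() {
    // Rounded means copied from the benchmark output above:
    // "ng" is the baseline run, "rs" is the run being compared against it.
    let (ng_cycles, rs_cycles) = (515e6, 580e6);
    let (ng_instructions, rs_instructions) = (744e6, 873e6);

    println!(
        "cpu_cycles delta:   {:+.1}%",
        relative_delta_percent(ng_cycles, rs_cycles)
    );
    println!(
        "instructions delta: {:+.1}%",
        relative_delta_percent(ng_instructions, rs_instructions)
    );
    println!(
        "IPC (ng): {:.2}",
        instructions_per_cycle(ng_instructions, ng_cycles)
    );
    println!(
        "IPC (rs): {:.2}",
        instructions_per_cycle(rs_instructions, rs_cycles)
    );
}
```

With these inputs, the instruction count grows by a larger percentage than the cycle count, so the IPC of the rs run comes out slightly higher than that of the ng run. This is a reading of a single data point with rounded inputs, not a diagnosis.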