memorysafety / zlib-rs

A safer zlib

Refactor deflate `BitWriter` #93

Closed folkertdev closed 1 month ago

folkertdev commented 2 months ago

The goal here is to avoid cloning or moving fields, e.g. this pattern:

```rust
let mut tmp = Default::default();
std::mem::swap(&mut tmp, &mut state.field);
foobar(&tmp);
std::mem::swap(&mut tmp, &mut state.field);
```

This PR removes this sort of temporary moving or cloning in two places.
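As a rough illustration of the idea, here is a minimal sketch of the swap pattern next to one possible alternative, a destructuring borrow of disjoint fields. The `State` and `BitWriter` definitions, their fields, and the `emit_all` and `flush_*` helpers are hypothetical stand-ins and not the actual zlib-rs types or the exact change made in this PR.

```rust
// Hypothetical types for illustration only; not the real zlib-rs definitions.
#[derive(Default)]
struct BitWriter {
    pending: Vec<u8>,
}

impl BitWriter {
    // Only needs the writer itself, so it can be called while another
    // field of the parent struct is borrowed.
    fn emit_all(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
    }
}

struct State {
    bit_writer: BitWriter,
    window: Vec<u8>,
}

impl State {
    // Takes `&mut self`, so a caller cannot pass `&self.window` directly:
    // that would borrow `self` mutably and immutably at the same time.
    fn emit_all(&mut self, bytes: &[u8]) {
        self.bit_writer.emit_all(bytes);
    }
}

// Before: move the field out, use it, and move it back.
fn flush_with_swap(state: &mut State) {
    let mut tmp = Vec::new();
    std::mem::swap(&mut tmp, &mut state.window);
    state.emit_all(&tmp);
    std::mem::swap(&mut tmp, &mut state.window);
}

// After: borrow the two fields disjointly; nothing is moved or cloned.
fn flush_with_split_borrow(state: &mut State) {
    let State { bit_writer, window } = state;
    bit_writer.emit_all(window.as_slice());
}
```

Both functions leave `state.window` intact and append the same bytes to the writer. The second version does so without the two swaps or the temporary default value.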

folkertdev commented 1 month ago

This gives a fairly consistent improvement in instructions executed (about 2.1% fewer). Cache misses also seem better, but that measurement has a lot of variance.

Benchmark 1 (7 runs): ./compress-baseline 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           714ms ± 79.2ms     633ms …  856ms          0 ( 0%)        0%
  peak_rss           24.5MB ± 64.9KB    24.4MB … 24.6MB          0 ( 0%)        0%
  cpu_cycles         2.47G  ± 93.0M     2.35G  … 2.59G           0 ( 0%)        0%
  instructions       3.73G  ±  707      3.73G  … 3.73G           0 ( 0%)        0%
  cache_references   26.0M  ± 7.00M     16.4M  … 34.1M           0 ( 0%)        0%
  cache_misses       1.80M  ± 1.20M      578K  … 3.95M           0 ( 0%)        0%
  branch_misses      21.7M  ±  109K     21.5M  … 21.8M           0 ( 0%)        0%
Benchmark 2 (8 runs): target/release/examples/compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           647ms ± 37.0ms     610ms …  722ms          0 ( 0%)          -  9.3% ±  9.4%
  peak_rss           24.6MB ±  103KB    24.4MB … 24.7MB          0 ( 0%)          +  0.3% ±  0.4%
  cpu_cycles         2.39G  ± 64.2M     2.33G  … 2.52G           0 ( 0%)          -  3.4% ±  3.6%
  instructions       3.65G  ±  404      3.65G  … 3.65G           1 (13%)        ⚡-  2.1% ±  0.0%
  cache_references   20.2M  ± 8.67M     11.3M  … 34.9M           0 ( 0%)          - 22.2% ± 34.2%
  cache_misses        657K  ±  171K      498K  …  967K           0 ( 0%)        ⚡- 63.5% ± 51.1%
  branch_misses      21.6M  ± 57.8K     21.5M  … 21.7M           3 (38%)          -  0.4% ±  0.4%