trifectatechfoundation / zlib-rs

A safer zlib
zlib License

optimize the `medium` algorithm #223

Closed: folkertdev closed this 1 month ago

folkertdev commented 1 month ago

At level 3 this is much better than before:

Benchmark 1 (33 runs): ./compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           155ms ± 3.30ms     150ms …  164ms          1 ( 3%)        0%
  peak_rss           24.7MB ± 67.7KB    24.5MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          671M  ± 12.2M      657M  …  704M           1 ( 3%)        0%
  instructions       1.71G  ±  238      1.71G  … 1.71G           0 ( 0%)        0%
  cache_references   43.8M  ±  552K     42.9M  … 45.1M           2 ( 6%)        0%
  cache_misses       1.16M  ±  300K      787K  … 1.99M           1 ( 3%)        0%
  branch_misses      7.79M  ± 9.15K     7.78M  … 7.81M           0 ( 0%)        0%
Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           140ms ± 1.83ms     138ms …  146ms          2 ( 6%)        ⚡-  9.8% ±  0.8%
  peak_rss           24.7MB ± 76.7KB    24.5MB … 24.8MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          616M  ± 5.98M      609M  …  636M           2 ( 6%)        ⚡-  8.1% ±  0.7%
  instructions       1.53G  ±  264      1.53G  … 1.53G           1 ( 3%)        ⚡- 10.6% ±  0.0%
  cache_references   43.9M  ±  524K     43.2M  … 45.3M           2 ( 6%)          +  0.3% ±  0.6%
  cache_misses       1.02M  ±  213K      744K  … 1.79M           1 ( 3%)        ⚡- 12.7% ± 10.7%
  branch_misses      7.79M  ± 5.34K     7.78M  … 7.80M           2 ( 6%)          +  0.0% ±  0.0%
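To show where the `delta` column comes from, here is a minimal Rust sketch that recomputes it from the printed means. It assumes poop's delta is the relative change of Benchmark 2's mean against Benchmark 1's. Because the printed means are rounded (e.g. `1.71G`), the results land close to poop's figures but not exactly on them, and the ± uncertainty term is not reproduced. `relative_change_pct` is a hypothetical helper, not part of zlib-rs or poop.

```rust
/// Hypothetical helper: relative change of `after` with respect to `before`, in percent.
/// Assumed to match poop's `delta` column, minus its ± uncertainty term.
fn relative_change_pct(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Level-3 means copied from Benchmark 1 (baseline) and Benchmark 2 (this PR).
    // They are rounded as printed, so results differ slightly from poop's own delta.
    let metrics = [
        ("wall_time (ms)", 155.0, 140.0),
        ("cpu_cycles", 671e6, 616e6),
        ("instructions", 1.71e9, 1.53e9),
    ];
    for (name, before, after) in metrics {
        println!("{name:<16} {:+.1}%", relative_change_pct(before, after));
    }
}
```

Running it prints -9.7%, -8.2% and -10.5%, against poop's -9.8%, -8.1% and -10.6%; the small gaps come from rounding in the printed means.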

But there's still a ways to go: at level 3, `rs` is still behind `ng`:

Benchmark 1 (38 runs): target/release/examples/blogpost-compress 3 ng silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           131ms ± 1.58ms     129ms …  138ms          2 ( 5%)        0%
  peak_rss           24.7MB ± 58.5KB    24.6MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          575M  ± 6.40M      570M  …  601M           2 ( 5%)        0%
  instructions       1.30G  ±  239      1.30G  … 1.30G           0 ( 0%)        0%
  cache_references   41.2M  ±  513K     40.3M  … 42.5M           0 ( 0%)        0%
  cache_misses       1.31M  ±  314K      926K  … 2.50M           2 ( 5%)        0%
  branch_misses      7.70M  ± 5.63K     7.69M  … 7.72M           1 ( 3%)        0%
Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           140ms ± 1.42ms     138ms …  145ms          4 (11%)        💩+  6.9% ±  0.5%
  peak_rss           24.7MB ± 62.7KB    24.6MB … 24.8MB          0 ( 0%)          -  0.2% ±  0.1%
  cpu_cycles          617M  ± 5.97M      610M  …  637M           1 ( 3%)        💩+  7.1% ±  0.5%
  instructions       1.53G  ±  561      1.53G  … 1.53G           3 ( 8%)        💩+ 17.2% ±  0.0%
  cache_references   44.1M  ±  481K     43.3M  … 45.1M           0 ( 0%)        💩+  7.1% ±  0.6%
  cache_misses       1.07M  ±  212K      790K  … 1.54M           0 ( 0%)        ⚡- 18.3% ±  9.6%
  branch_misses      7.79M  ± 7.97K     7.78M  … 7.82M           3 ( 8%)        💩+  1.2% ±  0.0%
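Compared with `ng`, the `rs` run executes roughly 17% more instructions but needs only about 7% more cycles. The Rust sketch below derives instructions per cycle (IPC) from the rounded means above to show how those two numbers can differ: the `rs` run retires more instructions per cycle. This is only arithmetic on the printed means, not a separate measurement, and `ipc` is a hypothetical helper.

```rust
/// Hypothetical helper: instructions per cycle (IPC), i.e. work retired per clock tick.
fn ipc(instructions: f64, cycles: f64) -> f64 {
    instructions / cycles
}

fn main() {
    // Level-3 means from the ng vs. rs comparison above (rounded as printed by poop).
    let (ng_instr, ng_cycles) = (1.30e9, 575e6);
    let (rs_instr, rs_cycles) = (1.53e9, 617e6);

    println!("ng IPC: {:.2}", ipc(ng_instr, ng_cycles));
    println!("rs IPC: {:.2}", ipc(rs_instr, rs_cycles));

    // How much more work and time rs needs relative to ng, in percent.
    let extra_instructions = (rs_instr / ng_instr - 1.0) * 100.0;
    let extra_cycles = (rs_cycles / ng_cycles - 1.0) * 100.0;
    println!("rs vs ng: {extra_instructions:+.1}% instructions, {extra_cycles:+.1}% cycles");
}
```

With these rounded inputs it reports an IPC of about 2.26 for `ng` and 2.48 for `rs`, so the instruction gap shrinks to a much smaller cycle gap. poop's own +17.2% and +7.1% differ slightly because it works from unrounded means.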

But most of these changes benefit every compression level, so at the higher levels (6 and 9) we're doing really well:

Benchmark 2 (24 runs): target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ± 3.67ms     211ms …  227ms          2 ( 8%)        ⚡-  2.8% ±  0.9%
  peak_rss           24.5MB ±  116KB    24.2MB … 24.6MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles          963M  ± 14.1M      949M  … 1.01G           1 ( 4%)        ⚡-  2.5% ±  0.7%
  instructions       1.93G  ±  364      1.93G  … 1.93G           2 ( 8%)        💩+ 17.7% ±  0.0%
  cache_references    105M  ± 1.10M      104M  …  108M           0 ( 0%)        💩+  3.3% ±  0.6%
  cache_misses       2.01M  ±  617K     1.36M  … 3.49M           0 ( 0%)          -  9.4% ± 13.8%
  branch_misses      9.25M  ± 9.71K     9.24M  … 9.27M           0 ( 0%)        💩+  4.1% ±  0.1%

Benchmark 2 (12 runs): target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           419ms ± 4.96ms     415ms …  428ms          0 ( 0%)        ⚡-  4.1% ±  0.7%
  peak_rss           24.4MB ± 81.5KB    24.2MB … 24.5MB          0 ( 0%)          -  0.1% ±  0.3%
  cpu_cycles         1.90G  ± 12.5M     1.89G  … 1.93G           1 ( 8%)        ⚡-  4.5% ±  0.4%
  instructions       3.18G  ±  398      3.18G  … 3.18G           0 ( 0%)        💩+ 12.7% ±  0.0%
  cache_references    195M  ±  955K      194M  …  198M           0 ( 0%)          +  1.1% ±  0.4%
  cache_misses       2.91M  ±  799K     1.80M  … 4.58M           0 ( 0%)          -  8.6% ± 18.4%
  branch_misses      19.1M  ± 48.8K     19.0M  … 19.2M           3 (25%)        ⚡-  8.3% ±  0.1%

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines                   Coverage Δ
zlib-rs/src/crc32.rs                       96.02% <ø> (-0.87%) ⬇️
zlib-rs/src/deflate.rs                     96.66% <100.00%> (+<0.01%) ⬆️
zlib-rs/src/deflate/algorithm/medium.rs    94.01% <100.00%> (-0.11%) ⬇️
zlib-rs/src/read_buf.rs                    90.47% <100.00%> (+6.83%) ⬆️

... and 3 files with indirect coverage changes

bjorn3 commented 1 month ago

Opened https://github.com/trifectatechfoundation/zlib-rs/pull/224 against this PR to fix CI.