folkertdev closed 1 month ago
this gives a fairly consistent improvement in instructions executed (about 2.1% fewer). cache misses seem better but have a lot of variance.
Benchmark 1 (7 runs): ./compress-baseline 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           714ms ± 79.2ms     633ms …  856ms          0 ( 0%)        0%
  peak_rss           24.5MB ± 64.9KB    24.4MB … 24.6MB          0 ( 0%)        0%
  cpu_cycles         2.47G  ± 93.0M     2.35G  … 2.59G           0 ( 0%)        0%
  instructions       3.73G  ±  707      3.73G  … 3.73G           0 ( 0%)        0%
  cache_references   26.0M  ± 7.00M     16.4M  … 34.1M           0 ( 0%)        0%
  cache_misses       1.80M  ± 1.20M      578K  … 3.95M           0 ( 0%)        0%
  branch_misses      21.7M  ±  109K     21.5M  … 21.8M           0 ( 0%)        0%

Benchmark 2 (8 runs): target/release/examples/compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           647ms ± 37.0ms     610ms …  722ms          0 ( 0%)          -  9.3% ±  9.4%
  peak_rss           24.6MB ±  103KB    24.4MB … 24.7MB          0 ( 0%)          +  0.3% ±  0.4%
  cpu_cycles         2.39G  ± 64.2M     2.33G  … 2.52G           0 ( 0%)          -  3.4% ±  3.6%
  instructions       3.65G  ±  404      3.65G  … 3.65G           1 (13%)        ⚡-  2.1% ±  0.0%
  cache_references   20.2M  ± 8.67M     11.3M  … 34.9M           0 ( 0%)          - 22.2% ± 34.2%
  cache_misses        657K  ±  171K      498K  …  967K           0 ( 0%)        ⚡- 63.5% ± 51.1%
  branch_misses      21.6M  ± 57.8K     21.5M  … 21.7M           3 (38%)          -  0.4% ±  0.4%
the goal here is to prevent the cloning or moving of fields, e.g. this pattern
this PR removes this sort of cloning in 2 places.
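
For readers without the diff at hand, here is a minimal sketch of what "cloning or moving a field" can look like, and one way to avoid it. It is not the code from this PR: `State`, `BitWriter`, `emit_cloning` and `emit_in_place` are hypothetical names made up for illustration, and the real fields involved may differ.

```rust
// Hypothetical types for illustration only; they are not taken from the PR.
#[derive(Clone, Debug, PartialEq)]
struct BitWriter {
    buf: Vec<u8>,
}

impl BitWriter {
    fn push_byte(&mut self, byte: u8) {
        self.buf.push(byte);
    }
}

#[derive(Debug)]
struct State {
    writer: BitWriter,
    total_out: usize,
}

// Pattern to avoid: clone the field into a local, mutate the local,
// then store it back. The clone copies the whole buffer on every call.
fn emit_cloning(state: &mut State, byte: u8) {
    let mut writer = state.writer.clone();
    writer.push_byte(byte);
    state.total_out += 1;
    state.writer = writer;
}

// Alternative: mutate the field in place through the existing `&mut` borrow,
// so nothing is cloned or moved.
fn emit_in_place(state: &mut State, byte: u8) {
    state.writer.push_byte(byte);
    state.total_out += 1;
}

fn main() {
    let mut a = State { writer: BitWriter { buf: vec![1, 2] }, total_out: 0 };
    let mut b = State { writer: BitWriter { buf: vec![1, 2] }, total_out: 0 };

    emit_cloning(&mut a, 3);
    emit_in_place(&mut b, 3);

    // Both versions leave the state identical; only the work done differs.
    assert_eq!(a.writer, b.writer);
    assert_eq!(a.total_out, b.total_out);
    println!("{:?}", b);
}
```

Both functions produce the same result, so a change like this is behavior-preserving. Whether the compiler already removes such copies on its own depends on the types involved, which is why the instruction counts above are the relevant evidence.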