Closed fmoletta closed 4 months ago
**Hyper Thereading Benchmark results**
hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
Time (mean ± σ): 27.131 s ± 0.022 s [User: 26.384 s, System: 0.745 s]
Range (min … max): 27.115 s … 27.146 s 2 runs
Benchmark 2: hyper_threading_pr threads: 1
Time (mean ± σ): 27.143 s ± 0.006 s [User: 26.317 s, System: 0.824 s]
Range (min … max): 27.138 s … 27.147 s 2 runs
Summary
'hyper_threading_main threads: 1' ran
1.00 ± 0.00 times faster than 'hyper_threading_pr threads: 1'
hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
Time (mean ± σ): 14.621 s ± 0.023 s [User: 26.954 s, System: 0.828 s]
Range (min … max): 14.605 s … 14.637 s 2 runs
Benchmark 2: hyper_threading_pr threads: 2
Time (mean ± σ): 14.584 s ± 0.029 s [User: 26.838 s, System: 0.798 s]
Range (min … max): 14.563 s … 14.604 s 2 runs
Summary
'hyper_threading_pr threads: 2' ran
1.00 ± 0.00 times faster than 'hyper_threading_main threads: 2'
hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
Time (mean ± σ): 11.083 s ± 0.094 s [User: 38.612 s, System: 0.992 s]
Range (min … max): 11.016 s … 11.150 s 2 runs
Benchmark 2: hyper_threading_pr threads: 4
Time (mean ± σ): 11.138 s ± 0.324 s [User: 38.637 s, System: 0.979 s]
Range (min … max): 10.908 s … 11.367 s 2 runs
Summary
'hyper_threading_main threads: 4' ran
1.00 ± 0.03 times faster than 'hyper_threading_pr threads: 4'
hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
Time (mean ± σ): 10.646 s ± 0.063 s [User: 39.193 s, System: 0.936 s]
Range (min … max): 10.602 s … 10.691 s 2 runs
Benchmark 2: hyper_threading_pr threads: 6
Time (mean ± σ): 10.687 s ± 0.096 s [User: 38.709 s, System: 1.004 s]
Range (min … max): 10.619 s … 10.755 s 2 runs
Summary
'hyper_threading_main threads: 6' ran
1.00 ± 0.01 times faster than 'hyper_threading_pr threads: 6'
hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
Time (mean ± σ): 10.658 s ± 0.038 s [User: 39.251 s, System: 1.015 s]
Range (min … max): 10.631 s … 10.685 s 2 runs
Benchmark 2: hyper_threading_pr threads: 8
Time (mean ± σ): 10.585 s ± 0.145 s [User: 39.384 s, System: 0.952 s]
Range (min … max): 10.483 s … 10.688 s 2 runs
Summary
'hyper_threading_pr threads: 8' ran
1.01 ± 0.01 times faster than 'hyper_threading_main threads: 8'
hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
Time (mean ± σ): 10.476 s ± 0.082 s [User: 39.713 s, System: 1.023 s]
Range (min … max): 10.418 s … 10.533 s 2 runs
Benchmark 2: hyper_threading_pr threads: 16
Time (mean ± σ): 10.550 s ± 0.236 s [User: 39.450 s, System: 1.050 s]
Range (min … max): 10.384 s … 10.717 s 2 runs
Summary
'hyper_threading_main threads: 16' ran
1.01 ± 0.02 times faster than 'hyper_threading_pr threads: 16'
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 94.76%. Comparing base (
5da72d9
) to head (17db721
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Benchmark Results for unmodified programs :rocket:
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base big_factorial |
2.067 ± 0.089 | 2.014 | 2.290 | 1.02 ± 0.05 |
head big_factorial |
2.030 ± 0.024 | 2.007 | 2.091 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base big_fibonacci |
1.993 ± 0.028 | 1.966 | 2.060 | 1.01 ± 0.02 |
head big_fibonacci |
1.982 ± 0.014 | 1.957 | 2.004 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base blake2s_integration_benchmark |
7.558 ± 0.091 | 7.452 | 7.727 | 1.00 |
head blake2s_integration_benchmark |
7.616 ± 0.175 | 7.448 | 8.070 | 1.01 ± 0.03 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base compare_arrays_200000 |
2.100 ± 0.023 | 2.074 | 2.160 | 1.00 ± 0.01 |
head compare_arrays_200000 |
2.099 ± 0.018 | 2.078 | 2.135 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base dict_integration_benchmark |
1.421 ± 0.029 | 1.396 | 1.492 | 1.03 ± 0.02 |
head dict_integration_benchmark |
1.376 ± 0.011 | 1.364 | 1.402 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base field_arithmetic_get_square_benchmark |
1.290 ± 0.008 | 1.280 | 1.304 | 1.00 ± 0.01 |
head field_arithmetic_get_square_benchmark |
1.284 ± 0.014 | 1.266 | 1.320 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base integration_builtins |
7.622 ± 0.177 | 7.481 | 8.099 | 1.00 ± 0.03 |
head integration_builtins |
7.618 ± 0.112 | 7.473 | 7.822 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base keccak_integration_benchmark |
8.175 ± 0.696 | 7.747 | 9.975 | 1.04 ± 0.09 |
head keccak_integration_benchmark |
7.826 ± 0.093 | 7.707 | 7.994 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base linear_search |
2.062 ± 0.016 | 2.036 | 2.082 | 1.01 ± 0.01 |
head linear_search |
2.051 ± 0.012 | 2.037 | 2.075 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base math_cmp_and_pow_integration_benchmark |
1.684 ± 0.007 | 1.676 | 1.696 | 1.00 ± 0.01 |
head math_cmp_and_pow_integration_benchmark |
1.681 ± 0.021 | 1.661 | 1.732 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base math_integration_benchmark |
1.598 ± 0.013 | 1.591 | 1.634 | 1.01 ± 0.02 |
head math_integration_benchmark |
1.582 ± 0.021 | 1.568 | 1.638 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base memory_integration_benchmark |
1.198 ± 0.037 | 1.172 | 1.280 | 1.01 ± 0.03 |
head memory_integration_benchmark |
1.183 ± 0.017 | 1.172 | 1.230 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base operations_with_data_structures_benchmarks |
1.804 ± 0.010 | 1.790 | 1.828 | 1.00 ± 0.01 |
head operations_with_data_structures_benchmarks |
1.796 ± 0.011 | 1.788 | 1.827 | 1.00 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
base pedersen |
513.5 ± 4.1 | 510.0 | 524.6 | 1.00 ± 0.01 |
head pedersen |
513.5 ± 4.4 | 510.2 | 524.5 | 1.00 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
base poseidon_integration_benchmark |
949.9 ± 21.1 | 939.3 | 1008.5 | 1.01 ± 0.02 |
head poseidon_integration_benchmark |
940.9 ± 3.3 | 937.3 | 946.8 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base secp_integration_benchmark |
1.854 ± 0.009 | 1.843 | 1.877 | 1.00 |
head secp_integration_benchmark |
1.864 ± 0.010 | 1.847 | 1.880 | 1.01 ± 0.01 |
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
base set_integration_benchmark |
640.3 ± 6.1 | 635.1 | 653.8 | 1.00 ± 0.01 |
head set_integration_benchmark |
637.6 ± 2.2 | 633.2 | 640.7 | 1.00 |
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
base uint256_integration_benchmark |
4.229 ± 0.065 | 4.147 | 4.343 | 1.00 |
head uint256_integration_benchmark |
4.243 ± 0.273 | 4.108 | 5.011 | 1.00 ± 0.07 |
PR #1673 unified most implementations of
final_stack
into a single implementation, but overlooked a small detail that made the implementation different forSegmentArena
, introducing a bug. This PR fixes this issue by accounting forSegmentArena
's difference in the unifiedfinal_stack
implementation.