lambdaclass / cairo-vm

cairo-vm is a Rust implementation of the Cairo VM. Cairo (CPU Algebraic Intermediate Representation) is a programming language for writing provable programs, where one party can prove to another that a certain computation was executed correctly without the need for this party to re-execute the same program.
https://lambdaclass.github.io/cairo-vm
Apache License 2.0
514 stars 144 forks

perf: Change `CairoRunError::VmException` to `Box<VmException>` #1756

Closed: fmoletta closed this 4 months ago

fmoletta commented 4 months ago

PR #1720 added a small error variant to `CairoRunError`, which caused a large performance regression. The cause is the size of the `VmException` variant: an enum is at least as large as its largest variant, so every `CairoRunError` value ended up as big as a `VmException`. This PR fixes the regression by boxing the `VmException` held in its corresponding variant, and adds a test to ensure that the size of `CairoRunError` does not exceed 32 bytes.
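The size effect described above can be reproduced in isolation. The following is a minimal sketch, not the actual cairo-vm code: `BigException`, `UnboxedError`, and `BoxedError` are hypothetical stand-ins for `VmException` and the two shapes of `CairoRunError`, and the payload fields are invented purely to make the struct large.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large exception payload.
// The fields are made up for illustration; they are not the real VmException fields.
#[allow(dead_code)]
struct BigException {
    pc: usize,
    location: [u8; 96],
    message: String,
}

// Storing the payload inline: the enum must reserve room for its largest
// variant, so even `Small` values occupy at least size_of::<BigException>().
#[allow(dead_code)]
enum UnboxedError {
    Small(u8),
    Exception(BigException),
}

// Storing the payload behind a Box: the variant only holds a pointer,
// so the whole enum shrinks to a couple of machine words.
#[allow(dead_code)]
enum BoxedError {
    Small(u8),
    Exception(Box<BigException>),
}

fn main() {
    println!("BigException: {} bytes", size_of::<BigException>());
    println!("UnboxedError: {} bytes", size_of::<UnboxedError>());
    println!("BoxedError:   {} bytes", size_of::<BoxedError>());

    // Same style of guard the PR describes: fail if the error type grows too large.
    assert!(size_of::<BoxedError>() <= 32);
}
```

The trade-off is one heap allocation when the boxed variant is constructed, which only happens on the error path, in exchange for a smaller `Result` being moved around on the success path. A size assertion like the one in `main` can live in a unit test so that a future variant that enlarges the enum is caught in CI.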

github-actions[bot] commented 4 months ago
**Hyper Threading Benchmark results**

hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     27.212 s ±  0.002 s    [User: 26.331 s, System: 0.879 s]
  Range (min … max):   27.211 s … 27.214 s    2 runs

Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     26.887 s ±  0.077 s    [User: 26.107 s, System: 0.778 s]
  Range (min … max):   26.833 s … 26.942 s    2 runs

Summary
  'hyper_threading_pr threads: 1' ran
    1.01 ± 0.00 times faster than 'hyper_threading_main threads: 1'

hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     14.597 s ±  0.013 s    [User: 26.937 s, System: 0.829 s]
  Range (min … max):   14.587 s … 14.606 s    2 runs

Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     14.787 s ±  0.017 s    [User: 26.764 s, System: 0.793 s]
  Range (min … max):   14.776 s … 14.799 s    2 runs

Summary
  'hyper_threading_main threads: 2' ran
    1.01 ± 0.00 times faster than 'hyper_threading_pr threads: 2'

hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):     11.112 s ±  0.007 s    [User: 38.620 s, System: 0.992 s]
  Range (min … max):   11.107 s … 11.117 s    2 runs

Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):     10.602 s ±  0.413 s    [User: 38.032 s, System: 0.933 s]
  Range (min … max):   10.310 s … 10.893 s    2 runs

Summary
  'hyper_threading_pr threads: 4' ran
    1.05 ± 0.04 times faster than 'hyper_threading_main threads: 4'

hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):     10.734 s ±  0.229 s    [User: 39.018 s, System: 0.998 s]
  Range (min … max):   10.572 s … 10.896 s    2 runs

Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):     10.630 s ±  0.346 s    [User: 38.248 s, System: 0.970 s]
  Range (min … max):   10.385 s … 10.875 s    2 runs

Summary
  'hyper_threading_pr threads: 6' ran
    1.01 ± 0.04 times faster than 'hyper_threading_main threads: 6'

hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):     10.591 s ±  0.118 s    [User: 39.393 s, System: 1.006 s]
  Range (min … max):   10.508 s … 10.674 s    2 runs

Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):     10.379 s ±  0.040 s    [User: 38.488 s, System: 1.041 s]
  Range (min … max):   10.351 s … 10.407 s    2 runs

Summary
  'hyper_threading_pr threads: 8' ran
    1.02 ± 0.01 times faster than 'hyper_threading_main threads: 8'

hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):     10.662 s ±  0.041 s    [User: 39.583 s, System: 1.014 s]
  Range (min … max):   10.633 s … 10.691 s    2 runs

Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):     10.320 s ±  0.090 s    [User: 38.833 s, System: 1.086 s]
  Range (min … max):   10.257 s … 10.384 s    2 runs

Summary
  'hyper_threading_pr threads: 16' ran
    1.03 ± 0.01 times faster than 'hyper_threading_main threads: 16'

github-actions[bot] commented 4 months ago

Benchmark Results for unmodified programs :rocket:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base big_factorial` | 2.041 ± 0.010 | 2.030 | 2.060 | 1.00 |
| `head big_factorial` | 2.058 ± 0.059 | 2.027 | 2.225 | 1.01 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base big_fibonacci` | 1.993 ± 0.014 | 1.976 | 2.020 | 1.00 |
| `head big_fibonacci` | 2.000 ± 0.018 | 1.979 | 2.031 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base blake2s_integration_benchmark` | 7.597 ± 0.076 | 7.490 | 7.721 | 1.00 |
| `head blake2s_integration_benchmark` | 7.619 ± 0.151 | 7.457 | 7.952 | 1.00 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base compare_arrays_200000` | 2.120 ± 0.029 | 2.094 | 2.175 | 1.01 ± 0.02 |
| `head compare_arrays_200000` | 2.107 ± 0.018 | 2.086 | 2.138 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base dict_integration_benchmark` | 1.422 ± 0.020 | 1.407 | 1.478 | 1.01 ± 0.02 |
| `head dict_integration_benchmark` | 1.402 ± 0.006 | 1.394 | 1.414 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base field_arithmetic_get_square_benchmark` | 1.291 ± 0.017 | 1.276 | 1.336 | 1.00 ± 0.02 |
| `head field_arithmetic_get_square_benchmark` | 1.289 ± 0.013 | 1.275 | 1.316 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base integration_builtins` | 7.680 ± 0.134 | 7.520 | 7.985 | 1.01 ± 0.02 |
| `head integration_builtins` | 7.624 ± 0.077 | 7.481 | 7.722 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base keccak_integration_benchmark` | 7.873 ± 0.188 | 7.725 | 8.367 | 1.00 |
| `head keccak_integration_benchmark` | 7.895 ± 0.100 | 7.710 | 8.026 | 1.00 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base linear_search` | 2.065 ± 0.011 | 2.051 | 2.086 | 1.00 |
| `head linear_search` | 2.087 ± 0.031 | 2.053 | 2.144 | 1.01 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base math_cmp_and_pow_integration_benchmark` | 1.693 ± 0.006 | 1.681 | 1.701 | 1.01 ± 0.01 |
| `head math_cmp_and_pow_integration_benchmark` | 1.675 ± 0.010 | 1.663 | 1.694 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base math_integration_benchmark` | 1.598 ± 0.019 | 1.584 | 1.650 | 1.01 ± 0.02 |
| `head math_integration_benchmark` | 1.585 ± 0.016 | 1.563 | 1.621 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base memory_integration_benchmark` | 1.191 ± 0.004 | 1.184 | 1.200 | 1.00 ± 0.01 |
| `head memory_integration_benchmark` | 1.188 ± 0.008 | 1.175 | 1.196 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base operations_with_data_structures_benchmarks` | 1.828 ± 0.043 | 1.799 | 1.945 | 1.02 ± 0.02 |
| `head operations_with_data_structures_benchmarks` | 1.798 ± 0.006 | 1.790 | 1.809 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base pedersen` | 524.8 ± 4.8 | 519.0 | 535.5 | 1.02 ± 0.01 |
| `head pedersen` | 514.6 ± 5.7 | 511.8 | 530.7 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base poseidon_integration_benchmark` | 964.9 ± 4.3 | 957.5 | 971.3 | 1.00 |
| `head poseidon_integration_benchmark` | 965.9 ± 6.2 | 959.2 | 979.2 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base secp_integration_benchmark` | 1.857 ± 0.020 | 1.838 | 1.898 | 1.01 ± 0.01 |
| `head secp_integration_benchmark` | 1.848 ± 0.015 | 1.830 | 1.873 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `base set_integration_benchmark` | 643.8 ± 5.4 | 639.3 | 657.7 | 1.00 |
| `head set_integration_benchmark` | 660.0 ± 2.2 | 658.2 | 665.7 | 1.03 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `base uint256_integration_benchmark` | 4.206 ± 0.037 | 4.168 | 4.291 | 1.00 |
| `head uint256_integration_benchmark` | 4.245 ± 0.064 | 4.140 | 4.338 | 1.01 ± 0.02 |