lambdaclass / cairo-vm

cairo-vm is a Rust implementation of the Cairo VM. Cairo (CPU Algebraic Intermediate Representation) is a programming language for writing provable programs, where one party can prove to another that a certain computation was executed correctly without the need for this party to re-execute the same program.
https://lambdaclass.github.io/cairo-vm
Apache License 2.0
485 stars 132 forks source link

perf(wip): store instruction as single u64, perform bit operations on demand #1762

Open Oppen opened 1 month ago

Oppen commented 1 month ago

TITLE

Description

Description of the pull request changes and motivation.

Checklist

github-actions[bot] commented 1 month ago
**Hyper Thereading Benchmark results**

hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     27.338 s ±  0.030 s    [User: 26.571 s, System: 0.765 s]
  Range (min … max):   27.317 s … 27.359 s    2 runs

Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     27.036 s ±  0.065 s    [User: 26.238 s, System: 0.796 s]
  Range (min … max):   26.989 s … 27.082 s    2 runs

Summary
  'hyper_threading_pr threads: 1' ran
    1.01 ± 0.00 times faster than 'hyper_threading_main threads: 1'

hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     14.878 s ±  0.095 s    [User: 27.095 s, System: 0.778 s]
  Range (min … max):   14.810 s … 14.945 s    2 runs

Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     14.718 s ±  0.024 s    [User: 27.132 s, System: 0.751 s]
  Range (min … max):   14.701 s … 14.735 s    2 runs

Summary
  'hyper_threading_pr threads: 2' ran
    1.01 ± 0.01 times faster than 'hyper_threading_main threads: 2'

hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):     11.049 s ±  0.271 s    [User: 38.941 s, System: 0.912 s]
  Range (min … max):   10.858 s … 11.241 s    2 runs

Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):     11.263 s ±  0.312 s    [User: 39.751 s, System: 0.963 s]
  Range (min … max):   11.042 s … 11.483 s    2 runs

Summary
  'hyper_threading_main threads: 4' ran
    1.02 ± 0.04 times faster than 'hyper_threading_pr threads: 4'

hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):     10.721 s ±  0.120 s    [User: 38.872 s, System: 0.992 s]
  Range (min … max):   10.636 s … 10.805 s    2 runs

Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):     10.818 s ±  0.052 s    [User: 40.205 s, System: 0.965 s]
  Range (min … max):   10.781 s … 10.855 s    2 runs

Summary
  'hyper_threading_main threads: 6' ran
    1.01 ± 0.01 times faster than 'hyper_threading_pr threads: 6'

hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):     10.604 s ±  0.077 s    [User: 39.575 s, System: 1.018 s]
  Range (min … max):   10.549 s … 10.658 s    2 runs

Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):     11.086 s ±  0.393 s    [User: 40.414 s, System: 1.077 s]
  Range (min … max):   10.808 s … 11.363 s    2 runs

Summary
  'hyper_threading_main threads: 8' ran
    1.05 ± 0.04 times faster than 'hyper_threading_pr threads: 8'

hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):     10.633 s ±  0.110 s    [User: 39.899 s, System: 1.022 s]
  Range (min … max):   10.555 s … 10.711 s    2 runs

Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):     10.731 s ±  0.104 s    [User: 40.563 s, System: 1.105 s]
  Range (min … max):   10.657 s … 10.804 s    2 runs

Summary
  'hyper_threading_main threads: 16' ran
    1.01 ± 0.01 times faster than 'hyper_threading_pr threads: 16'
github-actions[bot] commented 1 month ago

Benchmark Results for unmodified programs :rocket:

Command Mean [s] Min [s] Max [s] Relative
base big_factorial 2.013 ± 0.015 1.994 2.045 1.00
head big_factorial 2.018 ± 0.024 1.989 2.061 1.00 ± 0.01
Command Mean [s] Min [s] Max [s] Relative
base big_fibonacci 1.991 ± 0.017 1.962 2.016 1.00
head big_fibonacci 2.009 ± 0.086 1.945 2.236 1.01 ± 0.04
Command Mean [s] Min [s] Max [s] Relative
base blake2s_integration_benchmark 7.486 ± 0.088 7.404 7.695 1.00
head blake2s_integration_benchmark 7.596 ± 0.299 7.302 8.234 1.01 ± 0.04
Command Mean [s] Min [s] Max [s] Relative
base compare_arrays_200000 2.098 ± 0.019 2.080 2.131 1.00
head compare_arrays_200000 2.104 ± 0.026 2.070 2.143 1.00 ± 0.02
Command Mean [s] Min [s] Max [s] Relative
base dict_integration_benchmark 1.400 ± 0.013 1.388 1.434 1.00
head dict_integration_benchmark 1.416 ± 0.041 1.383 1.521 1.01 ± 0.03
Command Mean [s] Min [s] Max [s] Relative
base field_arithmetic_get_square_benchmark 1.290 ± 0.049 1.266 1.427 1.01 ± 0.05
head field_arithmetic_get_square_benchmark 1.281 ± 0.034 1.251 1.361 1.00
Command Mean [s] Min [s] Max [s] Relative
base integration_builtins 7.442 ± 0.083 7.364 7.631 1.00
head integration_builtins 7.534 ± 0.172 7.335 7.856 1.01 ± 0.03
Command Mean [s] Min [s] Max [s] Relative
base keccak_integration_benchmark 7.712 ± 0.060 7.615 7.808 1.00
head keccak_integration_benchmark 7.808 ± 0.184 7.566 8.174 1.01 ± 0.03
Command Mean [s] Min [s] Max [s] Relative
base linear_search 2.058 ± 0.017 2.042 2.086 1.00
head linear_search 2.061 ± 0.037 2.027 2.145 1.00 ± 0.02
Command Mean [s] Min [s] Max [s] Relative
base math_cmp_and_pow_integration_benchmark 1.681 ± 0.007 1.670 1.690 1.00 ± 0.01
head math_cmp_and_pow_integration_benchmark 1.676 ± 0.011 1.665 1.694 1.00
Command Mean [s] Min [s] Max [s] Relative
base math_integration_benchmark 1.593 ± 0.009 1.582 1.603 1.00
head math_integration_benchmark 1.596 ± 0.022 1.583 1.656 1.00 ± 0.01
Command Mean [s] Min [s] Max [s] Relative
base memory_integration_benchmark 1.208 ± 0.018 1.188 1.242 1.02 ± 0.02
head memory_integration_benchmark 1.184 ± 0.004 1.179 1.188 1.00
Command Mean [s] Min [s] Max [s] Relative
base operations_with_data_structures_benchmarks 1.824 ± 0.063 1.796 2.002 1.02 ± 0.04
head operations_with_data_structures_benchmarks 1.784 ± 0.014 1.771 1.814 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base pedersen 519.8 ± 15.2 510.3 560.9 1.01 ± 0.03
head pedersen 514.9 ± 4.1 511.3 523.8 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base poseidon_integration_benchmark 972.9 ± 20.6 961.1 1030.8 1.00 ± 0.02
head poseidon_integration_benchmark 971.5 ± 10.6 964.0 1000.5 1.00
Command Mean [s] Min [s] Max [s] Relative
base secp_integration_benchmark 1.874 ± 0.038 1.846 1.972 1.02 ± 0.02
head secp_integration_benchmark 1.843 ± 0.012 1.827 1.865 1.00
Command Mean [ms] Min [ms] Max [ms] Relative
base set_integration_benchmark 662.8 ± 7.6 657.6 683.4 1.03 ± 0.02
head set_integration_benchmark 645.4 ± 10.9 634.8 661.2 1.00
Command Mean [s] Min [s] Max [s] Relative
base uint256_integration_benchmark 4.109 ± 0.021 4.075 4.146 1.00
head uint256_integration_benchmark 4.120 ± 0.068 4.047 4.240 1.00 ± 0.02