mratsim closed 7 months ago
For @asanso on gas pricing
Gas costs expressed as a ratio of the G1 scalarmul cost
original: https://gist.github.com/mratsim/6785a29e72865cfa94e1174fae1e1168
Reproduction
git clone https://github.com/mratsim/constantine
cd constantine
git checkout eip2537
CC=clang nimble bench_eip2537_subgroup_checks_impact
All EIP-2537 precompiles are implemented with benchmarks.
--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD 185.60 MGas/s 371195.249 ops/s 2694 ns/op 8878 CPU cycles (approx)
BLS12_G2ADD 218.28 MGas/s 272851.296 ops/s 3665 ns/op 12074 CPU cycles (approx)
BLS12_G1MUL 144.80 MGas/s 12066.947 ops/s 82871 ns/op 273021 CPU cycles (approx)
BLS12_G2MUL 346.23 MGas/s 7693.965 ops/s 129972 ns/op 428197 CPU cycles (approx)
BLS12_MAP_FP_TO_G1 161.09 MGas/s 29288.580 ops/s 34143 ns/op 112485 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2 702.17 MGas/s 9362.332 ops/s 106811 ns/op 351891 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK 1 230.31 MGas/s 2132.514 ops/s 468930 ns/op 1544885 CPU cycles (approx)
BLS12_PAIRINGCHECK 2 236.84 MGas/s 1568.453 ops/s 637571 ns/op 2100483 CPU cycles (approx)
BLS12_PAIRINGCHECK 3 237.00 MGas/s 1221.637 ops/s 818574 ns/op 2696796 CPU cycles (approx)
BLS12_PAIRINGCHECK 4 238.23 MGas/s 1005.195 ops/s 994832 ns/op 3277480 CPU cycles (approx)
BLS12_PAIRINGCHECK 5 237.45 MGas/s 848.035 ops/s 1179196 ns/op 3884872 CPU cycles (approx)
BLS12_PAIRINGCHECK 6 201.89 MGas/s 625.056 ops/s 1599857 ns/op 5270568 CPU cycles (approx)
BLS12_PAIRINGCHECK 7 223.64 MGas/s 611.036 ops/s 1636565 ns/op 5391678 CPU cycles (approx)
BLS12_PAIRINGCHECK 8 224.13 MGas/s 548.006 ops/s 1824797 ns/op 6011817 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM 2 121.74 MGas/s 5712.392 ops/s 175058 ns/op 576717 CPU cycles (approx)
BLS12_G1MSM 4 102.15 MGas/s 3320.152 ops/s 301191 ns/op 992263 CPU cycles (approx)
BLS12_G1MSM 8 81.67 MGas/s 1878.086 ops/s 532457 ns/op 1754181 CPU cycles (approx)
BLS12_G1MSM 16 67.23 MGas/s 1048.407 ops/s 953828 ns/op 3142392 CPU cycles (approx)
BLS12_G1MSM 32 59.01 MGas/s 571.284 ops/s 1750442 ns/op 5766852 CPU cycles (approx)
BLS12_G1MSM 64 51.40 MGas/s 301.480 ops/s 3316972 ns/op 10927814 CPU cycles (approx)
BLS12_G1MSM 128 42.37 MGas/s 158.535 ops/s 6307740 ns/op 20780816 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM 2 274.23 MGas/s 3431.309 ops/s 291434 ns/op 960108 CPU cycles (approx)
BLS12_G2MSM 4 168.31 MGas/s 1458.762 ops/s 685513 ns/op 2258411 CPU cycles (approx)
BLS12_G2MSM 8 177.98 MGas/s 1091.353 ops/s 916294 ns/op 3018711 CPU cycles (approx)
BLS12_G2MSM 16 150.39 MGas/s 625.367 ops/s 1599062 ns/op 5268087 CPU cycles (approx)
BLS12_G2MSM 32 136.80 MGas/s 353.166 ops/s 2831527 ns/op 9328393 CPU cycles (approx)
BLS12_G2MSM 64 119.28 MGas/s 186.555 ops/s 5360353 ns/op 17659639 CPU cycles (approx)
BLS12_G2MSM 128 101.59 MGas/s 101.367 ops/s 9865099 ns/op 32500502 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
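The columns above are tied together by a simple relation: MGas/s is the gas charged per call times ops/s. The sketch below recovers the gas cost implied by each row and then reprices each operation "in ratio of G1 scalarmul", meaning it gets the gas that would give it the same nanoseconds-per-gas as BLS12_G1MUL. The helper names `implied_gas` and `gas_in_ratio_of` are hypothetical, and the ratio rule is an assumption about how such a comparison could be done, not necessarily the method used in the linked gist.

```python
# Figures copied from the first benchmark table above.
ROWS = {
    # name: (MGas/s, ops/s, ns/op)
    "BLS12_G1ADD": (185.60, 371195.249, 2694),
    "BLS12_G2ADD": (218.28, 272851.296, 3665),
    "BLS12_G1MUL": (144.80, 12066.947, 82871),
    "BLS12_G2MUL": (346.23, 7693.965, 129972),
}


def implied_gas(mgas_per_s, ops_per_s):
    """Gas charged per call, recovered from the two throughput columns."""
    return mgas_per_s * 1e6 / ops_per_s


def gas_in_ratio_of(ns_per_op, ref_ns_per_op, ref_gas):
    """Gas that gives the same ns-per-gas as the reference operation (assumed rule)."""
    return ref_gas * ns_per_op / ref_ns_per_op


_, _, g1mul_ns = ROWS["BLS12_G1MUL"]
g1mul_gas = implied_gas(*ROWS["BLS12_G1MUL"][:2])

for name, (mgas, ops, ns) in ROWS.items():
    current = implied_gas(mgas, ops)
    relative = gas_in_ratio_of(ns, g1mul_ns, g1mul_gas)
    print(f"{name:12s} implied gas {current:8.0f}   priced vs G1MUL {relative:8.0f}")
```

A row whose MGas/s is far above the G1MUL row is overpriced relative to G1 scalarmul on this machine; a row far below it is underpriced.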
x86 worst case: MacBook Pro 13" from 2015 with an i5-5257U (dual-core mobile Broadwell without ADCX/ADOX instructions), compiled without assembly.
--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD 79.24 MGas/s 158478.605 ops/s 6310 ns/op 17019 CPU cycles (approx)
BLS12_G2ADD 86.30 MGas/s 107874.865 ops/s 9270 ns/op 25029 CPU cycles (approx)
BLS12_G1MUL 52.94 MGas/s 4411.875 ops/s 226661 ns/op 611983 CPU cycles (approx)
BLS12_G2MUL 113.23 MGas/s 2516.312 ops/s 397407 ns/op 1072999 CPU cycles (approx)
BLS12_MAP_FP_TO_G1 59.91 MGas/s 10892.416 ops/s 91807 ns/op 247878 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2 231.85 MGas/s 3091.372 ops/s 323481 ns/op 873399 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK 1 79.50 MGas/s 736.127 ops/s 1358461 ns/op 3667743 CPU cycles (approx)
BLS12_PAIRINGCHECK 2 70.75 MGas/s 468.521 ops/s 2134377 ns/op 5762346 CPU cycles (approx)
BLS12_PAIRINGCHECK 3 78.84 MGas/s 406.370 ops/s 2460812 ns/op 6644092 CPU cycles (approx)
BLS12_PAIRINGCHECK 4 77.35 MGas/s 326.389 ops/s 3063828 ns/op 8272202 CPU cycles (approx)
BLS12_PAIRINGCHECK 5 73.62 MGas/s 262.940 ops/s 3803153 ns/op 10268383 CPU cycles (approx)
BLS12_PAIRINGCHECK 6 74.92 MGas/s 231.954 ops/s 4311203 ns/op 11639867 CPU cycles (approx)
BLS12_PAIRINGCHECK 7 74.02 MGas/s 202.239 ops/s 4944633 ns/op 13350277 CPU cycles (approx)
BLS12_PAIRINGCHECK 8 73.98 MGas/s 180.873 ops/s 5528734 ns/op 14927373 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM 2 39.46 MGas/s 1851.773 ops/s 540023 ns/op 1457370 CPU cycles (approx)
BLS12_G1MSM 4 31.77 MGas/s 1032.714 ops/s 968322 ns/op 2614367 CPU cycles (approx)
BLS12_G1MSM 8 28.10 MGas/s 646.136 ops/s 1547662 ns/op 4178494 CPU cycles (approx)
BLS12_G1MSM 16 22.78 MGas/s 355.252 ops/s 2814907 ns/op 7599937 CPU cycles (approx)
BLS12_G1MSM 32 20.42 MGas/s 197.672 ops/s 5058876 ns/op 13658562 CPU cycles (approx)
BLS12_G1MSM 64 18.18 MGas/s 106.635 ops/s 9377765 ns/op 25319843 CPU cycles (approx)
BLS12_G1MSM 128 15.22 MGas/s 56.936 ops/s 17563428 ns/op 47421045 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM 2 85.47 MGas/s 1069.452 ops/s 935058 ns/op 2524473 CPU cycles (approx)
BLS12_G2MSM 4 69.02 MGas/s 598.208 ops/s 1671658 ns/op 4513355 CPU cycles (approx)
BLS12_G2MSM 8 55.77 MGas/s 341.953 ops/s 2924378 ns/op 7895732 CPU cycles (approx)
BLS12_G2MSM 16 48.60 MGas/s 202.089 ops/s 4948320 ns/op 13360281 CPU cycles (approx)
BLS12_G2MSM 32 44.21 MGas/s 114.143 ops/s 8760961 ns/op 23654460 CPU cycles (approx)
BLS12_G2MSM 64 39.68 MGas/s 62.057 ops/s 16114098 ns/op 43507952 CPU cycles (approx)
BLS12_G2MSM 128 33.94 MGas/s 33.866 ops/s 29527769 ns/op 79724878 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
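One way to read the MSM rows of the x86 worst-case table above is to compare a single MSM call against the same number of separate G1 scalar multiplications. The sketch below does that with the ns/op figures copied from the table. The function name `msm_speedup` is hypothetical, and the comparison assumes the BLS12_G1MUL and BLS12_G1MSM rows include comparable input validation, which the table itself does not state.

```python
# ns/op figures copied from the x86 worst-case table above.
G1MUL_NS = 226661
G1MSM_NS = {
    2: 540023,
    4: 968322,
    8: 1547662,
    16: 2814907,
    32: 5058876,
    64: 9377765,
    128: 17563428,
}


def msm_speedup(n_points, mul_ns, msm_ns):
    """Time of n separate scalar multiplications divided by the time of one MSM call."""
    return n_points * mul_ns / msm_ns


for n, ns in G1MSM_NS.items():
    per_point = ns / n  # amortized cost of one (point, scalar) pair
    speedup = msm_speedup(n, G1MUL_NS, ns)
    print(f"G1MSM {n:3d}: {per_point:9.0f} ns/point, {speedup:.2f}x vs {n} x G1MUL")
```

Comparing this measured ratio with the discount applied by the gas schedule is one way to see why MGas/s falls as the MSM size grows.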
ARM 64-bit worst case: Raspberry Pi 4, compiled without assembly and without any add-with-carry intrinsics, so each limb addition costs about 3 times more than necessary (main addition, comparison, carry addition). See also the compiler woes in https://github.com/mratsim/constantine/issues/357 and https://gcc.godbolt.org/z/jdecvffaP.
--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD 21.62 MGas/s 43239.504 ops/s 23127 ns/op
BLS12_G2ADD 21.64 MGas/s 27044.569 ops/s 36976 ns/op
BLS12_G1MUL 10.53 MGas/s 877.837 ops/s 1139164 ns/op
BLS12_G2MUL 23.59 MGas/s 524.237 ops/s 1907533 ns/op
BLS12_MAP_FP_TO_G1 11.99 MGas/s 2180.573 ops/s 458595 ns/op
BLS12_MAP_FP2_TO_G2 47.34 MGas/s 631.257 ops/s 1584142 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK 1 16.03 MGas/s 148.469 ops/s 6735393 ns/op
BLS12_PAIRINGCHECK 2 16.02 MGas/s 106.095 ops/s 9425528 ns/op
BLS12_PAIRINGCHECK 3 15.92 MGas/s 82.062 ops/s 12185853 ns/op
BLS12_PAIRINGCHECK 4 15.95 MGas/s 67.282 ops/s 14862870 ns/op
BLS12_PAIRINGCHECK 5 15.89 MGas/s 56.747 ops/s 17621982 ns/op
BLS12_PAIRINGCHECK 6 15.91 MGas/s 49.262 ops/s 20299681 ns/op
BLS12_PAIRINGCHECK 7 15.87 MGas/s 43.363 ops/s 23060922 ns/op
BLS12_PAIRINGCHECK 8 15.89 MGas/s 38.851 ops/s 25739113 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM 2 8.52 MGas/s 399.889 ops/s 2500694 ns/op
BLS12_G1MSM 4 7.17 MGas/s 233.071 ops/s 4290542 ns/op
BLS12_G1MSM 8 5.75 MGas/s 132.272 ops/s 7560162 ns/op
BLS12_G1MSM 16 4.71 MGas/s 73.425 ops/s 13619290 ns/op
BLS12_G1MSM 32 4.13 MGas/s 39.989 ops/s 25006822 ns/op
BLS12_G1MSM 64 3.60 MGas/s 21.100 ops/s 47394327 ns/op
BLS12_G1MSM 128 2.98 MGas/s 11.134 ops/s 89812356 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM 2 18.68 MGas/s 233.735 ops/s 4278343 ns/op
BLS12_G2MSM 4 15.81 MGas/s 137.067 ops/s 7295700 ns/op
BLS12_G2MSM 8 12.30 MGas/s 75.418 ops/s 13259367 ns/op
BLS12_G2MSM 16 10.43 MGas/s 43.388 ops/s 23047767 ns/op
BLS12_G2MSM 32 9.51 MGas/s 24.558 ops/s 40719321 ns/op
BLS12_G2MSM 64 8.22 MGas/s 12.855 ops/s 77791982 ns/op
BLS12_G2MSM 128 7.00 MGas/s 6.984 ops/s 143185138 ns/op
--------------------------------------------------------------------------------------------------------------------
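The "main addition, comparison, carry addition" pattern mentioned in the ARM note can be illustrated with a small model of multi-limb addition. This is a Python sketch that emulates 64-bit words with a mask; it is not Constantine's code, and the names `add_limbs`, `to_limbs` and `from_limbs` are hypothetical. Without an add-with-carry instruction or intrinsic, the carry has to be rebuilt with comparisons for every limb, and a second comparison is needed because adding the incoming carry can itself wrap.

```python
MASK = (1 << 64) - 1  # emulate 64-bit machine words


def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (sum_limbs, carry_out).
    """
    out, carry = [], 0
    for x, y in zip(a, b):
        s = (x + y) & MASK      # main addition, wraps like a 64-bit register
        c1 = int(s < x)         # comparison: the sum wrapped iff it shrank
        t = (s + carry) & MASK  # carry addition from the previous limb
        c2 = int(t < s)         # adding the incoming carry can also wrap
        out.append(t)
        carry = c1 | c2         # c1 and c2 are never both set
    return out, carry


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Rebuild the integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


# Usage: a 6-limb (384-bit) addition checked against Python's big integers.
x = (1 << 381) - 12345
y = (1 << 380) + 67890
limbs, carry = add_limbs(to_limbs(x, 6), to_limbs(y, 6))
assert from_limbs(limbs) + (carry << 384) == x + y
```

With a hardware add-with-carry, each limb needs a single instruction instead of this sequence, which is where the roughly threefold overhead comes from.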
This PR implements EIP-2537 and provides pricing feedback for it.
Detailed benchmarks and metering, both constant-time and variable-time (for the worst-case scenario), are available in: https://github.com/mratsim/constantine/blob/eip2537/metering/eip2537.md
Low-level benchmark
Addition and scalar multiplication are constant-time unless marked vartime, so these numbers represent the worst-case scenario.
vs Gnark (variable-time)