mratsim / constantine

Constantine: modular, high-performance, zero-dependency cryptography stack for verifiable computation, proof systems and blockchain protocols.

EIP-2537 - BLS12-381 precompiles for the EVM #368

Closed mratsim closed 7 months ago

mratsim commented 8 months ago

This PR implements EIP-2537 and provides benchmark data as pricing feedback for it.


Detailed benchmarks and metering, constant-time and variable-time (for the worst-case scenario), are available at: https://github.com/mratsim/constantine/blob/eip2537/metering/eip2537.md

Low-level benchmark

The addition and scalar multiplication are constant-time unless marked vartime, and therefore represent the worst-case scenario.

image

vs Gnark (variable-time)

git clone https://github.com/Consensys/gnark-crypto
cd gnark-crypto/ecc/bls12-381
go test -bench="(Pairing|G[12]Jac(Add|Double|ScalarMultiplication))" --cpu 1 -run=none

image

Operation    Constantine speedup over Gnark    Constantine vartime
G1 add       1.20x                             2.21x
G1 mul       1.01x                             1.24x
G2 add       1.14x                             2.66x
G2 mul       1.38x                             1.63x
Pairing      1.16x                             N/A
mratsim commented 7 months ago

For @asanso on gas pricing

Gas costs as a ratio of G1 scalarmul:

image

Original: https://gist.github.com/mratsim/6785a29e72865cfa94e1174fae1e1168

image

Reproduction

git clone https://github.com/mratsim/constantine
cd constantine
git checkout eip2537
CC=clang nimble bench_eip2537_subgroup_checks_impact
mratsim commented 7 months ago

All EIP-2537 precompiles are implemented with benchmarks.

image

--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                  185.60 MGas/s      371195.249 ops/s         2694 ns/op         8878 CPU cycles (approx)
BLS12_G2ADD                  218.28 MGas/s      272851.296 ops/s         3665 ns/op        12074 CPU cycles (approx)
BLS12_G1MUL                  144.80 MGas/s       12066.947 ops/s        82871 ns/op       273021 CPU cycles (approx)
BLS12_G2MUL                  346.23 MGas/s        7693.965 ops/s       129972 ns/op       428197 CPU cycles (approx)
BLS12_MAP_FP_TO_G1           161.09 MGas/s       29288.580 ops/s        34143 ns/op       112485 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2          702.17 MGas/s        9362.332 ops/s       106811 ns/op       351891 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1       230.31 MGas/s        2132.514 ops/s       468930 ns/op      1544885 CPU cycles (approx)
BLS12_PAIRINGCHECK   2       236.84 MGas/s        1568.453 ops/s       637571 ns/op      2100483 CPU cycles (approx)
BLS12_PAIRINGCHECK   3       237.00 MGas/s        1221.637 ops/s       818574 ns/op      2696796 CPU cycles (approx)
BLS12_PAIRINGCHECK   4       238.23 MGas/s        1005.195 ops/s       994832 ns/op      3277480 CPU cycles (approx)
BLS12_PAIRINGCHECK   5       237.45 MGas/s         848.035 ops/s      1179196 ns/op      3884872 CPU cycles (approx)
BLS12_PAIRINGCHECK   6       201.89 MGas/s         625.056 ops/s      1599857 ns/op      5270568 CPU cycles (approx)
BLS12_PAIRINGCHECK   7       223.64 MGas/s         611.036 ops/s      1636565 ns/op      5391678 CPU cycles (approx)
BLS12_PAIRINGCHECK   8       224.13 MGas/s         548.006 ops/s      1824797 ns/op      6011817 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2              121.74 MGas/s        5712.392 ops/s       175058 ns/op       576717 CPU cycles (approx)
BLS12_G1MSM   4              102.15 MGas/s        3320.152 ops/s       301191 ns/op       992263 CPU cycles (approx)
BLS12_G1MSM   8               81.67 MGas/s        1878.086 ops/s       532457 ns/op      1754181 CPU cycles (approx)
BLS12_G1MSM  16               67.23 MGas/s        1048.407 ops/s       953828 ns/op      3142392 CPU cycles (approx)
BLS12_G1MSM  32               59.01 MGas/s         571.284 ops/s      1750442 ns/op      5766852 CPU cycles (approx)
BLS12_G1MSM  64               51.40 MGas/s         301.480 ops/s      3316972 ns/op     10927814 CPU cycles (approx)
BLS12_G1MSM 128               42.37 MGas/s         158.535 ops/s      6307740 ns/op     20780816 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2              274.23 MGas/s        3431.309 ops/s       291434 ns/op       960108 CPU cycles (approx)
BLS12_G2MSM   4              168.31 MGas/s        1458.762 ops/s       685513 ns/op      2258411 CPU cycles (approx)
BLS12_G2MSM   8              177.98 MGas/s        1091.353 ops/s       916294 ns/op      3018711 CPU cycles (approx)
BLS12_G2MSM  16              150.39 MGas/s         625.367 ops/s      1599062 ns/op      5268087 CPU cycles (approx)
BLS12_G2MSM  32              136.80 MGas/s         353.166 ops/s      2831527 ns/op      9328393 CPU cycles (approx)
BLS12_G2MSM  64              119.28 MGas/s         186.555 ops/s      5360353 ns/op     17659639 CPU cycles (approx)
BLS12_G2MSM 128              101.59 MGas/s         101.367 ops/s      9865099 ns/op     32500502 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
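The MGas/s column can be related back to a gas price per call. The sketch below assumes that MGas/s equals the gas charged per call multiplied by ops/s (divided by 10^6); that relation is inferred from the numbers above and is not stated in this thread. The function names are hypothetical.

```python
# Rows copied from the benchmark output above: (name, MGas/s, ops/s).
ROWS = [
    ("BLS12_G1ADD", 185.60, 371195.249),
    ("BLS12_G2ADD", 218.28, 272851.296),
    ("BLS12_G1MUL", 144.80, 12066.947),
    ("BLS12_G2MUL", 346.23, 7693.965),
    ("BLS12_MAP_FP_TO_G1", 161.09, 29288.580),
    ("BLS12_MAP_FP2_TO_G2", 702.17, 9362.332),
]


def implied_gas(mgas_per_s: float, ops_per_s: float) -> float:
    """Gas per call, ASSUMING MGas/s == gas_per_call * ops/s / 1e6."""
    return mgas_per_s * 1e6 / ops_per_s


def implied_gas_table(rows=ROWS):
    """Map each precompile name to its implied gas per call."""
    return {name: implied_gas(m, o) for name, m, o in rows}


if __name__ == "__main__":
    for name, gas in implied_gas_table().items():
        # The printed MGas/s values are rounded, so the result is approximate.
        print(f"{name:<22} ~{gas:.0f} gas per call")
```

Under that assumption the rows work out to roughly 500 gas for BLS12_G1ADD and roughly 12000 gas for BLS12_G1MUL. The MGas/s figures are rounded to two decimals, so the recovered values are only approximate.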
mratsim commented 7 months ago

x86 worst case: MacBook Pro 13" from 2015 with an i5-5257U (dual-core mobile Broadwell without ADCX/ADOX instructions), compiled without assembly.

image

--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                   79.24 MGas/s      158478.605 ops/s         6310 ns/op        17019 CPU cycles (approx)
BLS12_G2ADD                   86.30 MGas/s      107874.865 ops/s         9270 ns/op        25029 CPU cycles (approx)
BLS12_G1MUL                   52.94 MGas/s        4411.875 ops/s       226661 ns/op       611983 CPU cycles (approx)
BLS12_G2MUL                  113.23 MGas/s        2516.312 ops/s       397407 ns/op      1072999 CPU cycles (approx)
BLS12_MAP_FP_TO_G1            59.91 MGas/s       10892.416 ops/s        91807 ns/op       247878 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2          231.85 MGas/s        3091.372 ops/s       323481 ns/op       873399 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1        79.50 MGas/s         736.127 ops/s      1358461 ns/op      3667743 CPU cycles (approx)
BLS12_PAIRINGCHECK   2        70.75 MGas/s         468.521 ops/s      2134377 ns/op      5762346 CPU cycles (approx)
BLS12_PAIRINGCHECK   3        78.84 MGas/s         406.370 ops/s      2460812 ns/op      6644092 CPU cycles (approx)
BLS12_PAIRINGCHECK   4        77.35 MGas/s         326.389 ops/s      3063828 ns/op      8272202 CPU cycles (approx)
BLS12_PAIRINGCHECK   5        73.62 MGas/s         262.940 ops/s      3803153 ns/op     10268383 CPU cycles (approx)
BLS12_PAIRINGCHECK   6        74.92 MGas/s         231.954 ops/s      4311203 ns/op     11639867 CPU cycles (approx)
BLS12_PAIRINGCHECK   7        74.02 MGas/s         202.239 ops/s      4944633 ns/op     13350277 CPU cycles (approx)
BLS12_PAIRINGCHECK   8        73.98 MGas/s         180.873 ops/s      5528734 ns/op     14927373 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2               39.46 MGas/s        1851.773 ops/s       540023 ns/op      1457370 CPU cycles (approx)
BLS12_G1MSM   4               31.77 MGas/s        1032.714 ops/s       968322 ns/op      2614367 CPU cycles (approx)
BLS12_G1MSM   8               28.10 MGas/s         646.136 ops/s      1547662 ns/op      4178494 CPU cycles (approx)
BLS12_G1MSM  16               22.78 MGas/s         355.252 ops/s      2814907 ns/op      7599937 CPU cycles (approx)
BLS12_G1MSM  32               20.42 MGas/s         197.672 ops/s      5058876 ns/op     13658562 CPU cycles (approx)
BLS12_G1MSM  64               18.18 MGas/s         106.635 ops/s      9377765 ns/op     25319843 CPU cycles (approx)
BLS12_G1MSM 128               15.22 MGas/s          56.936 ops/s     17563428 ns/op     47421045 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2               85.47 MGas/s        1069.452 ops/s       935058 ns/op      2524473 CPU cycles (approx)
BLS12_G2MSM   4               69.02 MGas/s         598.208 ops/s      1671658 ns/op      4513355 CPU cycles (approx)
BLS12_G2MSM   8               55.77 MGas/s         341.953 ops/s      2924378 ns/op      7895732 CPU cycles (approx)
BLS12_G2MSM  16               48.60 MGas/s         202.089 ops/s      4948320 ns/op     13360281 CPU cycles (approx)
BLS12_G2MSM  32               44.21 MGas/s         114.143 ops/s      8760961 ns/op     23654460 CPU cycles (approx)
BLS12_G2MSM  64               39.68 MGas/s          62.057 ops/s     16114098 ns/op     43507952 CPU cycles (approx)
BLS12_G2MSM 128               33.94 MGas/s          33.866 ops/s     29527769 ns/op     79724878 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------
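To compare this run with the first x86 table, the ns/op and cycle columns can be turned into ratios. This is a rough sketch using three rows copied from the two tables. The dictionary and function names are hypothetical, and the hardware behind the first table is not stated in the text of this thread.

```python
# (ns/op, approx. CPU cycles) copied from the two x86 tables in this thread.
FIRST_X86 = {
    "BLS12_G1ADD": (2694, 8878),
    "BLS12_G1MUL": (82871, 273021),
    "BLS12_PAIRINGCHECK 1": (468930, 1544885),
}
MACBOOK_2015 = {
    "BLS12_G1ADD": (6310, 17019),
    "BLS12_G1MUL": (226661, 611983),
    "BLS12_PAIRINGCHECK 1": (1358461, 3667743),
}


def clock_ghz(ns_per_op: float, cycles: float) -> float:
    """Cycles per nanosecond, i.e. the clock rate in GHz implied by one row."""
    return cycles / ns_per_op


def slowdown(op: str):
    """(wall-time ratio, cycle ratio) of the 2015 MacBook vs the first table."""
    fast_ns, fast_cycles = FIRST_X86[op]
    slow_ns, slow_cycles = MACBOOK_2015[op]
    return slow_ns / fast_ns, slow_cycles / fast_cycles


if __name__ == "__main__":
    for op in FIRST_X86:
        time_ratio, cycle_ratio = slowdown(op)
        print(f"{op:<22} time x{time_ratio:.2f}  cycles x{cycle_ratio:.2f}")
```

The cycle ratio is smaller than the wall-time ratio because the two rows imply different clock rates (about 3.3 GHz versus about 2.7 GHz), so part of the gap comes from frequency and the rest from the missing assembly and instructions.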
mratsim commented 7 months ago

ARM 64-bit worst case: Raspberry Pi 4, compiled without assembly and without any add-with-carry intrinsics, meaning an addition with carry costs 3 times more than it could (main addition, comparison, carry addition). See also the compiler woes in https://github.com/mratsim/constantine/issues/357 and https://gcc.godbolt.org/z/jdecvffaP.
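To illustrate the "main addition, comparison, carry addition" pattern, here is a minimal sketch of multi-limb addition without a carry flag. It is not Constantine's code: it is written in Python for readability, 64-bit registers are emulated with a mask, and the function names are hypothetical. A BLS12-381 field element fits in six 64-bit limbs.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register in Python


def add_limbs_without_carry_flag(a, b):
    """Add two little-endian lists of 64-bit limbs.

    Returns (result_limbs, carry_out). Each limb needs a main addition,
    a comparison to recover the lost carry, and a carry addition.
    """
    assert len(a) == len(b)
    out, carry = [], 0
    for x, y in zip(a, b):
        s = (x + y) & MASK64      # 1. main addition, wraps on overflow
        c = int(s < x)            # 2. comparison: wrapped iff sum < operand
        t = (s + carry) & MASK64  # 3. carry addition from the previous limb
        c |= int(t < s)           # adding the carry can itself wrap
        out.append(t)
        carry = c
    return out, carry


def limbs_to_int(limbs):
    """Interpret little-endian 64-bit limbs as one Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


if __name__ == "__main__":
    a = [MASK64] * 6          # 384-bit value, all bits set
    b = [1, 0, 0, 0, 0, 0]    # the value 1
    print(add_limbs_without_carry_flag(a, b))
```

With an add-with-carry instruction or intrinsic, steps 1 to 3 collapse into a single operation per limb. The sketch also checks overflow of the carry addition, so it shows the pattern rather than an exact operation count.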

image

--------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                   21.62 MGas/s       43239.504 ops/s        23127 ns/op
BLS12_G2ADD                   21.64 MGas/s       27044.569 ops/s        36976 ns/op
BLS12_G1MUL                   10.53 MGas/s         877.837 ops/s      1139164 ns/op
BLS12_G2MUL                   23.59 MGas/s         524.237 ops/s      1907533 ns/op
BLS12_MAP_FP_TO_G1            11.99 MGas/s        2180.573 ops/s       458595 ns/op
BLS12_MAP_FP2_TO_G2           47.34 MGas/s         631.257 ops/s      1584142 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK   1        16.03 MGas/s         148.469 ops/s      6735393 ns/op
BLS12_PAIRINGCHECK   2        16.02 MGas/s         106.095 ops/s      9425528 ns/op
BLS12_PAIRINGCHECK   3        15.92 MGas/s          82.062 ops/s     12185853 ns/op
BLS12_PAIRINGCHECK   4        15.95 MGas/s          67.282 ops/s     14862870 ns/op
BLS12_PAIRINGCHECK   5        15.89 MGas/s          56.747 ops/s     17621982 ns/op
BLS12_PAIRINGCHECK   6        15.91 MGas/s          49.262 ops/s     20299681 ns/op
BLS12_PAIRINGCHECK   7        15.87 MGas/s          43.363 ops/s     23060922 ns/op
BLS12_PAIRINGCHECK   8        15.89 MGas/s          38.851 ops/s     25739113 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2                8.52 MGas/s         399.889 ops/s      2500694 ns/op
BLS12_G1MSM   4                7.17 MGas/s         233.071 ops/s      4290542 ns/op
BLS12_G1MSM   8                5.75 MGas/s         132.272 ops/s      7560162 ns/op
BLS12_G1MSM  16                4.71 MGas/s          73.425 ops/s     13619290 ns/op
BLS12_G1MSM  32                4.13 MGas/s          39.989 ops/s     25006822 ns/op
BLS12_G1MSM  64                3.60 MGas/s          21.100 ops/s     47394327 ns/op
BLS12_G1MSM 128                2.98 MGas/s          11.134 ops/s     89812356 ns/op
--------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2               18.68 MGas/s         233.735 ops/s      4278343 ns/op
BLS12_G2MSM   4               15.81 MGas/s         137.067 ops/s      7295700 ns/op
BLS12_G2MSM   8               12.30 MGas/s          75.418 ops/s     13259367 ns/op
BLS12_G2MSM  16               10.43 MGas/s          43.388 ops/s     23047767 ns/op
BLS12_G2MSM  32                9.51 MGas/s          24.558 ops/s     40719321 ns/op
BLS12_G2MSM  64                8.22 MGas/s          12.855 ops/s     77791982 ns/op
BLS12_G2MSM 128                7.00 MGas/s           6.984 ops/s    143185138 ns/op
--------------------------------------------------------------------------------------------------------------------
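The BLS12_PAIRINGCHECK rows above grow by a similar amount for each extra pair. As a rough sketch, assuming the cost is affine in the number of pairs (a fixed part plus a per-pair part), an ordinary least-squares fit on the Raspberry Pi 4 timings separates the two. The names below are hypothetical.

```python
# (number of pairs, ns/op) copied from the Raspberry Pi 4 table above.
PAIRING_NS = [
    (1, 6735393),
    (2, 9425528),
    (3, 12185853),
    (4, 14862870),
    (5, 17621982),
    (6, 20299681),
    (7, 23060922),
    (8, 25739113),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    fixed_ns, per_pair_ns = fit_line(PAIRING_NS)
    print(f"fixed part : {fixed_ns / 1e6:.2f} ms")
    print(f"per pair   : {per_pair_ns / 1e6:.2f} ms")
```

On this data the fit gives roughly 4 ms of fixed cost and roughly 2.7 ms per pair. The same function can be applied to the x86 tables to check whether the ratio between the two parts is similar across machines.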