status-im / nim-blscurve

Nim implementation of BLS signature scheme (Boneh-Lynn-Shacham) over Barreto-Lynn-Scott (BLS) curve BLS12-381
Apache License 2.0

Benchmarks #46

Closed mratsim closed 4 years ago

mratsim commented 4 years ago

Addresses https://github.com/status-im/nim-beacon-chain/issues/871

Pending,

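Example report (note that the CPU is overclocked to 4.1 GHz while the cycle counts are derived from the 3.0 GHz nominal clock, so the reported cycles understate the real ones by a factor of about 4.1/3.0 ≈ 1.37):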

$  nim c -d:danger --verbosity:0 --hints:off --warnings:off --outdir:build -r benchmarks/bench_all.nim 
Warmup: 0.9025 s, result 224 (displayed to avoid compiler optimizing warmup away)

⚠️ Cycles measurements are approximate and use the CPU nominal clock: Turbo-Boost and overclocking will skew them.
==========================================================================================================

Compiled with GCC
Optimization level => no optimization: false | release: true | danger: true
Using Milagro with 64-bit limbs
Running on Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

Scalar multiplication G1                                                2412.667 ops/s        414479 ns/op       1243455 cycles
Scalar multiplication G2                                                 889.198 ops/s       1124609 ns/op       3373869 cycles
EC add G1                                                             912408.759 ops/s          1096 ns/op          3290 cycles
EC add G2                                                             323206.206 ops/s          3094 ns/op          9284 cycles
Pairing (Milagro builtin double pairing)                                 415.093 ops/s       2409099 ns/op       7227390 cycles
Pairing (Multi-Pairing with delayed Miller and Exp)                      417.191 ops/s       2396982 ns/op       7191039 cycles

⚠️ Warning: using draft v5 of IETF Hash-To-Curve (HKDF-based).
           This is an outdated draft.

Hash to G2 (Draft #5)                                                    941.698 ops/s       1061912 ns/op       3185779 cycles
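
The three columns of the report are related by simple arithmetic, which also shows where the overclocking skew comes from. The following Python sketch is only an illustration (the function names are hypothetical and not part of nim-blscurve); it assumes the cycle column is computed as ns/op multiplied by the nominal clock in GHz, which matches the figures above up to rounding.

```python
# Illustrative only: relates the ops/s, ns/op and cycles columns of the report.
# Assumption: cycles = ns/op * clock frequency in GHz (nominal clock in the report).

def ops_per_second(ns_per_op: float) -> float:
    """Throughput implied by a latency given in nanoseconds per operation."""
    return 1e9 / ns_per_op


def cycles_per_op(ns_per_op: float, clock_ghz: float) -> float:
    """Cycle count implied by a latency and a clock frequency in GHz."""
    return ns_per_op * clock_ghz


if __name__ == "__main__":
    ns = 414479  # "Scalar multiplication G1" row of the report
    print(f"ops/s:                    {ops_per_second(ns):.3f}")
    print(f"cycles at nominal 3.0 GHz: {cycles_per_op(ns, 3.0):.0f}")
    # Estimate at the overclocked frequency, assuming the CPU really ran at 4.1 GHz.
    print(f"cycles at 4.1 GHz:         {cycles_per_op(ns, 4.1):.0f}")
```

The small difference between the nominal-clock result and the reported 1243455 cycles is consistent with ns/op being rounded in the report.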

mratsim commented 4 years ago

Cycle counting on ARM seems tricky:

Otherwise, the approach used by Google or FFTW might work, but it might require perf counter privileges (kernel.perf_event_paranoid=0?).
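
Before relying on perf counters, the current setting can be checked. The following is a minimal Python sketch, assuming a Linux system that exposes the sysctl at `/proc/sys/kernel/perf_event_paranoid`; the helper name is hypothetical. Lower values are less restrictive, but whether 0 is actually sufficient for user-space cycle counting on a given ARM board is the open question raised above, not something this sketch settles.

```python
# Illustrative only: read the kernel.perf_event_paranoid sysctl on Linux.
from pathlib import Path
from typing import Optional

PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_perf_event_paranoid(path: str = PARANOID_PATH) -> Optional[int]:
    """Return the current setting, or None if it cannot be read or parsed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_perf_event_paranoid()
    if level is None:
        print("perf_event_paranoid is not available on this system")
    else:
        print(f"kernel.perf_event_paranoid = {level}")
```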