status-im / nim-blscurve

Nim implementation of BLS signature scheme (Boneh-Lynn-Shacham) over Barreto-Lynn-Scott (BLS) curve BLS12-381
Apache License 2.0

Public key aggregation is slow #13

Closed mratsim closed 5 years ago

mratsim commented 5 years ago

On my i5-5257U (dual-core mobile Broadwell from 2015, 2.7 GHz base, 3.1 GHz Turbo), I get the following stats for signature aggregation:

Warmup: 1.1908 s, result 224 (displayed to avoid compiler optimizing warmup away)

#### Block parameters
Number of validators:                                                                       482
Number of block parent hashes:                                                               12
Fork version:                                                                                 3
Slot:                                                                                      4246
Shard_id:                                                                                   555
Parent_hash[0]:                99D2587E07003CFE8023D46401577191EF89BFCC239A6EF1922AC49A687116A2
Shard_block_hash:              0CF579DC04024D8D4292A4BBCFCAD24F6A20C44AF665A7A4144CE84E8821E77A
justified_slot:                                                                            1846

#### Message, crypto keys and signatures
482 secret and public keys pairs generated in 2.014 s
Throughput: 239.279 kps/s (key pairs/second)

Message generated in 0.010 ms

482 public key and message signature pairs generated in 1.153 s
Throughput: 418.150 kps/s (keysig pairs/second)

#### Benchmark: signature aggregation

Benchmarking signature aggregation
Collected 100 samples in 153.974 seconds
Average time: 1539.735 ms
Stddev  time: 3.821 ms
Min     time: 1536.821 ms
Max     time: 1558.711 ms

Display computation result to make sure it's not optimized away
0418ff7d1d14353af2f95bb25724fa9787cd4e95c4b5040dbddf1ff3a601c29943974ad5cf806c89b04fda4564c513d2ae1420cecdeaaa0bd4888a5b066efafa2222425216e8e8a43982735c68ddf37ef0494cfc1830e8be270bd5d026804f19f8

However, uncommenting the public key aggregation benchmark leaves the bench stuck; not even 10 samples complete in 2 minutes:

image

If we dive into the details of ECP2_BLS381_mul, FP2_BLS381_mul is a huge bottleneck (screenshot: 2018-10-27_18-58-45).

This is due to BIG_384_29_mul and BIG_384_29_monty (Montgomery reduction?)

image
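For readers unfamiliar with the term, here is a minimal sketch of Montgomery reduction and the modular multiplication built on it, using Python integers and a toy modulus. It only illustrates the technique that BIG_384_29_monty appears to implement; the function names (montgomery_setup, redc, mont_mul, to_mont) and the parameters are hypothetical and are not the library's API.

```python
# Sketch of Montgomery multiplication on plain Python integers.
# All names here are illustrative assumptions, not taken from the profiled library.

def montgomery_setup(n: int, r_bits: int) -> int:
    """Return n_prime such that n * n_prime == -1 (mod R), with R = 2**r_bits."""
    assert n % 2 == 1, "modulus must be odd"
    assert (1 << r_bits) > n, "R must exceed the modulus"
    r = 1 << r_bits
    return (-pow(n, -1, r)) % r  # modular inverse via pow() needs Python 3.8+

def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n, for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n_prime mod R
    u = (t + m * n) >> r_bits          # t + m*n is a multiple of R, so the shift is exact
    return u - n if u >= n else u      # u < 2n, one conditional subtraction suffices

def to_mont(a: int, n: int, r_bits: int) -> int:
    """Convert a to Montgomery form a * R mod n."""
    return (a << r_bits) % n

def mont_mul(a_m: int, b_m: int, n: int, r_bits: int, n_prime: int) -> int:
    """Multiply two values in Montgomery form; the result stays in Montgomery form."""
    return redc(a_m * b_m, n, r_bits, n_prime)

def modmul_via_montgomery(a: int, b: int, n: int, r_bits: int) -> int:
    """Compute a * b mod n by going through Montgomery form and back."""
    n_prime = montgomery_setup(n, r_bits)
    prod_m = mont_mul(to_mont(a, n, r_bits), to_mont(b, n, r_bits), n, r_bits, n_prime)
    return redc(prod_m, n, r_bits, n_prime)  # leave Montgomery form
```

The reduction replaces a division by the modulus with multiplications, masks and shifts, which is why it shows up next to the multiplication routine in a profile: every field multiplication pays for one full-width multiply plus one reduction.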

mratsim commented 5 years ago

This might be due to using 32-bit (#5); however, 32-bit must still be fast enough, since it is required for embedded targets.
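A rough back-of-the-envelope sketch of why the limb width could matter, assuming the name BIG_384_29 means 384-bit integers stored in 29-bit limbs and that a 64-bit build would use 58-bit limbs instead (both are assumptions about the naming convention, not something measured here). It counts only the word multiplications in a schoolbook multiply; the helper names are hypothetical.

```python
# Count limbs and schoolbook word multiplications for a 384-bit integer.
# The 29-bit and 58-bit limb widths are assumptions, see the note above.

def limb_count(bits: int, limb_bits: int) -> int:
    """Number of limbs needed to hold `bits` bits (ceiling division)."""
    return -(-bits // limb_bits)

def schoolbook_word_muls(bits: int, limb_bits: int) -> int:
    """Word multiplications in a schoolbook multiply: one per pair of limbs."""
    return limb_count(bits, limb_bits) ** 2

if __name__ == "__main__":
    for limb_bits in (29, 58):
        print(
            f"{limb_bits}-bit limbs: "
            f"{limb_count(384, limb_bits)} limbs, "
            f"{schoolbook_word_muls(384, limb_bits)} word multiplications"
        )
```

Under these assumptions the 32-bit layout needs twice as many limbs and therefore about four times as many word multiplications per big-integer multiply. This is only an operation count and does not by itself show that the limb width is the cause of the slowdown reported above.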

mratsim commented 5 years ago

Closed by https://github.com/status-im/nim-beacon-chain/commit/b9b9e0ebfbd795b6d424f02df1216cac0831bf1b, see https://github.com/status-im/nim-beacon-chain/issues/8