phoreproject / bls

Go implementation of the BLS12-381 pairing
Apache License 2.0

Benchmarks #5

Open JustinDrake opened 5 years ago

JustinDrake commented 5 years ago

Hi there. We're looking to use this library for Ethereum 2.0. Do you have rough performance benchmarks, e.g. for pairings?

meyer9 commented 5 years ago

I'll upload some real benchmarks once I can set them up. A full pairing (Miller loop plus final exponentiation) currently takes about 8,000,000 ns/op, versus about 3,000,000 ns/op for Zcash. There's definitely still room for improvement.

meyer9 commented 5 years ago

goos: linux
goarch: amd64
pkg: github.com/phoreproject/bls
BenchmarkBLSAggregateSignature-8      500000          2307 ns/op
BenchmarkBLSSign-8                      1000       1334142 ns/op
BenchmarkBLSVerify-8                     100      17212764 ns/op
BenchmarkFQ12MulAssign-8              100000         16863 ns/op (6950 ns/op)
BenchmarkFQ12SquareAssign-8           100000         12647 ns/op (4900 ns/op)
BenchmarkFQ12InverseAssign-8           20000         60300 ns/op (25000 ns/op)
BenchmarkFQ2AddAssign-8             20000000          84.6 ns/op (35 ns/op)
BenchmarkFQ2SubAssign-8             20000000          75.2 ns/op (32 ns/op)
BenchmarkFQ2MulAssign-8              3000000           562 ns/op (278 ns/op)
BenchmarkFQ2SquareAssign-8           3000000           492 ns/op (220 ns/op)
BenchmarkFQ2InverseAssign-8            50000         29321 ns/op (13610 ns/op)
BenchmarkFQ6AddAssign-8             10000000           245 ns/op
BenchmarkFQ6SubAssign-8             10000000           194 ns/op
BenchmarkFQ6MulAssign-8               300000          4536 ns/op
BenchmarkFQ6SquareAssign-8            300000          3939 ns/op
BenchmarkFQ6InverseAssign-8            50000         37269 ns/op
BenchmarkFQAddAssign-8              50000000          35.8 ns/op (19 ns/op)
BenchmarkFQSubAssign-8              50000000          30.4 ns/op (14 ns/op)
BenchmarkFQMulAssign-8              20000000           117 ns/op (72 ns/op)
BenchmarkFQMul2-8                   50000000          25.4 ns/op
BenchmarkFQSquare-8                 10000000           148 ns/op (66 ns/op)
BenchmarkFQInverse-8                   50000         32540 ns/op (13300 ns/op)
BenchmarkFQNegate-8                 30000000          46.0 ns/op (11 ns/op)
BenchmarkFQSqrt-8                      20000         81789 ns/op (40000 ns/op)
BenchmarkG1MulAssign-8                  2000        656593 ns/op (332700 ns/op)
BenchmarkG1AddAssign-8                500000          2380 ns/op (1357 ns/op)
BenchmarkG1AddAssignMixed-8          1000000          1878 ns/op (1000 ns/op)
BenchmarkG2MulAssign-8                   500       2725396 ns/op (1073000 ns/op)
BenchmarkG2AddAssign-8                100000         10131 ns/op (4504 ns/op)
BenchmarkG2AddAssignMixed-8           200000          7659 ns/op (3184 ns/op)
BenchmarkG2Prepare-8                    3000        575940 ns/op (240000 ns/op)
BenchmarkMillerLoop-8                   1000       1668151 ns/op (677000 ns/op)
BenchmarkFinalExponentiation-8           200       6179057 ns/op (1800000 ns/op)
BenchmarkPairing-8                       200       8413597 ns/op (2700000 ns/op)
BenchmarkMACWithCarry-8             1000000000        2.33 ns/op
BenchmarkSubWithCarry-8             1000000000        2.54 ns/op
BenchmarkAddWithCarry-8             1000000000        2.72 ns/op
PASS
ok      github.com/phoreproject/bls 83.917s

I've also attached Zcash benchmarks for their BLS library (the figures in parentheses).
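To read the two columns side by side, a small parser can turn each line into a slowdown ratio. This is a minimal sketch that assumes the value in parentheses is the Zcash reference figure for the same operation; `parseRatio` and `benchLine` are hypothetical names used only for illustration, and lines without a parenthesised figure are skipped.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// benchLine matches lines such as
// "BenchmarkPairing-8   200   8413597 ns/op (2700000 ns/op)".
// The parenthesised figure is assumed to be the reference measurement.
var benchLine = regexp.MustCompile(
	`^(Benchmark\S+)\s+\d+\s+([\d.]+)\s+ns/op\s+\(([\d.]+)\s+ns/op\)\s*$`)

// parseRatio returns the benchmark name and measured/reference ratio.
// ok is false when the line carries no reference figure.
func parseRatio(line string) (name string, ratio float64, ok bool) {
	m := benchLine.FindStringSubmatch(line)
	if m == nil {
		return "", 0, false
	}
	measured, err1 := strconv.ParseFloat(m[2], 64)
	reference, err2 := strconv.ParseFloat(m[3], 64)
	if err1 != nil || err2 != nil || reference == 0 {
		return "", 0, false
	}
	return m[1], measured / reference, true
}

func main() {
	lines := []string{
		"BenchmarkMillerLoop-8                   1000       1668151 ns/op (677000 ns/op)",
		"BenchmarkFinalExponentiation-8           200       6179057 ns/op (1800000 ns/op)",
		"BenchmarkPairing-8                       200       8413597 ns/op (2700000 ns/op)",
		"BenchmarkFQ6AddAssign-8             10000000           245 ns/op",
	}
	for _, l := range lines {
		if name, r, ok := parseRatio(l); ok {
			fmt.Printf("%-32s %.2fx slower than reference\n", name, r)
		}
	}
}
```

On the figures above, the ratios for the Miller loop, final exponentiation, and full pairing all fall between roughly 2x and 3.5x, which matches the "about 8,000,000 vs. about 3,000,000 ns/op" estimate given earlier in the thread.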

meyer9 commented 5 years ago

I bet this could be sped up even more if we compiled with gccgo. The problem is that the assembly written for the gc toolchain is incompatible with gccgo.
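One way to support both toolchains is to keep the assembly for gc and fall back to portable Go elsewhere. The sketch below only illustrates the selection logic at run time using `runtime.Compiler`; `backendFor` and the backend labels are hypothetical, not part of this library. In a real package the choice would normally be made at compile time, with a `//go:build gc` constraint on the files that declare assembly-backed functions and `//go:build !gc` on the pure-Go fallback.

```go
package main

import (
	"fmt"
	"runtime"
)

// backendFor names the arithmetic backend a package could select for a
// given compiler. Both backend labels are hypothetical.
func backendFor(compiler string) string {
	if compiler == "gc" {
		// Go's Plan 9-style .s files are only assembled by the gc toolchain.
		return "assembly"
	}
	// gccgo (and anything else) needs a portable pure-Go implementation.
	return "pure-go"
}

func main() {
	// runtime.Compiler is "gc" or "gccgo" depending on the toolchain used.
	fmt.Printf("compiler=%s backend=%s\n", runtime.Compiler, backendFor(runtime.Compiler))
}
```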

meyer9 commented 5 years ago

Benchmarks using math/bits:

goos: linux
goarch: amd64
pkg: github.com/phoreproject/bls
BenchmarkBLSAggregateSignature-8      500000          2960 ns/op
BenchmarkBLSSign-8                      1000       1419106 ns/op
BenchmarkBLSVerify-8                     100      20543869 ns/op
BenchmarkFQ12MulAssign-8              100000         20071 ns/op
BenchmarkFQ12SquareAssign-8           100000         14215 ns/op
BenchmarkFQ12InverseAssign-8           20000         65837 ns/op
BenchmarkFQ2AddAssign-8             20000000           100.0 ns/op
BenchmarkFQ2SubAssign-8             20000000            74.9 ns/op
BenchmarkFQ2MulAssign-8              2000000           687 ns/op
BenchmarkFQ2SquareAssign-8           2000000           687 ns/op
BenchmarkFQ2InverseAssign-8            30000         37184 ns/op
BenchmarkFQ6AddAssign-8              5000000           291 ns/op
BenchmarkFQ6SubAssign-8             10000000           250 ns/op
BenchmarkFQ6MulAssign-8               200000          6658 ns/op
BenchmarkFQ6SquareAssign-8            300000          4747 ns/op
BenchmarkFQ6InverseAssign-8            30000         43820 ns/op
BenchmarkFQAddAssign-8              30000000            41.6 ns/op
BenchmarkFQSubAssign-8              50000000            35.4 ns/op
BenchmarkFQMulAssign-8              10000000           149 ns/op
BenchmarkFQMul2-8                   50000000            25.1 ns/op
BenchmarkFQSquare-8                 10000000           120 ns/op
BenchmarkFQInverse-8                   50000         33650 ns/op
BenchmarkFQNegate-8                 30000000            47.6 ns/op
BenchmarkFQSqrt-8                      20000         84710 ns/op
BenchmarkG1MulAssign-8                  2000        756880 ns/op
BenchmarkG1AddAssign-8                500000          2774 ns/op
BenchmarkG1AddAssignMixed-8          1000000          2156 ns/op
BenchmarkG2MulAssign-8                   500       3005618 ns/op
BenchmarkG2AddAssign-8                100000         12370 ns/op
BenchmarkG2AddAssignMixed-8           200000          9254 ns/op
BenchmarkG2Prepare-8                    2000        713139 ns/op
BenchmarkMillerLoop-8                   1000       2074986 ns/op
BenchmarkFinalExponentiation-8           200       7257563 ns/op
BenchmarkPairing-8                       100      10425579 ns/op
BenchmarkMACWithCarry-8             2000000000           0.27 ns/op
BenchmarkSubWithCarry-8             2000000000           0.56 ns/op
BenchmarkAddWithCarry-8             2000000000           0.30 ns/op
PASS
ok      github.com/phoreproject/bls 79.790s
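
For context on what the `*WithCarry` benchmarks measure, here is a minimal sketch of carry-propagating word arithmetic built on `math/bits`. It is not the library's actual code: the function names and the 6-limb little-endian layout are assumptions chosen for illustration (six 64-bit words are enough to hold a 381-bit field element). `bits.Add64`, `bits.Sub64`, and `bits.Mul64` are standard library functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addWithCarry returns a + b + carry (low 64 bits) and the carry out (0 or 1).
func addWithCarry(a, b, carry uint64) (sum, carryOut uint64) {
	return bits.Add64(a, b, carry)
}

// subWithBorrow returns a - b - borrow (low 64 bits) and the borrow out (0 or 1).
func subWithBorrow(a, b, borrow uint64) (diff, borrowOut uint64) {
	return bits.Sub64(a, b, borrow)
}

// macWithCarry computes a + b*c + carry as a 128-bit value and returns
// its low and high words. The result always fits in 128 bits, so the
// high word cannot overflow.
func macWithCarry(a, b, c, carry uint64) (lo, hi uint64) {
	hi, lo = bits.Mul64(b, c)
	var cc uint64
	lo, cc = bits.Add64(lo, a, 0)
	hi += cc
	lo, cc = bits.Add64(lo, carry, 0)
	hi += cc
	return lo, hi
}

// add384 adds two 6-limb little-endian values and returns the sum
// together with the final carry out.
func add384(x, y [6]uint64) (z [6]uint64, carry uint64) {
	for i := range x {
		z[i], carry = bits.Add64(x[i], y[i], carry)
	}
	return z, carry
}

func main() {
	max := ^uint64(0)
	lo, hi := macWithCarry(max, max, max, max)
	fmt.Printf("mac: lo=%#x hi=%#x\n", lo, hi)

	z, c := add384([6]uint64{max, max, max, max, max, max}, [6]uint64{1})
	fmt.Println("add384:", z, "carry:", c)
}
```

Recent gc compilers treat these `math/bits` functions as intrinsics on amd64, which may be part of why the per-word carry benchmarks drop well below 1 ns/op while the higher-level field operations do not improve.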

JustinDrake commented 5 years ago

Thanks @meyer9. Do you want to be part of the next BLS12-381 standardisation call? (See drafts and minutes at https://github.com/pairingwg/bls_standard.)

meyer9 commented 5 years ago

Yes. That would be great. My email is julianmeyer2000 [at] google email provider.

kilic commented 5 years ago

There is another prime field implementation in Go, which also uses the avo generator: https://github.com/kilic/fp

Benchmarked on a 2.7 GHz i5 machine:

384/Addition          8.95 ns/op
384/Multiplication    92.3 ns/op