mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.05k stars 990 forks source link

Investigate Curve25519-dalek for faster bulletproofs #1948

Closed lehnberg closed 5 years ago

lehnberg commented 6 years ago

Overview

Related Gitter chat:

Quentin Le Sceller @quentinlesceller Nov 08 23:29
Not sure if this was shared before https://doc.dalek.rs/bulletproofs/
Curve22519-dalek bulletproof library which is apparently twice as fast as the libsecp256k1 bulletproof

Ignotus Peverell @ignopeverell Nov 08 23:33
don't think it's been shared before although, while I'm not certain it's this implementation, I do remember Andrew coming with a post questioning their benchmark and use of libsecp

Quentin Le Sceller @quentinlesceller Nov 08 23:38
Okay. Technically could we switch from secp256k1 to curve22519-dalek?
For sure a soft/hard fork just just curious if it happens to be way faster

Ignotus Peverell @ignopeverell Nov 08 23:41
we'd need commitment and schnorr sigs implementations on the same curve

Quentin Le Sceller @quentinlesceller Nov 08 23:41
Okay I see

Ignotus Peverell @ignopeverell Nov 08 23:43
looks like they have a schnorr sigs impl in progress already

Ignotus Peverell @ignopeverell 00:08
Ristretto and Dalek are interesting
mcdallas commented 6 years ago

Found this comment by apoelstra regarding this new implementation https://old.reddit.com/r/Bitcoin/comments/8c21mo/we_made_bulletproofs_twice_as_fast_with_rust_and/dxcr2xi/ maybe the one @ignopeverell refers to

yeastplume commented 5 years ago

Probably worth explicitly stating that this is a very non-trivial change, and not something we can dismissively plan to hardfork. Some things to think about:

apoelstra commented 5 years ago

A few unordered thoughts:

It would be insane to produce bulletproofs with research-quality code; or code in a language like Rust with many extremely aggressive compiler optimizations that basically can't be overridden without UB. It is less insane to validate Bulletproofs this way because you don't care about constant-timeness, but even more dangerous, because the risk changes from "will this leak secret data from actively-monitored users" to "will this cause a permanent unexpected chain fork".

In Bitcoin when we moved from OpenSSL to libsecp256k1, we moved signing over many years before we moved verification; and actually we only changed verification to libsecp after we'd discovered that OpenSSL had latent consensus bugs all by itself. Of course, if you are deliberately hardforking and completely changing the curve, backward compatibility isn't a concern, but my point is that verification/consensus logic is scary.

The bulk of the speedup from the Dalek code (~75% IIRC) comes from their use of AVX2, which could be also done in libsecp. The rest comes from a combination of their cheaper curve operations and Rust, but it's hard to tell what's what because you can't just "turn off" either one the way you can with AVX2. Unfortunately I think the majority comes from cheaper curve operations ("unfortunate" because while it's possible to write a BP verifier in Rust, it's not possible to get curve25519-dalek-style additions for secp256k1).

In general, the curve25519-dalek community is very open to exposing basic cryptographic primitives to allow people to "roll their own crypto". This makes sense for code written by cryptographers for cryptographers, but I'd be hesitant to use such code in production because there are many considerations (e.g. sidechannel hardening, anti-footgun API design) which require a production-oriented mindset (plus they're boring, both from a research perspective and a programmer's perspective). Not to mention the temptation to do cryptographically dangerous things in user-level code when your library hards you the sharp tools to do so.

lehnberg commented 5 years ago

Speed update: https://medium.com/@hdevalence/even-faster-edwards-curves-with-ifma-8b1e576a00e9

Intel’s AVX512 instruction set includes an extension, AVX512-IFMA, aimed at accelerating integer arithmetic. I wrote a new curve25519-dalek backend in Rust, using these instructions to implement the parallel formula strategy. This gives another performance gain of about 1.5x relative to my AVX2 implementation, or 2.3x relative to ed25519-donna, and it seems likely that it’s the fastest Curve25519 implementation ever. More interestingly, it almost closes the performance gap (for now) between Curve25519 and the newer and faster FourQ curve from Microsoft Research, even when FourQ uses the (patented) endomorphism speedup

yeastplume commented 5 years ago

I think we can consider this investigated for now