Approval votes by hash inversion vs faster assignment VRFs

burdges commented 1 year ago

Around paritytech/polkadot-sdk#732 we've several partially orthogonal-ish ideas for approval votes: paritytech/polkadot-sdk#607 sounds great. paritytech/polkadot-sdk#701 gives clean security but complicates our existing code. Also https://github.com/paritytech/polkadot-sdk/issues/604 + https://github.com/paritytech/polkadot/pull/7482 which breaks paritytech/polkadot-sdk#635 so maybe a pessimisation long-term, and also likely breaks politeness, aka approvals for never seen assignments become no-shows.

In paritytech/polkadot#6782 we're currently using the DLEQ VRF ability to sign an extra message, but we've enough other load balancing that this could be removed so the VRFs never sign an additional messages. We could then use a much faster VRF given by RSA-FDH with e=3, but this remains quite delicate and imposes key rotation. It's good for post-quantum experiments though.

Assume we do not try RSA-FDA or post-quantum VRFs, so we keep this extra message signing capability of our VRFs. We could then optimize approvals signatures into mere hashes:

In assignments VRFs, we'd sign an extra message containing the relay chain block hash R for which this assignment applies, along with a [H; S] where

S is the number of samples for this assignment, so S=1 for tranches > 0 and S=relayVrfModuleSamples for tranche 0, and
each H is the 32 byte hash of a fresh random number X_A pulled form /dev/urandom specifically for this assignment A.

We now vote approve for candidate A in R by simply revealing X_A. In essence, we pre-signed our approval in the assignment, but we only complete the signature in the approval message.

If there is a relay chain equivocation R' for R then we do not send a separate assignment for R' but instead everyone counts our assignments for R as also assignments for R' and if the core of A in R has a different A' in R' then everyone assumes we're assigned there too and we must sign that approval with sr25519.

We need additional scheckers for relay chain equivocations too, which we've not yet implemented, but must do so. We always slash enough for relay chain equivocations to pay for all this.

Good: As fast as not checking signatures, but permits paritytech/polkadot-sdk#635 and gossip, without breaking politeness.

Bad: It breaks faster and post-quantum VRFs, so ideally we'd maintain a cargo feature which works without this optimization. It's kinda deeply merging crypto-ish signature logic into the approvals pipeline. It's a dangerous fast v slow code fork which complicates security, maintenance, auditing, etc. If nodes go down and come back quickly then they now no-show, even if they could catch up fast enough previously.

burdges commented 1 year ago

As a comparison, if we'd a correct implementation of RSA-FDH with e=3, and if keys rotation can make 2048 bit keys secure, then we'd likely see a 10x speed up for both assignments and approvals by simply dropping it in.

alexggh commented 1 year ago

As a comparison, if we'd a correct implementation of RSA-FDH with e=3, and if keys rotation can make 2048 bit keys secure, then we'd likely see a 10x speed up for both assignments and approvals by simply dropping it in.

Any historical reasons why we haven't try to use RSA-FDH, beside he key rotations requirements, which I think would add a big maintenance burden to people running validators.

burdges commented 1 year ago

We'd automate the key rotation, with the grandpa key certifying those keys. We should've done this with more keys anyways, which maybe a good thing to as part of https://github.com/w3f/research-internal/issues/530

As mentioned in https://github.com/w3f/research-internal/issues/384 there exists a good Rabin-Williams implementation in C, even faster than RSA with e=3 but not a VRF, but we never produced a good implementation in Rust. Rabin-Williams and RSA with e=3 are both kinda fragile RSA variants, so you do not deploy them lightly.

Rabin-Williams with 2048 bits keys, plus all optimizations, winds up 20x faster than ed25519, but it's less impressive once you raise the key size. 2048 bits RSA keys are no longer widely recommended.

As a rule, 1024 bit RSA keys are now considered insecure, because they're only about 1000x stronger than 768 bit RSA keys. In 2010, a team factored a 768 bit RSA key after about 2.5 years of work on hundreds of machines, although 1.5 years likely sufficed in their setup. And hardware improved since 2010! etc.

I'd expect a 2048 bit RSA key still remains secure today, but automated daily key rotation buys considerable insurance, but conversely these being fragile variants leaves zero room for implementation mistakes. Also, these signatures are 256-300 bytes vs 64 for sr/ed25519, so if bandwidth matters more than we realize then they yield less benefits.

It's all bad optics in the blockchain space too, even if it remains quite secure. If we rotate keys so much then other fast post-quantum signatures become possible too, but they require far more bandwidth but if the bandwidth matters less then maybe that's fine. Also we hire for zk proofs on the crypto research team, not really for classical primitives like RSA, so we'd be outsourcing some of this.

We've always played it conservative on crypto choices. sr25519 was not exception because it's more conservative than ed25519 in most respects.

tl;dr We'd uncertainties, so we do other things first. :)

eskimor commented 1 year ago

Also paritytech/polkadot-sdk#604 + paritytech/polkadot#7482 which breaks paritytech/polkadot-sdk#635

Actually there is a longer term plan so paritytech/polkadot-sdk#635 would still be possible. It requires approval checkers incentivization as in the tit for tat scheme you designed. Then the idea is that everybody checks lazily a random sample of the signatures in the session, if an invalid signature is found - the node will invalidate the approval voting score for that offending node, so no/less rewards.

This should work, because the whole checking is just to avoid lazy approval checkers. Therefore, if there is incentivization to actually sign and and incentivization to do the validation (potential slash), we should be good.

burdges commented 1 year ago

@AlistairStewart believes paritytech/polkadot-sdk#635 adds security, even if we only slash the approval checkers 50% / k. In fact, we'd only slash a token amount if it were only for laziness. We thus not only worried about lazy checkers, so incentives cannot help.

AlistairStewart commented 1 year ago

When you have multiple assignments for tranche 0, it would still make sense to do this even if you were using RSA or the like. A batch announcement would contain a VRF proof and a signature of the hashes. So you verify one signature instead of one for every assigment.

burdges commented 1 year ago

There are good benchmarks in https://bench.cr.yp.to/results-sign.html#amd64-skylake for fast C implementations, via https://crypto.stackexchange.com/a/35600

Rabin-Williams with 1024 bit keys (rwb0fuz1024) is 24.7 x faster than ed25519, but that's really small key sizes, but maybe with fast enough key rotation, and the code could be adapted to larger key sizes. I'd guesstimate rwb0fuz2048 would be 12 x faster than ed25519, but not sure.

In fact RSA with e=3 is commonplace, even in openssl, and appears here under the name ronald[keysize], with ronald1536 and ronald2048 being 5.1 and 3.7 x faster than ed25519, respectively. I'd expect sr25519's VRF to be 2.5-ish x slower than ed25519, so this becomes 12 and 9 x, respectively. Also, e=3 is discussed in https://security.stackexchange.com/questions/2335/should-rsa-public-exponent-be-only-in-3-5-17-257-or-65537-due-to-security-c/2339#2339

In brief, RW and RSA-FDH could make both approvals signatures and assignments VRFs like 12 x faster, respectively, but they do increase message sizes somewhat. In this, there are no real logic changes, except that keys should automatically be rotated and certified by nodes, plus some C dependencies.

paritytech / polkadot-sdk

Approval votes by hash inversion vs faster assignment VRFs #593