serai-dex / serai


Coordinator handling of the sign protocol is flawed #588

Open kayabaNerve opened 3 months ago

kayabaNerve commented 3 months ago

We really need to change how we handle preprocess accumulation.

Currently, we select the first t people who respond to attempt a sign protocol. The selected are then supposed to send their signature shares. After a certain amount of time, a re-attempt is triggered, participated in by those who don't believe a signature was produced. If at least 2/3rds of people participate, another re-attempt is scheduled (presumably the original attempt failed, this attempt is in progress, and if it too fails, we'll need yet another attempt).
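As a sketch of the rule described above (names are illustrative, not the coordinator's actual API):

```rust
/// Select the first `t` participants to respond with a preprocess,
/// if at least `t` have responded.
fn select_signers(responders_in_order: &[u16], t: usize) -> Option<Vec<u16>> {
  if responders_in_order.len() < t {
    return None;
  }
  Some(responders_in_order[.. t].to_vec())
}

/// A re-attempt schedules yet another re-attempt only if at least
/// 2/3rds of the n validators participate in it.
fn schedule_next_reattempt(participants: usize, n: usize) -> bool {
  // ceil(2n / 3) participants required
  participants >= (2 * n + 2) / 3
}

fn main() {
  let responders = [3, 7, 1, 9, 4];
  assert_eq!(select_signers(&responders, 3), Some(vec![3, 7, 1]));
  assert_eq!(select_signers(&responders[.. 2], 3), None);
  assert!(schedule_next_reattempt(4, 5)); // 4 >= ceil(10 / 3) = 4
  assert!(!schedule_next_reattempt(3, 5));
}
```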

There are multiple issues with this.

1) Theoretically, all n people sent preprocesses for the first attempt. We don't need to redo the entire sign protocol: we have n-t preprocesses still usable and only need a further t-(n-t) preprocesses.
2) We don't temporarily jail people who go offline mid-protocol.
3) If someone submits an invalid signature share, only t-1 people detect it. That is insufficient to cause a removal, or even a fatal slash.
4) We detect if a signature was produced either by being one of its producers or by seeing the signed transaction on the blockchain. We don't want to send the transaction around our own network, as transactions are large and we'd have to verify the claimed transaction follows all consensus rules on its proofs (being accepted by the node handles the latter for us). We could, however, have everyone locally accumulate the shares to produce the final signatures (forming the transaction that way with no additional bandwidth and removing the entire claimed-completion protocol).
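The arithmetic for point 1 can be made concrete. If all n validators sent preprocesses and t were consumed by the failed attempt, n-t remain usable, so the next attempt only needs t-(n-t) = 2t-n fresh preprocesses (zero if the leftovers already cover t). A minimal sketch, with an illustrative function name:

```rust
// How many fresh preprocesses the next attempt needs, assuming all n
// validators preprocessed and t preprocesses were consumed by the
// failed attempt. (Illustrative, not actual coordinator code.)
fn fresh_preprocesses_needed(n: usize, t: usize) -> usize {
  let unused = n - t;
  // t - (n - t), floored at zero when the leftovers alone suffice.
  t.saturating_sub(unused)
}

fn main() {
  // 5-of-7: 2 preprocesses are left over, so only 3 fresh ones are needed.
  assert_eq!(fresh_preprocesses_needed(7, 5), 3);
  // 3-of-5: 2 left over, need 1 fresh.
  assert_eq!(fresh_preprocesses_needed(5, 3), 1);
  // 2-of-4: the leftovers alone suffice.
  assert_eq!(fresh_preprocesses_needed(4, 2), 0);
}
```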

1 is ideally resolved largely via changes on the coordinator side of things. The processor will need to be updated with consideration for how the preprocesses it made for attempt #n are now used for attempt #n+1.

2 can be resolved at any time via the coordinator alone and is an open TODO.
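The core of that TODO could look something like the following: anyone selected for an attempt who sent a preprocess yet never delivered a share is a candidate for a temporary jail. This is a hypothetical sketch, not Serai's actual design; the names and the jailing policy itself are assumptions.

```rust
use std::collections::HashSet;

/// Participants selected for an attempt who never delivered their
/// signature share, i.e. who went offline mid-protocol and could be
/// temporarily excluded from subsequent attempts. (Hypothetical.)
fn to_jail(selected: &HashSet<u16>, share_senders: &HashSet<u16>) -> HashSet<u16> {
  selected.difference(share_senders).copied().collect()
}

fn main() {
  let selected: HashSet<u16> = [1, 2, 3].into_iter().collect();
  let senders: HashSet<u16> = [1, 3].into_iter().collect();
  // Participant 2 sent a preprocess, was selected, yet sent no share.
  assert_eq!(to_jail(&selected, &senders), [2].into_iter().collect());
}
```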

3 and 4 require new FROST code, which is a pain. I don't have a better solution to propose at this time.

We have acceptable solutions for all of this at this time. 1 is solely a performance comment; 2 is, at worst, a DoS which will still cause slashes to be accumulated; 3 shouldn't occur in practice (as even without any slashing, it's just another DoS vector); and 4 is solved by the fact we auto-retry and allow announcing completions. I don't love saying DoSs are acceptable, yet considering they're pointless and don't cause a loss of funds, they're acceptable.

kayabaNerve commented 2 months ago

As one further note, Serai represents a fundamentally distinct use-case from wallet self-custody solutions. I say the following with that as necessary context.

Non-robust non-PV protocols are annoying as hell to manage, and I'm unsure the protocols necessary to resolve their flaws are worthwhile at this scale. If I could, I believe I'd 100x my costs and adopt a linear-complexity robust scheme rather than use FROST in this setting. Doing so would remove the need for an intelligent re-attempt protocol (ROAST, Serai's own re-attempt protocol) and remove the need to preserve randomness between rounds (with all the associated secure-handling requirements). I'd prefer the cryptographic complexity to the application-layer complexity. I just can't justify the bandwidth (#453) at this time, nor the security (I never did the security proofs for my proposed protocol premised on class groups).

I'll also note the idea of a teVRF premised on LWE as a way to achieve robust, linear-complexity signing protocols, yet no efficient proof has been posited for that (and the suggested parameter sets are largely unclear).