🔷 [Epic] Multichain

Description

Multichain is a total overhaul of the mpc-recovery service that will move us away from aggregated signatures towards a threshold signature scheme (TSS). The Multichain network will store users' private keys in a non-custodial way by splitting keys into key shares and distributing the shares among multiple independent parties.

Let P be the number of trusted parties holding the key shares.
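
The share-splitting idea above can be illustrated with Shamir secret sharing. This is only a sketch under assumptions: the description does not say which sharing scheme is used, and in a real TSS the full key is never reconstructed in one place (the parties sign jointly from their shares). The names `split_secret` and `recover_secret` and the toy modulus are hypothetical.

```python
import secrets

# Toy modulus (2**127 - 1 is a Mersenne prime). A real TSS would work
# modulo the order of the signature curve's group instead.
PRIME = 2**127 - 1


def split_secret(secret, threshold, parties):
    """Split `secret` into `parties` shares; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `threshold=3` and `parties=5`, any three of the five shares are enough to recover the value, while fewer than three reveal nothing about it.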

The reasons for the transition are:

- Fault tolerance. The current implementation requires all nodes to be online and ready to accept requests. If any party loses its keys, the whole network can no longer operate. Multichain, on the other hand, has a configurable (and dynamic) threshold of participants that must be available for the network to keep operating (currently presumed to be ~2/3 of P).
- Decentralization. We are getting rid of the leader node and introducing a proper consensus mechanism.
- Generalization. Multichain is all about being able to sign arbitrary payloads on-chain, which enables other potential applications of multichain outside of FastAuth. This means we are decoupling from OIDC, and FastAuth will become an application built on top of multichain.
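
To make the fault-tolerance point concrete, here is a minimal sketch of the availability check, assuming the threshold is 2/3 of P rounded up. The rounding rule and the names `signing_threshold` and `can_operate` are assumptions; the description only says "~2/3 of P".

```python
def signing_threshold(parties):
    """Smallest integer >= 2/3 of `parties` (integer arithmetic, no floats)."""
    if parties < 1:
        raise ValueError("need at least one party")
    return (2 * parties + 2) // 3


def can_operate(parties, online):
    """True if enough participants are available to keep the network running."""
    return online >= signing_threshold(parties)
```

For example, with P = 9 the network would keep operating with 6 participants online but not with 5, whereas the current design stops as soon as a single node is unavailable.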

Roughly, the new flow is going to look like this:

1. Developer D writes and deploys a smart contract C @ my_cool_app.near
2. User U calls function foo on C
3. foo makes a cross-contract call to multichain.near (a contract deployed by us), requesting sign(payload), where payload is provided by C's internal logic
4. The P participants index incoming calls to multichain.near/sign and see the transaction above
5. They follow the MPC cryptographic protocol and generate a signature S for the submitted payload
6. One of the participants submits S to multichain.near/response
7. D has an indexer that watches for multichain.near/response; once it sees the transaction above, it registers S as the response to U's interaction with C
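
The flow above can be sketched as a small in-memory simulation. Everything here is a hypothetical stand-in: `MultichainContract` is not the real contract interface, and `placeholder_signature` is just a hash, not the MPC protocol. It only shows the order of the request, signing, and response steps.

```python
import hashlib


class MultichainContract:
    """Toy stand-in for multichain.near: records sign requests and responses."""

    def __init__(self):
        self.requests = []   # payloads awaiting a signature
        self.responses = {}  # payload -> signature

    def sign(self, payload):
        self.requests.append(payload)

    def respond(self, payload, signature):
        self.responses[payload] = signature


def placeholder_signature(payload, participants):
    """NOT real MPC: hashes the payload with the participant names."""
    h = hashlib.sha256(payload)
    for name in sorted(participants):
        h.update(name.encode())
    return h.hexdigest()


def run_flow(payload, participants):
    contract = MultichainContract()
    # U calls foo on C, which makes the cross-contract sign(payload) call.
    contract.sign(payload)
    # The participants index the incoming sign call.
    pending = list(contract.requests)
    # They jointly produce S, and one of them submits it as the response.
    for p in pending:
        contract.respond(p, placeholder_signature(p, participants))
    # D's indexer sees the response and registers S for U's interaction.
    return contract.responses[payload]
```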

Resources

Overview of cryptography behind multichain: https://docs.google.com/document/d/1FKC9LvVyrEq6CiFYCnUFtQfvaDWddxzdtaEQb-_fq_s/edit#heading=h.we4ish11290u

This epic presumes that future work is happening on top of https://github.com/near/mpc-recovery/pull/313

### Cryptography
- [ ] https://github.com/near/mpc-recovery/issues/328
- [ ] https://github.com/near/mpc-recovery/issues/341
- [x] https://github.com/near/mpc-recovery/issues/385
- [ ] https://github.com/near/mpc-recovery/issues/384
- [ ] Ensure we can have multiple signing state machines progressing at the same time
- [ ] https://github.com/near/mpc-recovery/issues/456
- [ ] (Optional). Reshare and run at the same time to ensure the liveness of the protocol (i.e. there is no need to abort ongoing sign requests)
- [ ] https://github.com/near/mpc-recovery/issues/352
- [ ] Occasionally reshare the key even when the participant set does not change. Should be done once every 24 hours to 1 week according to Michel (Security, not necessary for March release IMO)
- [ ] https://github.com/near/mpc-recovery/issues/386
- [x] Persistent Beaver triples and presignatures. Right now, on restart a node will lose all of its triples/presignatures, rendering them useless for other nodes. (This can complicate the protocol and introduce many edge cases.)
- [ ] (Optional). Some mechanism for deciding who is messing with the protocol messages. It is impossible to tell whose message broke the protocol step in cait-sith, but in theory some karma system might help here (-1 for being a part of a set that failed to complete a protocol step). Note that there is no incentive to behave badly intentionally, so this is arguably very optional.
- [ ] https://github.com/near/mpc-recovery/issues/439
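
The karma idea from the list above could look roughly like this. It is a sketch of the "-1 for being part of a failed set" rule only; `update_karma` and `most_suspicious` are hypothetical names, and nothing here is part of cait-sith.

```python
from collections import Counter


def update_karma(karma, failed_sets):
    """Subtract 1 from every node that took part in a failed protocol step."""
    for participants in failed_sets:
        for node in set(participants):
            karma[node] -= 1
    return karma


def most_suspicious(karma):
    """Nodes with the lowest karma, i.e. present in the most failed sets."""
    if not karma:
        return []
    worst = min(karma.values())
    return sorted(node for node, score in karma.items() if score == worst)
```

A node that shows up in every failed set accumulates the lowest score, which narrows down the culprit over time even though a single failure cannot be attributed.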

### Network
- [ ] https://github.com/near/mpc-recovery/issues/329
- [ ] https://github.com/near/mpc-recovery/issues/381
- [ ] https://github.com/near/mpc-recovery/issues/382
- [ ] https://github.com/near/mpc-recovery/issues/445
- [ ] https://github.com/near/mpc-recovery/issues/405
- [ ] Proper message queue for messages from various epochs and states. Might even make it persistent? Important that it does not leak memory
- [ ] Versioned networking: use protobuf to keep track of compatibility, and make sure that any two adjacent versions are backwards-compatible
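
One possible shape for the message queue item above, as a sketch: messages are bucketed by epoch, each bucket is bounded, and stale or far-future epochs are rejected so memory use stays capped. The class name, the limits, and the single integer epoch are assumptions; the real design may also need to key on protocol state.

```python
from collections import deque


class EpochMessageQueue:
    def __init__(self, max_per_epoch=1024, max_epochs_ahead=2):
        self.max_per_epoch = max_per_epoch
        self.max_epochs_ahead = max_epochs_ahead
        self.current_epoch = 0
        self.queues = {}  # epoch -> bounded deque of messages

    def push(self, epoch, message):
        """Queue a message; reject stale or too-far-future epochs."""
        if epoch < self.current_epoch:
            return False
        if epoch > self.current_epoch + self.max_epochs_ahead:
            return False
        # deque(maxlen=...) silently drops the oldest entry when full.
        q = self.queues.setdefault(epoch, deque(maxlen=self.max_per_epoch))
        q.append(message)
        return True

    def advance(self, epoch):
        """Move to a new epoch and free everything queued for older ones."""
        self.current_epoch = epoch
        for old in [e for e in self.queues if e < epoch]:
            del self.queues[old]

    def pop(self):
        """Next message for the current epoch, or None if there is none."""
        q = self.queues.get(self.current_epoch)
        return q.popleft() if q else None
```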

### Consensus
- [ ] https://github.com/near/mpc-recovery/issues/330
- [ ] https://github.com/near/mpc-recovery/issues/353
- [ ] Enforce minimum and maximum threshold (and hence the size of the set, as it is presumably 150% of the threshold). A minimum is needed to prevent a small set from taking over the entire MPC. A maximum is needed to prevent performance issues.
- [ ] (Optional). Consider a backup network (suggested by Michel, no elaboration)
- [ ] (Optional). Decide if we need some sort of slashing mechanism for misbehaving nodes
- [ ] https://github.com/near/mpc-recovery/issues/389
- [ ] https://github.com/near/mpc-recovery/issues/424
- [ ] (Optional). Recognize that someone else has given up on the protocol and restarted it. This probably makes sense only for specific types of protocols: generating, resharing (maybe something else?). The major issue here is that there might not be an easy way to distinguish new protocol messages from old protocol messages. This means some nodes might wrongly conclude that the protocol is still alive due to race conditions. One way to combat this is to attach a random GenerationId and ReshareId to all protocols.
- [ ] (Optional). Allow rolling back from `Resharing` to `Running` if something fatal happened during the resharing phase (e.g. one of the nodes is not participating and blocking us from making progress). Then the set of new joiners is wiped and everyone new has to sign up from scratch.
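
A rough sketch of the random GenerationId / ReshareId idea from the list above: every protocol run gets a fresh random identifier, and messages carrying a different identifier are dropped as belonging to an abandoned run. The message layout and the names `new_protocol_id` and `ProtocolRun` are hypothetical.

```python
import secrets


def new_protocol_id():
    """Fresh random identifier for one generation or resharing run."""
    return secrets.token_hex(16)


class ProtocolRun:
    def __init__(self, protocol_id):
        self.protocol_id = protocol_id
        self.inbox = []

    def receive(self, message):
        """Accept a message only if it belongs to this run."""
        if message.get("protocol_id") != self.protocol_id:
            return False  # stale message from an abandoned run
        self.inbox.append(message["body"])
        return True
```

When a node restarts the protocol it would create a new identifier, so late messages from the old run can no longer be mistaken for progress in the new one.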

### API (no yield/resume option)
- [ ] https://github.com/near/mpc-recovery/issues/346
- [ ] (Optional). Migrate to standalone independent indexer for each multichain node. See #346 for
- [ ] Finish Bowen's self-call improvement (see https://github.com/near/mpc-recovery/pull/401). It is a direct improvement over what we have right now. You are not forced to pay gas for self-calls; you can always just index respond txs, as with the current approach. But with it you get the option to pay up to 300 TGas to potentially get a sequential response. This also simplifies integration tests, as you don't have to run an indexer there.

### API (yield/resume option)
- [ ] Wait until https://github.com/near/NEPs/pull/519 is done
- [ ] Rewrite MPC contract to use yield/resume

### Infra
- [ ] https://github.com/near/mpc-recovery/issues/383
- [ ] https://github.com/near/mpc-recovery/issues/426
- [ ] https://github.com/near/mpc-recovery/issues/427
- [ ] https://github.com/near/mpc-recovery/issues/435
- [ ] https://github.com/near/mpc-recovery/issues/434
- [x] https://github.com/near/mpc-recovery/issues/428
- [ ] https://github.com/near/mpc-recovery/issues/430
- [ ] Enforce a tracing style that we should use uniformly, we can base it off of [this](https://github.com/near/nearcore/blob/master/docs/practices/style.md#tracing)
- [ ] https://github.com/near/mpc-recovery/issues/327
- [ ] Add the ability to start a local test env (similar to the one we had in the old design)
- [ ] Make dev env index starting from a recent block height
- [ ] Keep track of the last block height processed by the indexer
- [ ] https://github.com/near/mpc-recovery/issues/425
- [ ] https://github.com/near/mpc-recovery/issues/419
- [ ] https://github.com/near/mpc-recovery/issues/420
- [ ] https://github.com/near/mpc-recovery/issues/431
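
For the two indexer items above (starting from a recent block height and tracking the last processed height), one possible approach is a small checkpoint file. This is a sketch under assumptions: the JSON layout, the `lookback` default, and the function names are hypothetical, and the real indexer may store its state elsewhere.

```python
import json
import os


def load_start_height(path, latest_height, lookback=100):
    """Resume after the last processed block, or start near the chain tip."""
    try:
        with open(path) as f:
            return json.load(f)["last_processed"] + 1
    except FileNotFoundError:
        # No checkpoint yet (e.g. a fresh dev env): start `lookback` blocks back.
        return max(0, latest_height - lookback)


def save_checkpoint(path, height):
    """Record the last processed height, replacing the file atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_processed": height}, f)
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint
```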

### Low priority bugs/performance improvements
- [ ] https://github.com/near/mpc-recovery/issues/433

🔷 [Epic] Multichain #326
