Closed ailisp closed 3 months ago
Yeap, the test code does not look suspicious to me.
To clarify, the test is 100% functional if we are removing nth(2)
and always fails when we remove nth(0)
or nth(1)
?
@volovyks yes
Figure out it's a protocol bug in state.progress from Resharing to WaitingForConsensus.
2024-07-12T09:11:25.673087Z INFO ThreadId(14) mpc_recovery_node::protocol: protocol unable to progress: CaitSithProtocolError(AssertionFailed("new public key does not match old public key"))
In further, this error comes from (cait-sith) protocol.poke(), this step: https://github.com/LIT-Protocol/cait-sith/blob/3066e9ab165783dc28cb39750105a28bfbb3985e/src/keyshare.rs#L170, will dive into it
Cait-sith is good, the problem is we pass the wrong participants set to cait-sith. In the case of participants 0, 1, 2, we have contract state old participants:
[
{id: 0, pk: pk0, ...}
{id: 1, pk: pk1, ...}
{id: 2, pk: pk2, ...}
]
we remove p0, new participant in contract state becomes:
[
{id: 0, pk: pk1, ...}
{id: 1, pk: pk2, ...}
]
Passing to cait-sith, this reduced to simply [u32(0), u32(1)]
, it should be [1,2]
Great finding @ailisp! It is vital to fix.
Still wip for a fix. Have managed to make new participant in protocol reserve the correct participant id from old participants, and make protocol advance a little into WaitingForConsensus state, and remove_participant(node0) returns ok. But got mismatched participant error that prevent protocol proceed further, and the test still fails. Will continue debug my fix tomorrow
Description
In a test setup of three nodes, threshold of 2, vote from two nodes to kick the third node. If the third node is not the last node, the protocol contract state stucks at Resharing. This may indicate under some corner case of X nodes and Y threshold, resharing does not work properly.
Repro snippet (in a with_multichain_nodes):