tangle-network / tangle

Modular restaking infrastructure for developers, restakers, and operators.
https://www.tangle.tools/
GNU General Public License v3.0

[BUG] InvalidKey Error #119

Closed · 1xstj closed this issue 1 year ago

1xstj commented 1 year ago

Bug: After a few keygen/signing sessions, we start to see this error on all collators:

    Async Proto Errored: GenericError { reason: "Proceed(ProceedRound(Round2VerifyCommitments(ErrorType { error_type: \"invalid key\", bad_actors: [0], data: [] })))" }

How to reproduce:

  1. Clone the repo and switch to the `tangle-error-debug` branch
  2. Build the parachain: `cargo build --release -p tangle-parachain`
  3. Clone and build `polkadot` as well (make sure to update your local polkadot path in the `dkg-launch.json` file)
  4. Run `polkadot-launch ./scripts/polkadot-launch/dkg-launch.json` to start a parachain network

The session time is set to 3 minutes. Wait for a couple of session rotations, and proposal signing fails with the error above.

drewstone commented 1 year ago

The error is raised here: https://github.com/ZenGo-X/multi-party-ecdsa/blob/184c49fa69dcc9568f425884161b9f1e51a3f097/src/protocols/multi_party_ecdsa/gg_2020/state_machine/keygen/rounds.rs#L120-L127.

It is triggered from this function: https://github.com/ZenGo-X/multi-party-ecdsa/blob/184c49fa69dcc9568f425884161b9f1e51a3f097/src/protocols/multi_party_ecdsa/gg_2020/party_i.rs#L260

The failure means the following check returned false:

    // Statement for the base-h2 composite dlog proof: same modulus N,
    // but with g and ni swapped relative to the base-h1 statement.
    let dlog_statement_base_h2 = DLogStatement {
        N: bc1_vec[i].dlog_statement.N.clone(),
        g: bc1_vec[i].dlog_statement.ni.clone(),
        ni: bc1_vec[i].dlog_statement.g.clone(),
    };
    let test_res =
        // The decommitted y_i and blind factor must reproduce the hash
        // commitment that party i broadcast in round 1.
        HashCommitment::<Sha256>::create_commitment_with_user_defined_randomness(
            &BigInt::from_bytes(&decom_vec[i].y_i.to_bytes(true)),
            &decom_vec[i].blind_factor,
        ) == bc1_vec[i].com
            // The proof that the Paillier key was generated correctly
            // must verify.
            && bc1_vec[i]
                .correct_key_proof
                .verify(&bc1_vec[i].e, zk_paillier::zkproofs::SALT_STRING)
                .is_ok()
            // The Paillier modulus and the dlog statement modulus must
            // fall within the allowed bit-length range.
            && bc1_vec[i].e.n.bit_length() >= PAILLIER_MIN_BIT_LENGTH
            && bc1_vec[i].e.n.bit_length() <= PAILLIER_MAX_BIT_LENGTH
            && bc1_vec[i].dlog_statement.N.bit_length() >= PAILLIER_MIN_BIT_LENGTH
            && bc1_vec[i].dlog_statement.N.bit_length() <= PAILLIER_MAX_BIT_LENGTH
            // Both composite dlog proofs (base h1 and base h2) must verify.
            && bc1_vec[i]
                .composite_dlog_proof_base_h1
                .verify(&bc1_vec[i].dlog_statement)
                .is_ok()
            && bc1_vec[i]
                .composite_dlog_proof_base_h2
                .verify(&dlog_statement_base_h2)
                .is_ok();
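If any clause is false, the offending party index ends up in the error's bad actor list. For reference, here is a minimal sketch of how a failed check like this would surface as the error in the log above, matching the three ErrorType fields we can see there (error_type, bad_actors, data). The struct and function below are assumptions for illustration, not code copied from multi-party-ecdsa:

    // Sketch only: mirrors the shape of the error in the log
    // ("invalid key", bad_actors: [0], data: []), not the library's
    // actual code.
    pub struct ErrorType {
        pub error_type: String,
        pub bad_actors: Vec<usize>,
        pub data: Vec<u8>,
    }

    fn check_parties(test_results: &[bool]) -> Result<(), ErrorType> {
        // Collect the index of every party whose round-1 commitments or
        // proofs failed the check above.
        let bad_actors: Vec<usize> = test_results
            .iter()
            .enumerate()
            .filter(|(_, ok)| !**ok)
            .map(|(i, _)| i)
            .collect();

        if bad_actors.is_empty() {
            Ok(())
        } else {
            // bad_actors: [0] in our logs means party index 0 failed.
            Err(ErrorType {
                error_type: "invalid key".to_string(),
                bad_actors,
                data: Vec::new(),
            })
        }
    }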

This is due to either a bad commitment or a bad DLOG proof. In the paper (GG20), round 2 is described as:

[image: Round 2 of the keygen protocol, as described in the GG20 paper]
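To make the commitment half of that check concrete, here is a minimal sketch of the hash-commitment pattern being verified. It uses sha2 directly for illustration; the real check goes through curv's HashCommitment over BigInts, so the function names here are not the library API:

    use sha2::{Digest, Sha256};

    // Commit(m; r) = SHA-256(m || r). Illustrative stand-in for curv's
    // HashCommitment::create_commitment_with_user_defined_randomness.
    fn commit(message: &[u8], blind_factor: &[u8]) -> [u8; 32] {
        let mut hasher = Sha256::new();
        hasher.update(message);
        hasher.update(blind_factor);
        hasher.finalize().into()
    }

    // Round 2 check: the decommitment (y_i, blind_factor) must reproduce
    // the commitment broadcast in round 1; otherwise the sender is
    // flagged as a bad actor and keygen aborts with "invalid key".
    fn verify_commitment(com: &[u8; 32], y_i_bytes: &[u8], blind_factor: &[u8]) -> bool {
        &commit(y_i_bytes, blind_factor) == com
    }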

drewstone commented 1 year ago

We should be able to work backwards from here and identify why this occurs, as well as why it is ever able to recover. My main questions: why does it eventually work again, what state is shared between this process and others, and are we cleaning/resetting that state effectively? CC @shekohex @tbraun96
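To make the state question concrete, here is a hedged sketch of the kind of per-session cleanup being asked about, assuming protocol state is keyed by a session id. Every name here (SessionState, ProtocolStore, on_session_rotation) is hypothetical, not taken from our actual gadget:

    use std::collections::HashMap;

    // Hypothetical per-session keygen state; stands in for whatever the
    // async protocol keeps between rounds.
    struct SessionState {
        round: u8,
        received_commitments: Vec<Vec<u8>>,
    }

    #[derive(Default)]
    struct ProtocolStore {
        sessions: HashMap<u64, SessionState>,
    }

    impl ProtocolStore {
        // On rotation, drop all state from older sessions. If stale
        // round-1 commitments leaked into the next keygen, a check like
        // Round2VerifyCommitments would fail exactly as in this issue.
        fn on_session_rotation(&mut self, new_session_id: u64) {
            self.sessions.retain(|&id, _| id >= new_session_id);
        }
    }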