tangle-network / dkg-substrate

Multy-party threshold ECDSA (GG20) Substrate node
https://tangle.webb.tools/
GNU General Public License v3.0
60 stars 15 forks source link

[BUG] RefreshVote proposal is stuck in UnsignedProposal queue #636

Closed 1xstj closed 1 year ago

1xstj commented 1 year ago

Describe the bug In the current testnet, the RefreshVote proposal is picked for signing and stalls in every try, other proposals sent to the dkg are being signed without issues.

Log output

Log Output ```Paste log output here 2023-06-12 16:16:25.736 INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: 🕸️ Async proto sent outbound message: Offline: (i, proposal) = (3, Unsigned { kind: Refresh, data: BoundedVec([0, 0, 1, 31, 12, 117, 112, 82, 117, 81, 200, 228, 178, 53, 101, 197, 70, 148, 92, 218, 163, 55, 4, 146, 112, 108, 245, 245, 143, 1, 171, 146, 228, 150, 196, 220, 106, 159, 112, 54, 48, 31, 130, 199, 203, 102, 85, 30, 244, 238, 221, 163, 20, 235, 208, 100, 133, 122, 81, 212, 74, 134, 35, 0, 137, 199, 183, 246], 10000) }) |   -- | --   |   | 2023-06-12 16:16:25.736 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Protocol : OnHeap("/webb-tools/dkg/signing/1") \| Sending message to all peers |     |   | 2023-06-12 16:16:25.733 INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Async proto about to send outbound message in session=286 from=3 to=None for round 1\| (ty: Offline: (i, proposal) = (3, Unsigned { kind: Refresh, data: BoundedVec([0, 0, 1, 31, 12, 117, 112, 82, 117, 81, 200, 228, 178, 53, 101, 197, 70, 148, 92, 218, 163, 55, 4, 146, 112, 108, 245, 245, 143, 1, 171, 146, 228, 150, 196, 220, 106, 159, 112, 54, 48, 31, 130, 199, 203, 102, 85, 30, 244, 238, 221, 163, 20, 235, 208, 100, 133, 122, 81, 212, 74, 134, 35, 0, 137, 199, 183, 246], 10000) })) |     |   | 2023-06-12 16:16:25.732 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Not received messages from : [1, 2, 4] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [128, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [128, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [128, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [128, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [121, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [121, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [121, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [121, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [120, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [120, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [120, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will skip passing message to state machine since not for this associated block, msg block [120, 159, 1, 0] expected block [129, 159, 1, 0] |     |   | 2023-06-12 16:16:25.571 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Not received messages from : [] |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Running AsyncProtocol with party_index: 3 |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Offline stage index: 3 |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Party Index converted to offline stage Index : KeygenPartyId(3) |     |   | 2023-06-12 16:16:25.571 DEBUG tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Received unsigned proposal |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Will now deliver 16 enqueued message(s) to the async protocol for "cf7a870e4b87a7c53b53f485bc981bfc9f45059b0bae49fdfaecab607ef6de7a" |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: [worker] Starting job "cf7a870e4b87a7c53b53f485bc981bfc9f45059b0bae49fdfaecab607ef6de7a" |     |   | 2023-06-12 16:16:25.571 INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: [worker] Received job [207, 122, 135, 14, 75, 135, 167, 197, 59, 83, 244, 133, 188, 152, 27, 252, 159, 69, 5, 155, 11, 174, 73, 253, 250, 236, 171, 96, 126, 246, 222, 122] |     |   | 2023-06-12 16:16:25.571 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Signing protocol for proposal hash [207, 122, 135, 14, 75, 135, 167, 197, 59, 83, 244, 133, 188, 152, 27, 252, 159, 69, 5, 155, 11, 174, 73, 253, 250, 236, 171, 96, 126, 246, 222, 122] will start later in the work manager |     |   | 2023-06-12 16:16:25.571 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Active local key enabled for stage Signing { unsigned_proposal_hash: [207, 122, 135, 14, 75, 135, 167, 197, 59, 83, 244, 133, 188, 152, 27, 252, 159, 69, 5, 155, 11, 174, 73, 253, 250, 236, 171, 96, 126, 246, 222, 122] }? true |     |   | 2023-06-12 16:16:25.568 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: 🕸️ (sr25519_public_key) Multiple private keys found for: [4e85271af1330e5e9384bd3ac5bdc04c0f8ef5a8cc29c1a8ae483d674164745c (5Dqf9U5d...), 804808fb75d16340dc250871138a1a6f1dfa3cab9cc1fbd6f42960f1c39a950d (5ExuPJNy...)] (2) |     |   | 2023-06-12 16:16:25.564 WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: 🕸️ (sr25519_public_key) Multiple private keys found for: [4e85271af1330e5e9384bd3ac5bdc04c0f8ef5a8cc29c1a8ae483d674164745c (5Dqf9U5d...), 804808fb75d16340dc250871138a1a6f1dfa3cab9cc1fbd6f42960f1c39a950d (5ExuPJNy...)] (2) |     |   | 2023-06-12 16:16:25.564 INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: 🕸️ Session Id 286 \| 3-out-of-5 signers: ([KeygenPartyId(1), KeygenPartyId(2), KeygenPartyId(3), KeygenPartyId(4)]) ```

Expected behaviour RefreshVote should be signed like any other proposal

tbraun96 commented 1 year ago

The issue, as the logs confirm, is that there is staggering. Some nodes claim a process is stalled at block X, while the same unsigned proposal for another block is stalled at block X+k, where k != 0. All it takes is one mis-sync event, then suddenly, messages will never get delivered due to filtering of expecting the associated block IDs to be equal. #634 will allow for a range less than the stall limit (10 blocks).