harmony-one / harmony

The core protocol of harmony
https://harmony.one
GNU Lesser General Public License v3.0

Testnet shard 1 down #4654

Closed: diego1q2w closed this issue 2 months ago

diego1q2w commented 5 months ago

Description

After the testnet was upgraded to enable leader rotation and the transition to external validators, no external node had been elected to shard 1's committee by the target epoch. As a result, shard 1 went down while shard 0 continued to operate normally. Investigation showed that shard 1 has no committee at all: internal validators are no longer eligible, and no external validator was in place at the epoch transition.
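The failure mode can be illustrated with a minimal Go sketch. It is not Harmony's actual election code: the `Validator` type, the `electCommittee` function, and the `externalOnlyEpoch` parameter are hypothetical names used only to show how an epoch-gated eligibility rule can leave a shard with an empty committee.

```go
package main

import "fmt"

// Validator is a hypothetical, simplified stand-in for a committee candidate.
type Validator struct {
	Name     string
	Shard    int
	External bool
}

// electCommittee returns the candidates eligible for a shard at a given epoch.
// Assumption for this sketch: from externalOnlyEpoch onward, internal
// validators are no longer eligible.
func electCommittee(candidates []Validator, shard int, epoch, externalOnlyEpoch uint64) []Validator {
	var committee []Validator
	for _, v := range candidates {
		if v.Shard != shard {
			continue
		}
		if !v.External && epoch >= externalOnlyEpoch {
			continue // internal nodes are ineligible after the fork
		}
		committee = append(committee, v)
	}
	return committee
}

func main() {
	// Shard 0 has one external validator; shard 1 has only internal ones.
	candidates := []Validator{
		{"internal-0a", 0, false},
		{"external-0a", 0, true},
		{"internal-1a", 1, false},
		{"internal-1b", 1, false},
	}
	const forkEpoch = 10
	for shard := 0; shard < 2; shard++ {
		c := electCommittee(candidates, shard, forkEpoch, forkEpoch)
		fmt.Printf("shard %d committee size at epoch %d: %d\n", shard, forkEpoch, len(c))
	}
}
```

In this toy setup shard 1 ends up with zero committee members at the fork epoch, which is the situation described above: consensus cannot start without a committee.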

Proposed Solution

We need to undo the hardfork with a new hardfork that restores the previous setup, in which internal validators run consensus. With this change, shard 0 can recalculate the participants for both shards, internal nodes included. Once the change is live on shard 0, we restart the shard 1 nodes so that they pick up the new committee and start consensus.

If the restart alone does not start consensus, we apply a temporary fix on shard 1: hardcode the committee (using the one shard 0 calculates at startup) until shard 1 stabilizes and can write the correct committee to its database. After shard 1 has completed a successful epoch, is running stably, and has updated its database, we can remove the temporary fix, verify that external validators are participating, and proceed with re-externalizing the network.
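The two parts of the proposal, the reverting hardfork and the temporary hardcoded committee, can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `ForkConfig`, `RevertEpoch`, `CommitteeStore`, and `committeeFor` are hypothetical names, and the epoch numbers are made up.

```go
package main

import "fmt"

// ForkConfig holds hypothetical epoch thresholds for this sketch.
type ForkConfig struct {
	ExternalOnlyEpoch uint64 // internal validators become ineligible here
	RevertEpoch       uint64 // the new hardfork restores internal validators here
}

// InternalEligible reports whether internal validators may join a committee.
// They are excluded only in the window between the two forks.
func (c ForkConfig) InternalEligible(epoch uint64) bool {
	return epoch < c.ExternalOnlyEpoch || epoch >= c.RevertEpoch
}

// CommitteeStore is a hypothetical view of the committees persisted in a
// shard's database, keyed by epoch.
type CommitteeStore map[uint64][]string

// committeeFor prefers the committee stored for the epoch and falls back to
// a hardcoded override only when nothing usable is stored. The second return
// value tells the caller whether the temporary override was used.
func committeeFor(epoch uint64, store CommitteeStore, override []string) ([]string, bool) {
	if c := store[epoch]; len(c) > 0 {
		return c, false
	}
	return override, true
}

func main() {
	cfg := ForkConfig{ExternalOnlyEpoch: 10, RevertEpoch: 12}
	fmt.Println("internal eligible at epoch 11:", cfg.InternalEligible(11))
	fmt.Println("internal eligible at epoch 12:", cfg.InternalEligible(12))

	// Shard 1 has no committee stored for epoch 12, so the override is used.
	// Once epoch 13 is stored, the override is no longer consulted.
	store := CommitteeStore{13: {"internal-1a", "internal-1b"}}
	override := []string{"internal-1a", "internal-1b"}
	for _, epoch := range []uint64{12, 13} {
		c, used := committeeFor(epoch, store, override)
		fmt.Printf("epoch %d: committee=%v usedOverride=%v\n", epoch, c, used)
	}
}
```

Making the override a fallback rather than a replacement means it stops being consulted as soon as the database holds a committee for the epoch, so removing it later should not change behavior on a healthy shard.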