prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

If all nodes of a private Ethereum network built with Geth+Prysm go down, how to restore the network? #14447

Open 0xWilliamWang opened 1 week ago

0xWilliamWang commented 1 week ago

💎 Issue

Background

I plan to use Geth+Prysm to build a private Ethereum blockchain as L1 and Optimism as L2, in order to learn and understand Ethereum rollups.

There is a tricky problem with the L1 network: once all L1 consensus-layer (CL) nodes are shut down, the L1 network cannot be restored, and therefore neither can L2. If this happens after the network has been running for a long time, all blockchain data is lost and both the L1 and L2 networks have to be reset. I want to avoid this.

Description

  1. Stop all L1 execution-layer (EL) clients and restart them after a few minutes. The L1 network recovers and continues to produce blocks.
  2. Stop all L1 CL clients. Even if all of them are restarted immediately, the L1 network does not continue to produce blocks, and it does not recover even after a long time.
  3. Many people have encountered the same problem; see the issues in OffchainLabs/eth-pos-devnet.
  4. Tracing the Prysm logic, I found that it is mainly stuck at this position: https://github.com/prysmaticlabs/prysm/blob/develop/beacon-chain/p2p/peers/status.go#L756 . It appears to assume that a peer's head slot is always greater than the node's own. After all nodes have been shut down, however, every node has the same head slot, so no peer qualifies and the nodes wait on each other indefinitely. Is this logic intentional or a bug? How can I restore the L1 network without losing data? Any tips would be appreciated.
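
To make the condition suspected in point 4 concrete, here is a minimal Go sketch. It is not the Prysm code: `peerStatus`, `suitablePeers`, the peer names, and the slot value 100 are hypothetical, and the strict `>` comparison is only my reading of what the linked line does.

```go
package main

import "fmt"

// peerStatus is a hypothetical, simplified stand-in for the chain state
// a peer reports about itself. It is not a Prysm type.
type peerStatus struct {
	ID       string
	HeadSlot uint64
}

// suitablePeers keeps only the peers whose head slot is strictly greater
// than ours. The strict comparison is an assumption made for illustration.
func suitablePeers(ourHeadSlot uint64, peers []peerStatus) []peerStatus {
	var out []peerStatus
	for _, p := range peers {
		if p.HeadSlot > ourHeadSlot {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// After a full shutdown, every node restarts at the same head slot
	// (100 is an arbitrary example value).
	const ourHeadSlot = 100
	const required = 3
	peers := []peerStatus{
		{"cl2", 100}, {"cl3", 100}, {"cl4", 100}, {"cl5", 100},
	}

	got := suitablePeers(ourHeadSlot, peers)
	fmt.Printf("required=%d suitable=%d\n", required, len(got))
	// prints "required=3 suitable=0"
}
```

Under these assumptions, four connected peers still yield zero suitable peers, because none of them is ahead of the local node.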

    (base) B450M3600X eth-pos-devnet git:(master) ✗ dcl1 logs cl1 --tail 10
    cl1-1  | time="2024-09-13 08:06:22" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:27" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:32" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:34" level=error msg="Beacon node is not respecting the follow distance. EL client is syncing." lastBlockNumber=296 prefix=execution
    cl1-1  | time="2024-09-13 08:06:37" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:42" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:47" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:48" level=error msg="Beacon node is not respecting the follow distance. EL client is syncing." lastBlockNumber=296 prefix=execution
    cl1-1  | time="2024-09-13 08:06:52" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0
    cl1-1  | time="2024-09-13 08:06:52" level=info msg="Connected peers" inboundTCP=4 outboundTCP=0 prefix=p2p total=4
    (base) B450M3600X eth-pos-devnet git:(master) ✗ dcl1 logs el1 --tail 10
    el1-1  | INFO [09-13|08:05:51.733] IPC endpoint opened                      url=/root/.ethereum/geth.ipc
    el1-1  | INFO [09-13|08:05:51.733] Loaded JWT secret file                   path=/jwtsecret crc32=0x6fc756fc
    el1-1  | INFO [09-13|08:05:51.734] HTTP server started                      endpoint=[::]:8545 auth=false prefix= cors=* vhosts=*
    el1-1  | INFO [09-13|08:05:51.734] WebSocket enabled                        url=ws://[::]:8546
    el1-1  | INFO [09-13|08:05:51.734] WebSocket enabled                        url=ws://[::]:8551
    el1-1  | INFO [09-13|08:05:51.734] HTTP server started                      endpoint=[::]:8551 auth=true  prefix= cors=localhost vhosts=*
    el1-1  | INFO [09-13|08:06:01.742] Looking for peers                        peercount=0 tried=4 static=0
    el1-1  | INFO [09-13|08:06:11.756] Looking for peers                        peercount=0 tried=0 static=0
    el1-1  | INFO [09-13|08:06:21.771] Looking for peers                        peercount=0 tried=0 static=0
    el1-1  | WARN [09-13|08:06:25.694] Post-merge network, but no beacon client seen. Please launch one to follow the chain!
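
For anyone who wants to detect this state automatically, the following is a rough Go sketch that extracts the `required` and `suitable` counts from the cl1 log lines above. The helper name `parseWaiting` and the regular expression are my own and assume the log format stays exactly as shown; this is not part of Prysm.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// waitingRe matches the initial-sync message from the log above and
// captures the required and suitable peer counts.
var waitingRe = regexp.MustCompile(
	`msg="Waiting for enough suitable peers before syncing".*required=(\d+) suitable=(\d+)`)

// parseWaiting returns the two counts from one log line. ok is false
// when the line is not a "waiting for peers" message.
func parseWaiting(line string) (required, suitable int, ok bool) {
	m := waitingRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	// The regexp guarantees digits only, so conversion errors are ignored.
	required, _ = strconv.Atoi(m[1])
	suitable, _ = strconv.Atoi(m[2])
	return required, suitable, true
}

func main() {
	line := `cl1-1  | time="2024-09-13 08:06:22" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=3 suitable=0`

	if req, suit, ok := parseWaiting(line); ok && suit < req {
		fmt.Printf("stalled: %d of %d suitable peers\n", suit, req)
	}
}
```

A node that keeps logging `suitable` below `required` while the "Connected peers" count is non-zero matches the situation described in this issue.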
    

Problem reproduction:

  1. Clone the repo 0xWilliamWang/geth-prysm-optimism-rollup-dmeo (it uses Geth+Prysm to build a private Ethereum blockchain as L1 and ethereum-optimism/optimism as L2).
  2. Start the L1 network: bash helper.sh resetL1
  3. source alias.sh
  4. dcl1 down -v
  5. dcl1 up -d
  6. dcl1 logs ...

FxLuc commented 3 days ago

I hope someone has a solution here. I have the same problem.