sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0

Investigate stuck / slow range sync on PeerDAS networks #6113

Closed jimmygchen closed 2 months ago

jimmygchen commented 4 months ago

Description

#6004 fixes a bunch of known issues, and Lighthouse is now able to sync slowly with peers, provided that those peers are able to respond to data column by range requests. However, sync is still not reliable and gets "stuck" quite easily, so I suspect there are other issues that haven't been discovered yet.

I don't have the logs with me now, however it's quite easy to reproduce:

  1. Start a local testnet with the network_params_das_local.yaml config
  2. Stop one Lighthouse node, and wait for 2-3 epochs to make sure it triggers range sync
  3. Start the Lighthouse node again and notice that sync gets stuck pretty quickly
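
For step 2, a rough way to work out how long the node has to stay offline is to convert epochs to wall-clock time. The sketch below assumes mainnet-preset values (32 slots per epoch, 12 seconds per slot). The local testnet config may override either of these, so treat the defaults as assumptions and pass the real values if they differ. The helper name `offline_seconds` is made up for illustration and is not part of Lighthouse.

```python
# Assumed mainnet-preset values; a local testnet config may override both.
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def offline_seconds(
    epochs: int,
    slots_per_epoch: int = SLOTS_PER_EPOCH,
    seconds_per_slot: int = SECONDS_PER_SLOT,
) -> int:
    """Return how many seconds `epochs` full epochs take (hypothetical helper)."""
    if epochs < 0 or slots_per_epoch <= 0 or seconds_per_slot <= 0:
        raise ValueError("epochs must be >= 0 and slot parameters must be > 0")
    return epochs * slots_per_epoch * seconds_per_slot


if __name__ == "__main__":
    # The repro asks for 2-3 epochs of downtime before restarting the node.
    for epochs in (2, 3):
        secs = offline_seconds(epochs)
        print(f"{epochs} epochs = {secs} s (about {secs / 60:.1f} min)")
```

With the assumed defaults this works out to roughly 13 to 19 minutes of downtime. A testnet with shorter slots or fewer slots per epoch would need proportionally less.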

This is now a bit harder to test due to #6108, so it might make sense to get to the bottom of that one first. Alternatively, we could try to run a testnet with Prysm / Teku, whichever is able to serve the requests.

jimmygchen commented 2 months ago

Fixed in #6276