sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.92k stars 742 forks source link

Beacon node stops syncing after reboot #1560

Open GeoJesse opened 4 years ago

GeoJesse commented 4 years ago

Description

I followed the tutorial here for staking ethereum with the exception of using a Raspberry Pi

Hardware: Raspberry Pi Model B Samsung External SSD T5 (lighthouse and geth db's stored here) OS: Ubuntu v20.04 (LTS) x64 ARM Software: Geth 1.9.19-stable-3e064192 ARM 64 Lighthouse v0.2.6-2bc9115; BLS Library: blst

Version

Lighthouse v0.2.6-2bc9115; BLS Library: blst rustc 1.45.2 (d3fb005a3 2020-07-31)

Please provide your Lighthouse and Rust version. Are you building from master, which commit? Pulled from 'master'

Present Behaviour

Whenever there is a reboot the syncing stops and gets stuck on that last slot. The services all get restarted but the beacon service never picks up where it left off and 'est-time' in the logs is '--'.

Expected Behaviour

After rebooting I would expect the beacon service to take over where it left off.

Steps to resolve

Please describe the steps required to resolve this issue, if known. The only way to get it up and running again is to stop the beacon sewrvice, purge the beacon database, then restart. However this causes it to start all over from the beginning.

divagant-martian commented 4 years ago

This might be related to #1488

GeoJesse commented 4 years ago

I'm not sure if it's related or not but it appears that it's only an issue if you haven't already fully sync'd. My beacon node just finished syncing so I tried rebooting the server again. This time it picked up with no problem. So it doesn't look like an issue for me anymore but it could be a bug for people who are trying to sync the first time.

jwoyame commented 4 years ago

After Jesse showed me this, I tested it out on my own Ubuntu Desktop machine (Lighthouse v0.2.6-ebb25b55) and was able to reproduce the issue.

Here's where I stop the beacon node while it is syncing:

Aug 21 21:24:50 jon-desktop lighthouse[6885]: Aug 21 21:24:50.000 INFO Syncing                                 est_time: 1 hr 55 mins, speed: 17.98 slots/sec, distance: 125132 slots (2 weeks 3 days), peers: 2, service: slot_notifier
Aug 21 21:25:02 jon-desktop lighthouse[6885]: Aug 21 21:25:02.000 INFO Syncing                                 est_time: 1 hr 55 mins, speed: 18.00 slots/sec, distance: 124940 slots (2 weeks 3 days), peers: 2, service: slot_notifier
Aug 21 21:25:14 jon-desktop lighthouse[6885]: Aug 21 21:25:14.000 INFO Syncing                                 est_time: 1 hr 37 mins, speed: 21.33 slots/sec, distance: 124685 slots (2 weeks 3 days), peers: 4, service: slot_notifier
Aug 21 21:25:22 jon-desktop systemd[1]: Stopping Lighthouse Beacon Node...
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.463 INFO Shutting down..
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.464 INFO HTTP service shutdown                   service: http
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.466 INFO Saved DHT state                         service: network
Aug 21 21:25:22 jon-desktop lighthouse[6885]: Aug 21 21:25:22.466 INFO Network service shutdown                service: network
Aug 21 21:25:24 jon-desktop systemd[1]: lighthousebeacon.service: Succeeded.
Aug 21 21:25:24 jon-desktop systemd[1]: Stopped Lighthouse Beacon Node.

Here's the log after starting again:

Aug 21 21:25:49 jon-desktop systemd[1]: Started Lighthouse Beacon Node.
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 WARN Ethereum 2.0 is pre-release. This software is experimental.
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Lighthouse started                      version: Lighthouse/v0.2.6-ebb25b55
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Configured for testnet                  name: medalla
Aug 21 21:25:49 jon-desktop lighthouse[6965]: Aug 21 21:25:49.894 INFO Data directory initialised              datadir: /eth/data/lighthouse/beacon-node
Aug 21 21:25:50 jon-desktop lighthouse[6965]: Aug 21 21:25:50.323 INFO Starting beacon chain                   method: resume, service: beacon
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.353 INFO Block production enabled                method: json rpc via http, endpoint: http://127.0.0.1:8545
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.418 INFO Beacon chain initialized                head_slot: 1631, head_block: 0x3465…01f0, head_state: 0xbae2…b4f4, service: beacon
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.419 INFO Timer service started                   service: node_timer
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.420 INFO Libp2p Service                          peer_id: 16Uiu2HAm4wqR5HtP7AB86gDqRoxjeBuF9dEMs6LALBMtQAgefXPG, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.420 INFO ENR Initialised                         tcp: Some(9000), udp: None, ip: None, id: 0xf574..a8ea, seq: 1, enr: enr:-KO4QJsGUZ8lsu4zly8XfW17mS_i-Q0SmDOLXgoMu9jy2_irZeOUboYO54DsVphNwzU_KryQXU-Z97o5ZE7V4W_UsmoBh2F0dG5ldHOIAAAAAAAAAACEZXRoMpDnp11aAAAAAf__________gmlkgnY0iXNlY3AyNTZrMaECjXNGmq3lWJ6tpKhEvKSf8cgGOHP4tIfpxfphhzpUNKODdGNwgiMo, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.421 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAm4wqR5HtP7AB86gDqRoxjeBuF9dEMs6LALBMtQAgefXPG, service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.422 INFO Subscribed to topics                    topics: [BeaconBlock, BeaconAggregateAndProof, VoluntaryExit, ProposerSlashing, AttesterSlashing], service: libp2p
Aug 21 21:26:12 jon-desktop lighthouse[6965]: Aug 21 21:26:12.464 INFO HTTP API started                        port: 5052, address: 127.0.0.1, service: http
Aug 21 21:26:26 jon-desktop lighthouse[6965]: Aug 21 21:26:26.001 WARN Low peer count                          peer_count: 0, service: slot_notifier
Aug 21 21:26:26 jon-desktop lighthouse[6965]: Aug 21 21:26:26.001 INFO Searching for peers                     current_slot: 126131, head_slot: 1631, finalized_epoch: 48, finalized_root: 0x952a…39bb, peers: 0, service: slot_notifier
Aug 21 21:26:37 jon-desktop lighthouse[6965]: Aug 21 21:26:37.729 INFO Sync state updated                      new_state: Syncing Finalized Chain, old_state: Stalled, service: sync
Aug 21 21:26:38 jon-desktop lighthouse[6965]: Aug 21 21:26:38.001 INFO Syncing                                 est_time: --, distance: 124501 slots (2 weeks 3 days), peers: 4, service: slot_notifier
Aug 21 21:26:47 jon-desktop lighthouse[6965]: Aug 21 21:26:47.836 WARN Batch failed to download. Dropping chain scoring peers, batch_epoch: 48, score_adjustment: Low Tolerance Error, chain: 1582718186899813585, service: sync
Aug 21 21:26:50 jon-desktop lighthouse[6965]: Aug 21 21:26:50.001 INFO Syncing                                 est_time: --, distance: 124502 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:02 jon-desktop lighthouse[6965]: Aug 21 21:27:02.002 INFO Syncing                                 est_time: --, distance: 124503 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:14 jon-desktop lighthouse[6965]: Aug 21 21:27:14.001 INFO Syncing                                 est_time: --, distance: 124504 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:26 jon-desktop lighthouse[6965]: Aug 21 21:27:26.001 INFO Syncing                                 est_time: --, distance: 124505 slots (2 weeks 3 days), peers: 5, service: slot_notifier
Aug 21 21:27:28 jon-desktop lighthouse[6965]: Aug 21 21:27:28.376 WARN Batch failed to download. Dropping chain scoring peers, batch_epoch: 48, score_adjustment: Low Tolerance Error, chain: 16513122307734969754, service: sync

Notice that after restarting the beacon, the process will remain as "Syncing" with est_time: --

The only fix we've been able to use is to clear the beacon directory and restart. Restarting after it has reached the "Synced" state works fine.

q9f commented 4 years ago

Having the same issue.

Aug 24 10:20:50.000 INFO Syncing                                 est_time: --, distance: 37488 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:21:02.000 INFO Syncing                                 est_time: --, distance: 37489 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:21:14.007 INFO Syncing                                 est_time: --, distance: 37490 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:21:26.000 INFO Syncing                                 est_time: --, distance: 37491 slots (5 days 4 hrs), peers: 42, service: slot_notifier
Aug 24 10:21:38.001 INFO Syncing                                 est_time: --, distance: 37492 slots (5 days 4 hrs), peers: 42, service: slot_notifier
Aug 24 10:21:50.001 INFO Syncing                                 est_time: --, distance: 37493 slots (5 days 4 hrs), peers: 43, service: slot_notifier
Aug 24 10:22:02.001 INFO Syncing                                 est_time: --, distance: 37494 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:22:14.000 INFO Syncing                                 est_time: --, distance: 37495 slots (5 days 4 hrs), peers: 40, service: slot_notifier
Aug 24 10:22:26.000 INFO Syncing                                 est_time: --, distance: 37496 slots (5 days 4 hrs), peers: 41, service: slot_notifier
Aug 24 10:22:38.001 INFO Syncing                                 est_time: --, distance: 37497 slots (5 days 4 hrs), peers: 40, service: slot_notifier
Aug 24 10:22:50.001 INFO Syncing                                 est_time: --, distance: 37498 slots (5 days 4 hrs), peers: 37, service: slot_notifier
Aug 24 10:23:02.001 INFO Syncing                                 est_time: --, distance: 37499 slots (5 days 4 hrs), peers: 37, service: slot_notifier
heueristik commented 4 years ago

Me too. My DB was 105 GB in size, so I installed v0.2.7 and ran --purge-db and now I am stuck in this cycle. Update: I ran --purge-db again and for now it is working.

Clete2 commented 4 years ago

I was following directions for docker on the official docs and ran into this issue.

Based on reading the above, I was able to fix it by:

  1. Wait for geth to fully sync
  2. docker-compose down
  3. mv lighthouse-data/beacon lighthouse-data/beacon_old (just to be safe)
  4. docker-compose up -d

Now syncing successfully. Have not tried/dared to stop it.

AgeManning commented 4 years ago

There is also a new sync PR coming which should address most issues related to the medalla large period of non-finality.

icculp commented 6 months ago

It seems this is still happening, lighthouse getting stuck after a restart when using genesis sync

AgeManning commented 6 months ago

@icculp do you have any logs you could send to help diagnose this?