streamingfast / go-ethereum

Official Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

Firehose outputs blocks with very old LIBNUM when lighthouse/beacon not in sync #1

Closed sduchesneau closed 2 years ago

sduchesneau commented 2 years ago

Description

Under some conditions, the geth instance runs with an 'optimistic' chain head (?) and produces blocks with a bad libnum.

This causes a lot of problems in firehose (merger, etc.), especially if the new libnum is BELOW the previously known libnum. For example: we have merged files up to block 499, but then geth with firehose produces block 550 with libnum=300. This breaks the merger.
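The regression described above can be sketched as a simple guard. This is only an illustration: `checkLIB` is a hypothetical helper, not a function from firehose or geth, and the rule it applies (a block's libnum must not fall below the last merged block) is an assumption drawn from the example in this description.

```go
package main

import "fmt"

// checkLIB is a hypothetical guard, not part of firehose or geth.
// Assumption: everything up to lastMerged is already treated as irreversible,
// so a new block reporting a lower libnum is a regression.
func checkLIB(lastMerged, blockNum, libNum uint64) error {
	if libNum > blockNum {
		return fmt.Errorf("block %d: libnum %d is ahead of the block itself", blockNum, libNum)
	}
	if libNum < lastMerged {
		return fmt.Errorf("block %d: libnum %d is below last merged block %d", blockNum, libNum, lastMerged)
	}
	return nil
}

func main() {
	// Numbers from the description: merged up to 499, then block 550 with libnum=300.
	fmt.Println(checkLIB(499, 550, 300))
	// A block whose libnum has kept advancing, for comparison.
	fmt.Println(checkLIB(499, 550, 520))
}
```

A check of this kind would only surface the problem earlier; it does not address why geth reports the stale libnum.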

System

Geth version: 1.10.23-fh2
OS & Version: Linux

sduchesneau commented 2 years ago

@maoueh @matthewdarwin ^

sduchesneau commented 2 years ago

The libnum (12660716) was the last finalized block when the mindreader was restarted:

    Sep  7 00:52:42 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:52:42.664Z INFO (reader.geth) loaded most recent local finalized block number=12,660,716 hash=44d789..f54956 td=50,353,550,149,371,329 age=1mo1w6d

More logs:

    Sep  7 00:57:04 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:57:04.080Z INFO (reader.geth) syncing beacon headers                   downloaded=251,392 left=12,690,092 eta=3h35m13.172s
    Sep  7 00:57:33 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:57:33.079Z WARN (reader.geth) beacon client online, but no consensus updates received in a while. Please fix your beacon client to follow the chain!

sduchesneau commented 2 years ago

THIS MAY NOT WORK; we are still investigating.

Proposed workaround when syncing a chain with firehose:

1) Stop the merger.
2) Delete all the one-block-files that are stuck with the old LIBNUM in their name.
3) Restart a reader (geth instance) from a previous backup with --no-discover until the lighthouse/beacon are synced.
4) Restart the reader(s) without --no-discover so they sync and produce blocks again.
5) Restart the merger.

sduchesneau commented 2 years ago

example one-block-files produced from a snapshot of geth ropsten:

    16274 2022-09-07 17:23:53 0012660714-77570d9dc40c727f-7ec09df70f47ee5c-12660714-id.dbin.zst
    28921 2022-09-07 17:23:53 0012660715-49cc5918ab51c71d-77570d9dc40c727f-12660715-id.dbin.zst
    18763 2022-09-07 17:23:53 0012660716-0d8edd90b5f54956-49cc5918ab51c71d-12660716-id.dbin.zst
    18865 2022-09-07 17:23:53 0012660717-802db026cba41e25-0d8edd90b5f54956-12660716-id.dbin.zst

The libnum stays at 12660716 "forever" past that block (which happens to be the last finalized block according to that geth instance when it started).
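To spot this pattern across many files, the names can be parsed and the gap between block number and libnum compared. The sketch below is a minimal illustration: `parseOneBlockName` and the `oneBlockFile` type are hypothetical names, and the field layout (block number, block ID suffix, previous block ID suffix, libnum, source suffix) is inferred from the listing above rather than taken from the firehose source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// oneBlockFile holds the fields read from a one-block-file name.
// The layout is an assumption inferred from the listing in this issue.
type oneBlockFile struct {
	Num    uint64 // block number (zero-padded in the name)
	ID     string // suffix of the block hash
	PrevID string // suffix of the parent block hash
	LibNum uint64 // last irreversible block number
}

// parseOneBlockName is a hypothetical helper that splits a name such as
// 0012660717-802db026cba41e25-0d8edd90b5f54956-12660716-id.dbin.zst.
func parseOneBlockName(name string) (oneBlockFile, error) {
	base := strings.TrimSuffix(name, ".dbin.zst")
	parts := strings.Split(base, "-")
	if len(parts) != 5 {
		return oneBlockFile{}, fmt.Errorf("unexpected name %q: want 5 fields, got %d", name, len(parts))
	}
	num, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return oneBlockFile{}, fmt.Errorf("bad block number in %q: %w", name, err)
	}
	lib, err := strconv.ParseUint(parts[3], 10, 64)
	if err != nil {
		return oneBlockFile{}, fmt.Errorf("bad libnum in %q: %w", name, err)
	}
	return oneBlockFile{Num: num, ID: parts[1], PrevID: parts[2], LibNum: lib}, nil
}

func main() {
	names := []string{
		"0012660716-0d8edd90b5f54956-49cc5918ab51c71d-12660716-id.dbin.zst",
		"0012660717-802db026cba41e25-0d8edd90b5f54956-12660716-id.dbin.zst",
	}
	for _, name := range names {
		f, err := parseOneBlockName(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// A gap that keeps growing from file to file suggests a stuck libnum.
		fmt.Printf("block=%d libnum=%d gap=%d\n", f.Num, f.LibNum, f.Num-f.LibNum)
	}
}
```

A gap that grows by one with every new file, while the libnum field stays constant, matches the behaviour reported here.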

sduchesneau commented 2 years ago

Should be fixed with the last PR, released as geth-v1.10.23-fh2.2.