Closed sduchesneau closed 2 years ago
@maoueh @matthewdarwin ^
The libnum (12660716) was the last finalized block when the mindreader was restarted:
Sep 7 00:52:42 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:52:42.664Z INFO (reader.geth) loaded most recent local finalized block number=12,660,716 hash=44d789..f54956 td=50,353,550,149,371,329 age=1mo1w6d
More logs:
Sep 7 00:57:04 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:57:04.080Z INFO (reader.geth) syncing beacon headers downloaded=251,392 left=12,690,092 eta=3h35m13.172s
Sep 7 00:57:33 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:57:33.079Z WARN (reader.geth) beacon client online, but no consensus updates received in a while. Please fix your beacon client to follow the chain!
THIS MAY NOT WORK; we are still investigating.
Proposed Workaround when syncing a chain with firehose:
1) Stop the merger.
2) Delete all the one-block files that are stuck with the old libnum in their name.
3) Restart a reader (geth instance) from a previous backup with --nodiscover, and leave it that way until lighthouse/beacon is synced.
4) Restart the readers without --nodiscover so they sync and produce blocks again.
5) Restart the merger.
Example one-block files produced from a snapshot of geth on Ropsten:
16274 2022-09-07 17:23:53 0012660714-77570d9dc40c727f-7ec09df70f47ee5c-12660714-id.dbin.zst
28921 2022-09-07 17:23:53 0012660715-49cc5918ab51c71d-77570d9dc40c727f-12660715-id.dbin.zst
18763 2022-09-07 17:23:53 0012660716-0d8edd90b5f54956-49cc5918ab51c71d-12660716-id.dbin.zst
18865 2022-09-07 17:23:53 0012660717-802db026cba41e25-0d8edd90b5f54956-12660716-id.dbin.zst
The libnum stays at 12660716 "forever" past that block (which happens to be the last finalized block according to that geth instance when it started).
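To spot the affected files in a listing like the one above, a small script can split each name into its parts and compare the embedded libnum with the block number. This is only a sketch: the field layout (block number, block ID suffix, previous ID suffix, libnum, suffix) is inferred from the example names above, and `parse_one_block_name` and `stuck_lib_files` are hypothetical helpers, not part of firehose.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, inferred from the listing above:
#   <block num, 10 digits>-<block id suffix>-<previous id suffix>-<libnum>-<suffix>.dbin.zst
ONE_BLOCK_RE = re.compile(
    r"^(?P<num>\d{10})-(?P<id>[0-9a-f]{16})-(?P<prev>[0-9a-f]{16})"
    r"-(?P<lib>\d+)-(?P<suffix>[^.]+)\.dbin\.zst$"
)


class OneBlockFile(NamedTuple):
    num: int
    block_id: str
    previous_id: str
    lib_num: int
    suffix: str


def parse_one_block_name(name: str) -> Optional[OneBlockFile]:
    """Parse a one-block file name; return None if it does not match the assumed layout."""
    m = ONE_BLOCK_RE.match(name)
    if m is None:
        return None
    return OneBlockFile(int(m["num"]), m["id"], m["prev"], int(m["lib"]), m["suffix"])


def stuck_lib_files(names: Iterable[str], stuck_lib: int) -> List[str]:
    """Return names of blocks past stuck_lib whose libnum is still stuck_lib."""
    stuck = []
    for name in names:
        parsed = parse_one_block_name(name)
        if parsed and parsed.num > stuck_lib and parsed.lib_num == stuck_lib:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    listing = [
        "0012660716-0d8edd90b5f54956-49cc5918ab51c71d-12660716-id.dbin.zst",
        "0012660717-802db026cba41e25-0d8edd90b5f54956-12660716-id.dbin.zst",
    ]
    for name in stuck_lib_files(listing, stuck_lib=12660716):
        print(name)
```

The script only prints candidates. A block immediately after the finalized one can legitimately carry that libnum, so the output is best treated as a list to review before deleting anything.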
Should be fixed with the last PR, released as geth-v1.10.23-fh2.2.
Description
Under some conditions, the geth instance runs with an 'optimistic' chain head (?) and produces blocks with a bad libnum.
Example from Ropsten
Logs that seem relevant
Sep 7 00:53:01 ropsten-sfdm37 sf[1953540]: 2022-09-07T00:53:01.293Z WARN (reader.geth) beacon chain gapped head=12,941,410 newHead=12,941,419
This causes lots of problems in firehose (merger, etc.), especially if the new libnum is BELOW the previously known libnum. For example: we have merged files up to block 499, but then geth with firehose produces block 550 with libnum=300, which breaks the merger.
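The failure condition can be expressed as a simple monotonicity check on the libnum. The sketch below is illustrative only: `find_lib_regressions` is a hypothetical helper, not the merger's actual logic, and it assumes blocks arrive as `(block_num, lib_num)` pairs in the order they were produced.

```python
from typing import Iterable, List, Tuple


def find_lib_regressions(
    blocks: Iterable[Tuple[int, int]], known_lib: int = 0
) -> List[Tuple[int, int, int]]:
    """Return (block_num, lib_num, highest_lib_so_far) for every block whose
    libnum is below the highest libnum already seen.

    known_lib seeds the check with the libnum already known downstream,
    for example the last block covered by merged files.
    """
    regressions = []
    highest = known_lib
    for num, lib in blocks:
        if lib < highest:
            # The libnum went backwards relative to what was already known.
            regressions.append((num, lib, highest))
        else:
            highest = lib
    return regressions


if __name__ == "__main__":
    # Scenario from the description: merged up to 499, then block 550 reports libnum=300.
    print(find_lib_regressions([(550, 300)], known_lib=499))
```

A libnum that merely stops advancing is not flagged by this check; it only catches the case where the libnum moves backwards.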
System
Geth version: 1.10.23-fh2
OS & Version: Linux