streamingfast / firehose-ethereum


error: failed storing block in archiver, shutting down. You will need to reprocess over this range to get this block (mindreader/mindreader.go:275){"error": "blocks non contiguous, expectedBlock: 5573454, got block: 5573455"... #3

Status: Closed (matthewdarwin closed this issue 2 years ago)

matthewdarwin commented 3 years ago

Ran into a problem:

error: failed storing block in archiver, shutting down. You will need to reprocess over this range to get this block (mindreader/mindreader.go:275){"error": "blocks non contiguous, expectedBlock: 5573454, got block: 5573455"...

(screenshot of the error output omitted)
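For context, the message means the archiver received a block whose number does not follow the previously stored one. Below is a minimal sketch of that kind of contiguity check, for illustration only; it is not the actual code at mindreader/mindreader.go:275, and the type and field names are hypothetical.

```go
package main

import "fmt"

// Sketch only: the archiver expects blocks to arrive in strictly increasing
// order with no gaps. When the incoming block number does not match the next
// expected one, it fails with an error like the one quoted above.
type archiver struct {
	nextExpected uint64 // 0 means "no expectation yet"
}

func (a *archiver) storeBlock(num uint64) error {
	if a.nextExpected != 0 && num != a.nextExpected {
		return fmt.Errorf("blocks non contiguous, expectedBlock: %d, got block: %d", a.nextExpected, num)
	}
	a.nextExpected = num + 1
	// ... write the block to storage (omitted) ...
	return nil
}

func main() {
	a := &archiver{nextExpected: 5573454}
	if err := a.storeBlock(5573455); err != nil {
		fmt.Println("failed storing block in archiver:", err)
	}
}
```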

Two issues here:
1) Why does mindreader fail to read the correct block?
2) Why does the entire process not just exit? Instead we are left with a partially working system: no Firehose blocks are generated.

Other than reading logs, what is the correct way to detect this problem?

And how do I recover? Restarting mindreader just gets stuck again at the same spot. I don't care about the hole in merged blocks; another mindreader (running on another server) will fill it in eventually.

matthewdarwin commented 3 years ago

This has happened a few times.

matthewdarwin commented 3 years ago

It has only happened to me when I've stopped the systemd service. I've been wanting to create backups, but to create a consistent backup I need to shut down the systemd service, which then runs the risk of triggering this issue after the backup completes and the systemd service is started again.

And if I have no backups, then I have to re-sync everything from the start again. That kinda makes it unusable.

sduchesneau commented 2 years ago

This one comes from a few elements/conditions not playing nicely together when the service is shut down.

To answer your two questions:

  1. It fails to read the correct block because (probably) the kill signal was sent to Geth instead of only to sfeth (see the sketch after this list).
  2. The bug where it does not exit was a race condition in the continuity checker (now removed).
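To illustrate point 1, here is a minimal, hypothetical sketch of the intended shutdown ordering, assuming a supervisor process that manages the node as a child; the command and structure are placeholders, not the actual sfeth code. The point is that the supervisor should receive the termination signal, flush what it has already read, and only then stop the node:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Start the managed node process (placeholder command standing in for geth).
	node := exec.Command("sleep", "3600")
	if err := node.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "start node:", err)
		os.Exit(1)
	}

	// The supervisor, not the node, should receive the termination signal.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	<-sigs

	// 1. Stop consuming and flush the blocks already read (omitted here).
	// 2. Only then ask the node to stop gracefully and wait for it.
	_ = node.Process.Signal(syscall.SIGTERM)
	_ = node.Wait()
}
```

If the signal instead reaches the node directly, it can die mid-block and leave the reader with the kind of gap reported above.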

To prevent this behavior, here is what we suggest for producing historical blocks (it's actually what we do when indexing a new chain):

  1. Run a node without deep-mind from beginning to end, taking a few backups along the way. (It should catch up 20% faster than with deep-mind.)
  2. Run one or multiple mindreaders in parallel, each starting from a different backup (see the range-splitting sketch after this list); do not perform backups on a mindreader while it is producing the merged blocks itself.
  3. If a mindreader (in batch or catch-up mode) fails or needs to be restarted, run it again from the nearest backup (it should now stop correctly with the removal of the continuity checker).
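To illustrate the idea behind step 2, here is a hypothetical range splitter (not part of firehose-ethereum) showing how a historical block range could be divided across parallel mindreader instances, each restored from a backup taken near the start of its sub-range:

```go
package main

import "fmt"

// blockRange is a hypothetical helper type for illustration; Stop is exclusive.
type blockRange struct {
	Start, Stop uint64
}

// splitRange divides [start, stop) into roughly equal sub-ranges, one per worker.
func splitRange(start, stop uint64, workers int) []blockRange {
	ranges := make([]blockRange, 0, workers)
	size := (stop - start + uint64(workers) - 1) / uint64(workers)
	for from := start; from < stop; from += size {
		to := from + size
		if to > stop {
			to = stop
		}
		ranges = append(ranges, blockRange{Start: from, Stop: to})
	}
	return ranges
}

func main() {
	// e.g. four mindreader instances covering blocks 0 .. 20,000,000
	for i, r := range splitRange(0, 20_000_000, 4) {
		fmt.Printf("mindreader %d: blocks %d to %d\n", i, r.Start, r.Stop)
	}
}
```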

Once you have caught up with live, you will not have these issues anymore, because the mindreaders will not try to produce the merged files themselves, so taking backups is easier.

Does that (removal of the continuity check and the suggested approach to producing historical blocks) solve your issue?

maoueh commented 2 years ago

Lots of improvements have happened since then; let's close this and track it in another issue if it is still reproducible.