It seems that my merger is now stuck at block 31000. I see the following log line emitted over and over, and here's a more complete sample of log lines:

Are the unlinkable blocks an issue? If so, how can I resolve them?
So, this warning can sometimes be a false positive when a lot of blocks are ingested in a short amount of time, which is the case when reprocessing a chain from scratch. It's a real problem only when the number of consecutive unlinkable blocks keeps increasing forever.
It's also possible that it's a true positive. An unclean shutdown of firehose-ethereum can cause a hole in the one-block files. Even a clean shutdown can cause a hole if there is a bug somewhere in firehose-ethereum; we've seen and fixed such a bug in the past, and others could still exist.
To resolve a one-block hole, you need to "rewind" the geth instance back in time so it "reprocesses" the missing block. You can achieve that with:

```
debug.setHead("<hex-encoded block number>")
```
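For example, assuming the node exposes its console over IPC at /data/geth.ipc (the path is an assumption, adjust it to your deployment), the rewind can be done from the shell:

```sh
# Attach to the running geth instance and rewind its head.
# 0x7724 is 30500 in decimal, an illustrative target for this thread's case:
# the hole is around block 31000 and the advice below is roughly hole - 500.
geth attach --exec 'debug.setHead("0x7724")' /data/geth.ipc
```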
To reduce the chance of having a hole, deploy two reader nodes syncing the chain together, both writing to a shared one-block store; this greatly reduces the chance that a hole is created.
Have you solved your problem?
Ah, got it! I ended up wiping the data directory and starting from scratch, which seemed to fix it. Good to know there's a way to resolve this with geth. I guess in this specific case, I should use a block prior to 31000 when doing the rewind. Is that correct? Is it better to go back further in time?
If I have two reader nodes sharing the same one-block store, should I make sure they don't share the same data-dir? E.g., have two reader-node pods (in k8s) with independent volumes for the datadir but use the same S3 bucket for the one-block, merged, forked, and index dirs?
> Is that correct? Is it better to go back further in time?
Yeah, you are correct; you should probably use something like hole - 500 as the input, since it's safe to go back a bit in time. This early in the chain, however, setHead effectively starts back from genesis anyway, so it's easier to simply restart from scratch.
> If I have two reader nodes sharing the same one-block store, should I make sure they don't share the same data-dir?
Exactly. They each have their own data dir, so they both sync with the chain but, as you said, write to the same one-block store; the merger is intelligent and will delete older files that are no longer required.
When using this setup, you can configure the flag reader-node-oneblock-suffix differently for each reader-node. Setting the flag makes each reader-node write independent files representing the same block, which helps deal with the read/write consistency problems on the same filename that we've seen in the past on some storage solutions, like Ceph. The merger handles those files correctly.
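To make this concrete, here is a minimal sketch of the two-reader layout. Only reader-node-oneblock-suffix is taken verbatim from above; the fireeth invocation and the data-dir and one-block-store flag spellings are assumptions based on a typical firehose-ethereum deployment, so check fireeth start --help against your version:

```sh
# Reader A: its own data dir, the shared one-block bucket, a unique suffix.
fireeth start reader-node \
  --data-dir=/data/reader-a \
  --common-one-block-store-url=s3://my-bucket/one-blocks \
  --reader-node-oneblock-suffix=reader-a

# Reader B: an independent data dir (e.g. a separate k8s volume), the same
# bucket, and a different suffix so both readers can write the same block
# without colliding on the same filename.
fireeth start reader-node \
  --data-dir=/data/reader-b \
  --common-one-block-store-url=s3://my-bucket/one-blocks \
  --reader-node-oneblock-suffix=reader-b
```

With distinct suffixes, the merger sees two one-block files per block (one per reader) and keeps or deletes them as described above.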
Awesome, thank you for all the help. I'll try to make a PR in the next couple days to document all my learnings so other devs have an easier time getting things up and running.
@seanmooretechwriter is our technical writer; I'm pretty sure he could also make a few adjustments by looking at the conversation we've had over the past weeks :)
Oh right! @seanmooretechwriter I'd be happy to chat (over text/audio/video) if you'd like to get an overview of what I've learned - hopefully that will make writing docs a bit easier 🙂
Hi @paymog
Good to hear from you!
Ok, this sounds like a great idea. Please let me know what your availability looks like and we can set up a day and time to chat and go over all of this. I'll read through more of this GitHub issue prior to our chat.
Thank you a ton for your time and the awesome and valuable feedback! :)
cc: @maoueh
The call went well and we were able to gather useful and insightful input for areas of improvement for the Firehose documentation.
Alex provided great notes encapsulating the input and we've identified a few areas to address in the very near future.