Open mariocynicys opened 1 year ago
It's all `Updating best tip` logs after this.
Note: This tower isn't the `master` one.
Are there any existing error-handling mechanisms in place within the SpvClient or related components to handle scenarios where blocks are not delivered due to pruning?
It isn't treated as an error per se, but it is reported back to the caller through the boolean in `BlockSourceResult<(ChainTip, bool)>`. So we should be able to recover from or react to that.
If the boolean value indicates that blocks were disconnected, can we retry fetching and connecting the missing blocks?
The expected action here is blocks getting connected or disconnected. If either of these happens, the boolean should be `true`. If the best tip is fetched but no blocks are connected or disconnected, that's the bad case.
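To make the good and bad cases concrete, here is a minimal sketch of how the caller could classify a poll result. `Tip`, `PollOutcome` and `classify_poll` are hypothetical stand-ins written for illustration (the real `ChainTip` carries validated headers, not bare heights), so treat this as an assumption about the shape of the check rather than the actual API.

```rust
/// Simplified stand-in for the chain tip returned by the poller
/// (hypothetical: heights are used here instead of validated headers).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tip {
    /// The backend tip is the same as ours.
    Common,
    /// The backend has a better tip at the given height.
    Better(u32),
    /// The backend reports a worse tip at the given height.
    Worse(u32),
}

/// What the caller could conclude from a single poll (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollOutcome {
    /// Nothing new to process.
    UpToDate,
    /// A better tip was found and blocks were connected or disconnected.
    Progressed,
    /// A better tip exists but nothing was connected: the bad case.
    Stalled { backend_height: u32 },
    /// The backend tip is worse than ours, nothing to do.
    Ignored,
}

/// Classify the `(tip, bool)` pair produced by a poll.
fn classify_poll(tip: Tip, blocks_changed: bool) -> PollOutcome {
    match (tip, blocks_changed) {
        (Tip::Common, _) => PollOutcome::UpToDate,
        (Tip::Better(_), true) => PollOutcome::Progressed,
        (Tip::Better(height), false) => PollOutcome::Stalled { backend_height: height },
        (Tip::Worse(_), _) => PollOutcome::Ignored,
    }
}
```

With something like this, the polling loop could log or surface `Stalled` instead of silently printing another best-tip update.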
Retrying will probably do nothing since the blocks are already pruned. We can either report the issue to the user or automatically move the SPV client's tip forward to a non-pruned block and risk not connecting all the blocks in between.
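As a rough illustration of the second option, the sketch below computes which heights would be skipped if the tip were moved forward. It assumes the lowest stored block height is available (for example from the `pruneheight` field that `getblockchaininfo` reports on pruned nodes); `pruned_gap` and its parameters are hypothetical names, not part of the tower.

```rust
use std::ops::RangeInclusive;

/// Returns the range of block heights that can no longer be fetched,
/// given the last height we connected and the lowest height the backend
/// still stores (`None` if the backend is not pruning).
fn pruned_gap(last_connected: u32, lowest_stored: Option<u32>) -> Option<RangeInclusive<u32>> {
    // The next block we need in order to keep connecting sequentially.
    let next_needed = last_connected.checked_add(1)?;
    match lowest_stored {
        // Everything from `next_needed` up to just below `lowest` is gone.
        Some(lowest) if next_needed < lowest => Some(next_needed..=lowest - 1),
        // Either no pruning, or the next block is still available.
        _ => None,
    }
}
```

A `Some(range)` result is the set of blocks the tower would never see, which is the information the user would need in order to judge the risk of skipping ahead.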
We may be able to fix this by checking whether we are in IBD or not. By default, `bitcoind` reports being in IBD if the node is started and the chain is lagging behind by more than 24h (the backend tip has stalled for more than a day). This is checked only on bootstrap, and once it latches to `false` it will not change back to `true` while running, even if all peers disconnect from us and we don't get any data for longer than a day. This should not be an issue for us though.
We could either refuse to run if that is the case or wait until the backend catches up. The IBD status is reported by `getblockchaininfo`, which we already call when starting the tower to check which chain we're running on. We may need to update the wrapper to return both the chain and whether we are in IBD. Furthermore, we could add some special-case handling for `regtest` or similar, given this may not be as relevant there and may trigger more often than not.
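A minimal sketch of that startup check, assuming the wrapper is extended to return the `chain` and `initialblockdownload` fields of `getblockchaininfo`. `BlockchainInfo`, `StartupAction` and `startup_action` are hypothetical names, and whether to wait or abort is left open here, as it is above.

```rust
/// Subset of the `getblockchaininfo` response that the wrapper would
/// return (hypothetical struct; field names mirror the RPC fields).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BlockchainInfo {
    /// Network name as reported by the backend, e.g. "main" or "regtest".
    chain: String,
    /// Whether the backend considers itself in initial block download.
    initial_block_download: bool,
}

/// What the tower could do on startup (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StartupAction {
    /// The backend is synced (or IBD is not relevant): start normally.
    Run,
    /// The backend is still catching up: wait, or refuse to start.
    WaitForSync,
}

/// Decide how to proceed based on the backend's reported state.
fn startup_action(info: &BlockchainInfo) -> StartupAction {
    // Assumption: on regtest the IBD flag is often set and not meaningful,
    // so it is ignored there.
    if info.initial_block_download && info.chain != "regtest" {
        StartupAction::WaitForSync
    } else {
        StartupAction::Run
    }
}
```

Keeping the decision in one small function would make it easy to swap `WaitForSync` for an abort later, depending on which behaviour is preferred.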
Here's a PoC for this: https://github.com/sr-gi/rust-teos/tree/ibd-abort. @mariocynicys if you still have a copy of the chain that was triggering this error, would you mind testing it out (assuming you're ok with the approach)?
This is a hard-to-reproduce issue, but basically what happens is that bitcoind prunes old blocks that haven't yet been delivered to the tower. Thus the tower stops connecting blocks (watching).
Repro:
What will happen:
At this point `spv_client.poll_best_tip` will stop connecting blocks (blocks are connected sequentially, so if one is missing we can't connect later ones), which is indicated by the boolean returned. The tower will not get any blocks after this point, nor will it report errors.
Such an issue could be triggered by losing the internet connection for a long time. So it might be worth resolving it automatically rather than requiring manual intervention.