Improves reorg logic by checking whether we are on sync with the backend or not

sr-gi commented 1 year ago

The current reorg logic does not really take into account whether we are on sync with the backend or not. On a first block connected after a reorg, it will try to send all reorged out data assuming it has knowledge of everything that is already confirmed or in the mempool. However, in a multi block reorg the backend could be at a height that the tower has not processed yet, hence some transactions may be on the chain but not on our internal txindex. Therefore, it could be the case that we try to re-send something that has been reorged-out and see it bounce. Under normal conditions, that would mean the transaction was confirmed a long time ago, since otherwise it would be in our index. However, in this case it may be that it is just confirmed in a subsequent block we haven't processed yet. This will lead to wrongly assuming the tracker was IRREVOCABLY RESOLVED, while in reality it may only have a few confirmations.

This patch fixes that. In the case of a transaction bouncing we will check whether we are on sync with the backend, and only if so consider the tracker as IRREVOCABLY RESOLVED. Otherwise, the tracker will be flagged as OffSync and retried until it bounces when we are on sync, or its status is updated on block processing.

For context, this edge case was introduced when adding support for prune mode. Before that (when txindex for the backend was required) we would have used getrawtransaction to check the confirmation count of the bouncing transaction.

I realized about this when reviewing #190, wrongly believing that this edge case was being introduced by the changes there.

sr-gi commented 1 year ago

This would need further testing. Right now it has just patched the test suite so getblockcount can be used. And I've added minimal coverage.

All in all, better test coverage for reorgs is needed.

sr-gi commented 1 year ago

I guess it is worth mentioning what the flow for an Offsync tracker would be.

Offsync trackers are added to Responder::trackers via the handle_breach -> send_transaction path if the tower is off-sync with the backend when trying to send the penalty transaction, and it bounces.

A tracker can be updated to Offsync from ReorgedOut via the rebroadcast -> send_transaction path if the penalty bounces when rebroadcasting.

This happens only if the dispute is already on chain/mempool, or if the dispute is accepted but the penalty bounces. The former is somehow likely, while the latter is really unlikely. It would involve a block being connected to the backend, with the penalty in it, between trying to send the dispute and the penalty.

~Finally, if trying to send a dispute results in the status being reported as Offsync, we will leave the whole tracker as ReorgedOut. The reasoning for this is that if the dispute bounces and we are not in sync, we want to retry the whole bundle once we are in sync instead of drawing preliminary conclusions. This will happen shortly after, given we are either still connecting blocks in a reorg, or we connected the last but an additional one was connected to our backend. Therefore, this would be solved, at most, in the next polling interval.~ We can actually draw preliminary conclusions. If the dispute bounces it means it is already on chain, so we can broadcast the penalty.

talaia-labs / rust-teos

Improves reorg logic by checking whether we are on sync with the backend or not #235