tradecraftio / tradecraft

Tradecraft integration/staging tree https://tradecraft.io/download

Fix block-final tx related database corruption on v13 #74

Closed. maaku closed this pull request 4 years ago.

maaku commented 4 years ago

It has been observed in PR #70 that v13 sometimes hangs on startup and can only make progress again after restarting with -reindex=1 and clearing the ban list. The root cause turns out to be invalid cache data being flushed to disk at one or more points during startup. This PR includes a number of changes addressing the problem:

  1. The debug output is improved with changes that were helpful in isolating the error.

  2. If corruption is detected during operation, a clean node shutdown is triggered. Previously the node entered an infinite peer-banning loop, which is the behavior observed on v13.2-11780.

  3. During initialization, the block-final tx hash stored in the database is checked. If it is invalid (missing or corrupted), a reindex from the activation point of the block-final fork is triggered. This is modeled on the behavior of an upgraded segwit node, which likewise resynchronizes from the segwit activation point on its first startup after upgrading.

  4. The cache now explicitly tracks which of its fields are valid, so that invalid data is never written to the database when the cache is flushed.
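The idea behind (4) can be illustrated with a minimal sketch. It assumes a hypothetical cache with two fields and models the database as a simple key/value map. None of these names (`CachedBlockFinalInfo`, `Database`, `Flush`, the key strings) come from the Tradecraft code base; `std::optional` stands in for the per-field "valid" flag.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical cache entry: each field carries its own validity state via
// std::optional. An empty optional means "never loaded or computed".
struct CachedBlockFinalInfo {
    std::optional<std::string> final_tx_hash;  // hypothetical field
    std::optional<int> activation_height;      // hypothetical field
};

// Hypothetical stand-in for the on-disk database (key -> serialized value).
using Database = std::map<std::string, std::string>;

// Flush only the fields known to be valid. An unset field leaves whatever is
// already on disk untouched instead of overwriting it with placeholder data.
void Flush(const CachedBlockFinalInfo& cache, Database& db) {
    if (cache.final_tx_hash) {
        db["final_tx_hash"] = *cache.final_tx_hash;
    }
    if (cache.activation_height) {
        db["activation_height"] = std::to_string(*cache.activation_height);
    }
}
```

With this design, a cache that was only partially populated during startup can be flushed safely. Without per-field tracking, a default-constructed (empty or zeroed) hash would be written over the correct on-disk value, which is the kind of corruption this PR describes. A bitmask of validity flags alongside plain fields would be an equally reasonable alternative.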

(4) fixes the root cause. (1) through (3) ensure that if a similar issue arises again, the node is able to restore its internal state and resume operation. They also recover the state of any already-corrupted nodes in the wild once those nodes upgrade to the next release.
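The startup check from (3) reduces to a small decision, sketched below under stated assumptions. The function name, its parameters, and how the "expected" hash would be recomputed are all hypothetical. The sketch only shows the described logic: if the stored block-final tx hash is missing or does not match, resynchronize from the fork's activation height.

```cpp
#include <optional>
#include <string>

// Hypothetical outcome of the startup consistency check.
struct StartupAction {
    bool reindex;       // true if chainstate must be rebuilt
    int reindex_from;   // height to resynchronize from (meaningful if reindex)
};

// stored:          hash read from the database (nullopt if the record is missing)
// expected:        hash recomputed from the block index (hypothetical source)
// tip_height:      height of the current chain tip
// fork_activation: height at which the block-final tx rules activated
StartupAction CheckBlockFinalTx(const std::optional<std::string>& stored,
                                const std::string& expected,
                                int tip_height,
                                int fork_activation) {
    // Below activation there is no block-final tx state to validate.
    if (tip_height < fork_activation) {
        return {false, 0};
    }
    // Missing or corrupted: resynchronize from the activation point, as an
    // upgraded segwit node does on its first startup after upgrading.
    if (!stored || *stored != expected) {
        return {true, fork_activation};
    }
    return {false, 0};
}
```

Rewinding only to the activation height, rather than reindexing from genesis, keeps recovery cheap: blocks before the fork cannot have block-final tx state, so they do not need to be revisited.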

Testing also showed that this fixes #48: the lurking reorg-invalidation problem turns out to share the same root cause, the flushing of invalid cache data. We therefore re-enable the pruning RPC test, which had previously been clobbered by the block-final tx changes.