tradecraftio / tradecraft

Tradecraft integration/staging tree https://tradecraft.io/download

Reorg problem exposed by pruning RPC test suite #48

Closed (maaku closed 4 years ago)

maaku commented 4 years ago

The core issue hasn't been exactly diagnosed, but it is repeatable. At the time of writing, the "bitcoin" RPC compatibility mode disables activation of the block-final transaction rule. Removing this aspect of bitcoin mode and allowing the block-final rules to activate causes the 'pruning.py' RPC test suite to fail for reasons indicative of an underlying bug. It is not yet known why, but GetFinalTx() on the tip returns a 0 value instead of the expected transaction hash, making that node unable to mine blocks. Shutting it down and doing a -reindex solves the problem, but this is an internal state that should not be possible to create.

drwatson84 commented 4 years ago

Reorg attacks are a big problem and this should be your highest priority if you want to keep your chain safe. Limit chain reorgs to 10 or 20 blocks, that's it.

maaku commented 4 years ago

Fixed typo in title.

@drwatson84 If you are referring to just the low hash rate and therefore the reorg vulnerability, I 100% agree. However, limiting the depth of chain reorgs simply substitutes one problem for another: there would then be an easy way to not just reorg but partition the network, which is worse overall. The real solution here is to switch proof-of-work algorithms to something which is incentive-compatible with being a minority chain in a multi-algo hash marketplace. I'm working on doing that as a soft-fork via Forward Blocks, but it'll take a while to get deployed.
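To make the partition argument concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: the function names and the MAX_REORG_DEPTH constant are hypothetical, chains are plain lists of block labels, and "longest chain" stands in for "most work". None of it is Tradecraft code.

```python
# Toy model: a node follows the longest chain it knows about, but refuses any
# reorg that would disconnect more than MAX_REORG_DEPTH of its own blocks.
# All names are hypothetical; chain length is a stand-in for total work.

MAX_REORG_DEPTH = 10


def fork_point(a, b):
    """Return the number of leading blocks the two chains have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def consider_chain(current, candidate, max_depth=MAX_REORG_DEPTH):
    """Return the chain a node is on after being shown `candidate`."""
    if len(candidate) <= len(current):
        return current  # candidate is not longer, so nothing changes
    depth = len(current) - fork_point(current, candidate)
    if depth > max_depth:
        return current  # switching would exceed the reorg limit: rejected
    return candidate


common = [f"c{i}" for i in range(5)]
branch_a = common + [f"a{i}" for i in range(11)]  # 11 blocks past the fork
branch_b = common + [f"b{i}" for i in range(12)]  # 12 blocks past the fork

# Node 1 saw branch A first and only later hears about the longer branch B.
node_1 = consider_chain(branch_a, branch_b)
# Node 2 saw branch B first and later hears about the shorter branch A.
node_2 = consider_chain(branch_b, branch_a)
# Node 3 is like node 1, but effectively has no reorg limit.
node_3 = consider_chain(branch_a, branch_b, max_depth=10**9)

print("node 1 tip:", node_1[-1])
print("node 2 tip:", node_2[-1])
print("node 3 tip:", node_3[-1])
```

In this model node 1 stays on branch A because moving to B would disconnect 11 of its blocks, node 2 stays on B, and the two never converge no matter how much longer B gets. An attacker only has to show different nodes two competing branches deeper than the limit to split them permanently, whereas the node without a limit simply reorgs onto the longer branch.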

This particular issue is about something narrower. The "block-final" soft-fork, which is deployed and will activate in a few weeks, and which is a necessary precursor to Forward Blocks, has an obscure corner case in which a specific type of long reorg past the point of activation corrupts internal state. So it's not any long reorg, but only a subset of those which go back to activation of this soft-fork. Furthermore, this really only seems to happen in the reorg.py RPC test suite. I tested a manual reorg in regtest mode and everything worked as expected, so it really is something specific to this test suite.

Now this is bad and ought to get fixed. It does bother me that the exact cause has not been identified yet. But it also bothers me that we're running on an older version of the upstream daemon software, and therefore exposed to multiple potential vulnerabilities. There's a lot to do and few hands to do it :(

In this particular case, the most pragmatic thing to do seems to be to pay special attention during the activation period, restart with -reindex any nodes that seem to have trouble due to long reorgs (if there are any), and then let the chain build a lot of history on top.

In the meantime I'll keep trying to track down the source of this bug.