handshake-org / hsd

Handshake Daemon & Full Node

WalletDB rescan deadlock fix #868

Closed nodech closed 1 year ago

nodech commented 1 year ago

Calling wdb.rescan over HTTP can sometimes hang the whole node when the wallet runs as a plugin to the node. This happens because wdb.rescan acquires chain.locker and wdb.txLock in the opposite order from the chain. It can occur when the node is adding, removing, or reorganizing blocks while a rescan is requested. Here is how it can happen, with lock descriptions:

Here we can see that there are partial sequences that lead to deadlock:

  chain.add -> chain.locker.lock -> wdb.addBlock    -> wdb.txLock.lock
               wdb.rescan        -> wdb.txLock.lock -> chain.scan     -> chain.locker.lock
  OR
  wdb.rescan -> wdb.txLock.lock -> chain.scan        -> chain.locker.lock
                chain.add       -> chain.locker.lock -> wdb.addBlock     -> wdb.txLock.lock
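The two interleavings above are a classic lock-order inversion. The following is a minimal sketch of it, not hsd code and not the actual change in this PR: `Mutex`, `task`, `finishes` and `demo` are hypothetical names, and the two mutexes only stand in for chain.locker and wdb.txLock. It assumes Node.js (for `setImmediate`).

```javascript
'use strict';

// Minimal promise-based mutex; a hypothetical stand-in for the real locks.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Resolves with an unlock function once all earlier holders have released.
  lock() {
    let unlock;
    const released = new Promise((resolve) => { unlock = resolve; });
    const acquired = this.tail.then(() => unlock);
    this.tail = this.tail.then(() => released);
    return acquired;
  }
}

const tick = () => new Promise((resolve) => setImmediate(resolve));

// Takes `first`, yields once so the other task can run, then takes `second`.
async function task(first, second) {
  const unlockFirst = await first.lock();
  try {
    await tick();
    const unlockSecond = await second.lock();
    unlockSecond();
  } finally {
    unlockFirst();
  }
}

// Reports whether both tasks finished within `ms` milliseconds.
async function finishes(a, b, ms = 100) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  const done = Promise.all([a, b]).then(() => true);
  const result = await Promise.race([done, timeout]);
  clearTimeout(timer);
  return result;
}

async function demo() {
  // Inverted order: one task takes locker then txLock, the other the reverse.
  let locker = new Mutex();
  let txLock = new Mutex();
  const inverted = await finishes(task(locker, txLock), task(txLock, locker));

  // Consistent order: both tasks take locker first, then txLock.
  locker = new Mutex();
  txLock = new Mutex();
  const consistent = await finishes(task(locker, txLock), task(locker, txLock));

  return { inverted, consistent };
}
```

Running `demo().then(console.log)` reports `inverted` as `false` (the pair never completes) and `consistent` as `true`. Once every caller acquires the locks in the same order, the cycle cannot form.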

This issue is a side effect of the OOM fix for plugins in https://github.com/bcoin-org/bcoin/pull/932. Previously, the chain would not wait for the wallet to finish addBlock; it could move forward and unlock chain.locker as soon as the chain was done processing that block. Now it also waits for the wallet to finish. The old behavior caused a problem when the chain moved forward much faster than the wallet: the wallet would accumulate a backlog of addBlock calls with all the relevant block/tx information, eventually causing OOM.
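The backlog described here can be shown with a simplified model. This is a sketch under assumptions, not the hsd or bcoin implementation: `SlowWallet`, `connectBlocks` and `compare` are hypothetical names, and a single `setImmediate` tick stands in for slow wallet indexing.

```javascript
'use strict';

const tick = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical slow wallet: tracks how many addBlock calls are in flight.
class SlowWallet {
  constructor() {
    this.pending = 0;
    this.peak = 0;
  }

  async addBlock(block) {
    this.pending += 1;
    this.peak = Math.max(this.peak, this.pending);
    await tick(); // pretend indexing is slower than the chain
    this.pending -= 1;
  }
}

// Feeds `count` blocks to the wallet, optionally waiting for each one.
// Returns the peak number of addBlock calls that were pending at once.
async function connectBlocks(wallet, count, waitForWallet) {
  const inFlight = [];
  for (let height = 0; height < count; height++) {
    const job = wallet.addBlock({ height });
    if (waitForWallet)
      await job;
    else
      inFlight.push(job);
  }
  await Promise.all(inFlight); // drain so the process can exit cleanly
  return wallet.peak;
}

async function compare(count = 50) {
  const fireAndForget = await connectBlocks(new SlowWallet(), count, false);
  const awaited = await connectBlocks(new SlowWallet(), count, true);
  return { fireAndForget, awaited };
}
```

In this model the fire-and-forget peak grows with the number of blocks, while the awaited peak stays at one. That bounds memory, but it also means the chain's lock is held while the wallet takes its own, which is what makes the ordering matter.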

Note that this won't happen if the wallet is running separately as a service. A separate wallet service will experience a backlog instead, because the chain won't wait for HTTP socket events to finish processing. (Maybe it should, but that's a separate issue and a continuation of https://github.com/bcoin-org/bcoin/pull/932.)

Related

Changes

nodech commented 1 year ago

The diff is messed up because I moved the old wallet-rescan-test to wallet-namestate-rescan-test and reused wallet-rescan-test for this one. For a better diffing experience, go to the commits themselves.

coveralls commented 1 year ago

coverage: 68.553% (+0.004%) from 68.549% when pulling 29625eb31689b75914d7243664bacfe7a829b540 on nodech:wallet-deadlock-fix into bb7da60ef3ffdce0be6ccb55185cef89268be671 on handshake-org:master.