Async mempool scanning - Githubissues

conduition commented 9 months ago

Fixes https://github.com/romanz/electrs/issues/968. Mempool scanning is now done asynchronously in a separate thread called the mempool_sync thread.

Method

My approach was to split the Mempool::sync method into two parts: (1) fetching the updated mempool transactions [slow, read-only] and (2) applying those transactions [fast, write].

Mempool scanning is first triggered when the main thread sends a HashSet<Txid> to the mempool_sync thread. This set represents the current set of TXIDs in the mempool which we already know about.
The mempool_sync thread proceeds to scan the mempool from the remote node. While the scan is ongoing (could take a long time) the main thread can still process block updates, script notifications, and electrum client requests.
Once the mempool scan finishes, the mempool_sync thread blocks until it can send a MempoolSyncUpdate to the main thread.
When the main thread has a spare moment to receive this packaged update, it synchronously applies the update to the Mempool, via Rpc::mempool_apply. This adds new transactions, removes outdated ones, and updates the fee histogram.

Advantages

Scanning is asynchronous and read-only, while writing the update to the mempool's in-memory state is synchronous WRT the main-thread. This way, we don't introduce any mutexes because we don't need concurrent read/writes of Mempool. Electrum client requests are always processed with a consistent view of the mempool, even if a scan is ongoing and could finish at any time.

Histogram Perf

While i was in there, I improved the performance of applying mempool updates, by avoiding needless recomputation of the FeeHistogram from scratch every time. Instead we dynamically update fee histogram bins according to which transactions have been added or removed. I added unit tests to demonstrate the logic is sound. If preferable, I can separate the FeeHistogram changes and package it into another PR (although it will probably conflict with this one).

romanz commented 9 months ago

Many thanks! Could you please open a separate PR for the first two commits? (You can keep these commits also in this PR, to prevent merge conflicts)

conduition commented 9 months ago

Requested changes force pushed, amending the existing commits. @romanz you can view the diffs here. If you feel comfortable with that, i'll submit https://github.com/romanz/electrs/pull/970/commits/9b5f925934de1986c803881cb866f3770a872175 and https://github.com/romanz/electrs/pull/970/commits/ed42680992994488961bbf5aa3028a7a3d4b5dcc as a new PR.

romanz commented 9 months ago

If you feel comfortable with that, i'll submit 9b5f925 and ed42680 as a new PR.

Please do :)

conduition commented 9 months ago

PR open here, this PR is now dependent on that one

romanz commented 9 months ago

Please rebase over master (since #971 is merged).

conduition commented 9 months ago

Rebase complete :+1:

romanz commented 9 months ago

Seems that the CI failure is not related to https://github.com/romanz/electrs/pull/970/commits/06303d90d87100163f05b1bf2f0a72a7d280eb4f - I'll merge it via https://github.com/romanz/electrs/pull/973

romanz commented 9 months ago

Please rebase over latest master.

romanz commented 9 months ago

Could you please take a look at the failure? You can reproduce it locally by running:

$ docker build -f Dockerfile.ci . --rm -t electrs:tests
$ docker run -v $PWD/contrib/:/contrib -v $PWD/tests/:/tests --rm electrs:tests bash -x /tests/run.sh
...
+ TXID=20521100abc8df819901ccc5f964059ffc02303c4b421dccbf43322928221a2b
+ echo ' * get_tx_status'
 * get_tx_status
++ electrum --regtest --wallet=data/electrum/wallet get_tx_status 20521100abc8df819901ccc5f964059ffc02303c4b421dccbf43322928221a2b
++ jq -c .
Transaction not in wallet.
+ test '' == '{"confirmations":0}'
...

IIUC, it seems that the broadcasted transaction is not being reported back to Electrum client.

romanz commented 9 months ago

You can modify tests/run.sh to debug this issue, e.g.:

diff --git a/tests/run.sh b/tests/run.sh
index e19b4d2..b19d8cd 100755
--- a/tests/run.sh
+++ b/tests/run.sh
@@ -13,6 +13,7 @@ cleanup() {
    kill $j
    wait $j
   done
+  cat data/electrs/regtest-debug.log
 }
 trap cleanup SIGINT SIGTERM EXIT

@@ -68,6 +69,8 @@ NEW_ADDR=`$EL getunusedaddress`
 echo " * payto & broadcast"
 TXID=$($EL broadcast $($EL payto $NEW_ADDR 123 --fee 0.001 --password=''))

+sleep 20
+
 echo " * get_tx_status"
 test "`$EL get_tx_status $TXID | jq -c .`" == '{"confirmations":0}'

conduition commented 9 months ago

Rebased :white_check_mark:

I'm now looking into the mempool missing TX error. My guess is that async mempool scanning results in a delayed reaction where electrs only receives the newly broadcasted TX back from bitcoind after the current mempool scan iteration runs once or twice.

Prior to this PR, the synchronous nature of mempool scanning meant that after broadcasting a transaction, the next iteration of the main thread's event loop triggered a mempool scan, and further client requests (like the subsequent TX lookup) weren't processed until the mempool scan was complete, so the request always got the most up-to-date view of the mempool possible after broadcasting.

I see two options to address this:

After a TX broadcast, immediately execute a synchronous mempool poll/update cycle in the main thread (blocking other client requests)
After a TX broadcast, add the new transaction into the Mempool state manually (using Mempool::apply_sync_update).

I think option 2 is the more appropriate choice. I don't see any obvious associated negatives. The mempool sync thread should remove the entry later if the broadcasted TX we add manually is evicted for some reason.

Still waiting for the dockerfile to build for the moment. i'll return tomorrow with an update once I've verified my suspicions

conduition commented 9 months ago

@romanz sorry for the delay. My suspicions were correct, and I applied a fix which automatically appends the newly broadcasted TX to the mempool immediately upon broadcast.

romanz commented 9 months ago

@conduition Could you please review https://github.com/romanz/electrs/pull/979 which should also help when syncing large mempools?

conduition commented 9 months ago

@romanz I rebased this now that https://github.com/romanz/electrs/pull/979 has been merged. I think this PR still has merit, because even if mempool scans are now much faster, there's still no need to block client requests on them. For my use case, where i have bitcoind on a separate machine, this PR is a must-have

conduition commented 9 months ago

Putting this briefly into Draft state; i found I was getting spurious logs after a07b70d.

[2023-12-24T18:21:13.575Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:13.583Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:13.621Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:13.675Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:13.682Z DEBUG electrs::p2p] loading 1 blocks
[2023-12-24T18:21:15.677Z DEBUG electrs::status] 1 transactions from 1 blocks
[2023-12-24T18:21:15.677Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:15.684Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:15.759Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed
[2023-12-24T18:21:15.764Z DEBUG electrs::mempool] 82650 mempool txs: 0 added, 0 removed

will fix

conduition commented 9 months ago

OK should be fixed now

conduition commented 3 months ago

Hey @romanz, just wanted to check in and see if I should take the time to rebase this or?

romanz / electrs

Async mempool scanning #970

Method

Advantages

Histogram Perf