rust-bitcoin / bitcoind

Utility to launch a regtest bitcoind process in a rust test
MIT License
40 stars 33 forks source link

Address the race condition #101

Open Kixunil opened 1 year ago

Kixunil commented 1 year ago

If my understanding is correct this crate launches a separate instance per test, which is nice but because of the race condition and tests being run in parallel this can cause flaky tests which is quite bad. It'd be great to fix the race completely.

I took a look at the obvious "what happens if I set -port 0 -rpcport 0" The answer is `bitcoind seems to attempt to bind the default port anyway (WTF?) and doesn't log the actual port so if we do this we have no way of figuring it out.

I see only two solutions that include existing bitcoind instances:

  1. On platforms that support it we could LD_PRELOAD a library that overrides bind() to pass in port 0 and then reports the mapping over a pipe. We could read that report and learn the actual port number.
  2. Create private network namespace which has a custom IP and is somehow allowed to communicate with the parent. This should be possible on all platforms which support Docker because it presumably does this but it may require root which is not nice.

Of course the long term solution is to add direct support to bitcoind itself.

RCasatta commented 1 year ago

Do you have evidence of flaky tests?

Because ATM bitcoind is tried to be started three times, every time with different ports asked to the OS so it should be pretty hard to hit the race condition.

But yes if bitcoind get some way to communicate which random ports it is started with, like writing a file in datadir, it would be better.

Kixunil commented 1 year ago

It's just my guess that it could happen. I was looking at using this library but so far I went for extending a shell script which I already had. I'm still contemplating rewriting, though.

MaxFangX commented 1 year ago

Do you have evidence of flaky tests?

šŸ™‹ā€ā™€ļø I just tried to upgrade our bitcoind from version 22_0 to 23_0 and it introduced flakiness in our tests. We have two tests that spin up a BitcoinD instance, with the result that sometimes the tests will both pass, sometimes they will both fail, and sometimes only 1 of the 2 will fail.

We have not observed any flakiness when using 22_0, so it seems that there's something specific to 23_0 that causes these problems. I am not sure if it is related to the ports issue you guys are discussing; the error I get is SQLite trying to acquire a lock on the database in order to verify a wallet file:

failures:

---- runner::tests::integration::load_shutdown_runner stdout ----
thread 'runner::tests::integration::load_shutdown_runner' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14

---- runner::tests::integration::load_status_shutdown_node stdout ----
thread 'runner::tests::integration::load_status_shutdown_node' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14

failures:
    runner::tests::integration::load_shutdown_runner
    runner::tests::integration::load_status_shutdown_node

test result: FAILED. 30 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.45s

The tests use the default bitcoind::Conf, only pushing an -rpcauth=user:passwordhash to the Conf::args.

tnull commented 1 year ago

Can second the wallet flakiness, we're seeing the same issue pop up in CI (cf https://github.com/lightningdevkit/rust-lightning/pull/2098#issuecomment-1466981268):

---- test_esplora_syncs stdout ----
thread 'test_esplora_syncs' panicked at 'called `Result::unwrap()` on an `Err` value: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', tests/integration_tests.rs:35:65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Not sure if this the same issue as OP though, and not sure what exactly is the source of this.

Kixunil commented 1 year ago

I think your races are unrelated to port binding.

RCasatta commented 1 year ago

Not sure if this the same issue as OP though, and not sure what exactly is the source of this.

For the record, this was most likely due to wrapping BitcoinD inside OnceCell causing the drop to not be called: https://github.com/lightningdevkit/rust-lightning/pull/2132