Open Kixunil opened 1 year ago
Do you have evidence of flaky tests?
Because ATM bitcoind
is tried to be started three times, every time with different ports asked to the OS so it should be pretty hard to hit the race condition.
But yes if bitcoind
get some way to communicate which random ports it is started with, like writing a file in datadir, it would be better.
It's just my guess that it could happen. I was looking at using this library but so far I went for extending a shell script which I already had. I'm still contemplating rewriting, though.
Do you have evidence of flaky tests?
šāāļø I just tried to upgrade our bitcoind
from version 22_0
to 23_0
and it introduced flakiness in our tests. We have two tests that spin up a BitcoinD
instance, with the result that sometimes the tests will both pass, sometimes they will both fail, and sometimes only 1 of the 2 will fail.
We have not observed any flakiness when using 22_0
, so it seems that there's something specific to 23_0
that causes these problems. I am not sure if it is related to the ports issue you guys are discussing; the error I get is SQLite trying to acquire a lock on the database in order to verify a wallet file:
failures:
---- runner::tests::integration::load_shutdown_runner stdout ----
thread 'runner::tests::integration::load_shutdown_runner' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14
---- runner::tests::integration::load_status_shutdown_node stdout ----
thread 'runner::tests::integration::load_status_shutdown_node' panicked at 'Failed to init bitcoind: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', /Users/fang/lexe/dev/lexe/public/common/src/test_utils/regtest.rs:44:14
failures:
runner::tests::integration::load_shutdown_runner
runner::tests::integration::load_status_shutdown_node
test result: FAILED. 30 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 17.45s
The tests use the default bitcoind::Conf
, only pushing an -rpcauth=user:passwordhash
to the Conf::args
.
Can second the wallet flakiness, we're seeing the same issue pop up in CI (cf https://github.com/lightningdevkit/rust-lightning/pull/2098#issuecomment-1466981268):
---- test_esplora_syncs stdout ----
thread 'test_esplora_syncs' panicked at 'called `Result::unwrap()` on an `Err` value: JsonRpc(Rpc(RpcError { code: -4, message: "Wallet file verification failed. SQLiteDatabase: Unable to obtain an exclusive lock on the database, is it being used by another instance of Bitcoin Core?\n", data: None }))', tests/integration_tests.rs:35:65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Not sure if this the same issue as OP though, and not sure what exactly is the source of this.
I think your races are unrelated to port binding.
Not sure if this the same issue as OP though, and not sure what exactly is the source of this.
For the record, this was most likely due to wrapping BitcoinD
inside OnceCell
causing the drop to not be called: https://github.com/lightningdevkit/rust-lightning/pull/2132
If my understanding is correct this crate launches a separate instance per test, which is nice but because of the race condition and tests being run in parallel this can cause flaky tests which is quite bad. It'd be great to fix the race completely.
I took a look at the obvious "what happens if I set
-port 0 -rpcport 0
" The answer is `bitcoind seems to attempt to bind the default port anyway (WTF?) and doesn't log the actual port so if we do this we have no way of figuring it out.I see only two solutions that include existing bitcoind instances:
LD_PRELOAD
a library that overridesbind()
to pass in port 0 and then reports the mapping over a pipe. We could read that report and learn the actual port number.Of course the long term solution is to add direct support to
bitcoind
itself.