Closed SamJBarney closed 3 months ago
Is it always those exact numbers, the len is 0 but the index is 0? That backtrace looks like the raw table thinks it has an item, but the vector is empty.
Is there anything else "weird" going on during your tests, like panics? I'm guessing that we might get into a bad state if some user code like Hash or Equivalent panicked, and then you continued using the map elsewhere. I hope even then that it's not UB-bad though.
Is your test suite available for others to run to reproduce this problem?
Also, can your tests run under cargo miri? And if so, does that report any problems?
I'm thinking it may have to do with me using Lazy to wrap the object that contains the indexmap. I'm gonna rework my code to not use it and see if that makes a difference. If it doesn't I'll make my test suite available so you can take a look at what's going on.
Currently on a Windows box, so miri isn't available. I'll see if I can wrangle a Linux or Mac setup.
I've reworked my code to not use once_cell::sync::Lazy and this has fixed the issue. So something about Lazy is causing inconsistencies with IndexMap. Sorry for the false failure!
Ok! I would still be interested in seeing the reproducer, if possible, to see if I can track down the problem in Lazy -- especially in my @rust-lang/libs role since that has also been ported to std::sync::LazyLock.
Managed to get a branch back into the failure state: https://gitlab.com/open-craft/opencraft/-/tree/reproduction-branch?ref_type=heads
I believe your static mut registries are primarily to blame, because DerefMut for Lazy doesn't do any synchronization -- having &mut Lazy already implies exclusive access by all Rust rules. Even after lazy initialization, any mutating access to your registries may be racing between threads, unless you have some other synchronization that I missed. This is what makes static mut require unsafe for all access.
For example, in setup_registry (frame 26 in your backtrace), multiple tests may be running in parallel and call that at the same time. The first BLOCK_STATES.is_locked() is an immutable Deref, so Lazy will do the right synchronization there to initialize it. However, then multiple tests may observe !is_locked() and proceed to the mutating register calls at the same time, before any of them get to the final lock().
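To make that interleaving concrete, here is a rough sketch of the check-then-act pattern using only safe std types. It is not the code from the reproduction branch: the function name racy_setup_count is made up for illustration, the atomic flag and counter only stand in for is_locked(), register, and lock(), and the Barrier is there purely to force the unlucky schedule that parallel tests hit by chance.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Hypothetical model of the race: returns how many threads ended up
/// running the "register" step. A correct setup would run it once.
fn racy_setup_count(threads: usize) -> usize {
    let locked = Arc::new(AtomicBool::new(false)); // stands in for is_locked()
    let registrations = Arc::new(AtomicUsize::new(0)); // counts "register" calls
    let barrier = Arc::new(Barrier::new(threads));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let locked = Arc::clone(&locked);
            let registrations = Arc::clone(&registrations);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Step 1: the check. Every thread reads the flag first.
                let was_locked = locked.load(Ordering::SeqCst);
                // Force all checks to finish before any thread proceeds.
                barrier.wait();
                if !was_locked {
                    // Step 2: the mutation, now entered by every thread.
                    registrations.fetch_add(1, Ordering::SeqCst);
                    // Step 3: the final lock, too late to protect step 2.
                    locked.store(true, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    registrations.load(Ordering::SeqCst)
}

fn main() {
    println!("threads that registered: {}", racy_setup_count(4));
}
```

The atomics keep this model free of data races, so it only shows the logic error. With a static mut registry the same interleaving means unsynchronized mutation of the map itself, which is undefined behavior.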
std::sync::LazyLock does not offer DerefMut, which has some discussion in https://github.com/rust-lang/rust/issues/109736. But even if it did, the safety burden would still be on you for a static mut. Alternatively, using static Mutex or static RwLock would be safe.
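As a minimal sketch of the static Mutex suggestion, assuming Rust 1.80 or later for std::sync::LazyLock: the Registry type, its fields, and the registered_count helper below are hypothetical stand-ins rather than the project's real types, and a std HashMap takes the place of the IndexMap to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};
use std::thread;

// Hypothetical stand-in for the real registry type.
#[derive(Default)]
struct Registry {
    entries: HashMap<String, u32>,
    locked: bool,
}

impl Registry {
    /// Returns false if the registry is locked or the name already exists.
    fn register(&mut self, name: &str) -> bool {
        if self.locked || self.entries.contains_key(name) {
            return false;
        }
        let id = self.entries.len() as u32;
        self.entries.insert(name.to_string(), id);
        true
    }
}

// A plain `static` (not `static mut`): all mutation goes through the Mutex.
static BLOCK_STATES: LazyLock<Mutex<Registry>> =
    LazyLock::new(|| Mutex::new(Registry::default()));

fn setup_registry() {
    // The guard is held across the check, the registrations, and the final
    // lock, so no other thread can slip in between those steps.
    let mut registry = BLOCK_STATES.lock().unwrap();
    if !registry.locked {
        registry.register("air");
        registry.register("stone");
        registry.locked = true;
    }
}

fn registered_count() -> usize {
    BLOCK_STATES.lock().unwrap().entries.len()
}

fn main() {
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(setup_registry)).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("registered entries: {}", registered_count());
}
```

If reads vastly outnumber writes after setup, swapping Mutex for RwLock would follow the same shape, with read() for lookups and write() for registration.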
Ah, so the issue was that the tests were being run in parallel. Gotcha
Yes, but it is also a hazard for regular use of your API if it may be called from multiple threads.
Out of every 4 test runs I get the following error at least once: