Closed anp closed 1 week ago
Am I missing something? It seems like the project runs tests under miri which I would have expected to highlight this issue, but maybe miri's execution model makes it hard to detect these kinds of races?
I think you're missing this: https://github.com/crossbeam-rs/crossbeam/blob/abf24e1f31a76ef2590688d0c2bb55f82c7410a9/ci/miri.sh#L35-L38
# -Zmiri-preemption-rate=0 is needed because this code technically has UB and Miri catches that.
So sometimes they do really have a data race on a MaybeUninit<T>
, and then they detect that a race happened somehow and discard the value before declaring it initialized?
I think you're missing this: https://github.com/crossbeam-rs/crossbeam/blob/abf24e1f31a76ef2590688d0c2bb55f82c7410a9/ci/miri.sh#L35-L38
Ah I read the CI script too quickly, thanks!
So sometimes they do really have a data race on a
MaybeUninit<T>
, and then they detect that a race happened somehow and discard the value before declaring it initialized?
Yes, that's my understanding.
What crossbeam will do about remove UB here has already been roughly decided more than two years ago, and will use asm-based byte-wise atomic memcpy (basically https://github.com/taiki-e/atomic-memcpy/pull/7) in the middle-term, and the corresponding standard library feature (probably RFC 3301) in the long-term. See also https://github.com/rust-lang/unsafe-code-guidelines/pull/448 and https://github.com/crossbeam-rs/crossbeam/pull/1015.
Thanks for the context! This came up regarding questions about how to handle tsan failures with crossbeam-deque, sounds like there's a plan in place and this has been decided to be UB.
This is not yet on our deliberate UB list, but it seems to be equivalent to the SeqLock problem?
It seems that
crossbeam_deque
knowingly executes data races for work stealing and tries to make it safe by performing volatile reads/writes and wrapping the returned type inMaybeUninit<T>
. Other code is responsible for checking whether the operation raced and thus whether to assume that theMaybeUninit<T>
is a valid value.There are some referenced papers behind the implementation, but I'm not sure how this could be sound under Rust's rules for loads and stores.
Am I missing something? It seems like the project runs tests under miri which I would have expected to highlight this issue, but maybe miri's execution model makes it hard to detect these kinds of races?