rust-lang / miri

An interpreter for Rust's mid-level intermediate representation
Apache License 2.0
4.16k stars 318 forks source link

Miri accepts an open-coded compare-exchange loop but rejects the equivalent code using fetch_update #3613

Closed adamreichold closed 1 month ago

adamreichold commented 1 month ago

I have a little[^1] lock-free data structure I would like to use in PyO3 and I tested it using Miri (with the default settings) to increase my confidence that it actually works.

One thing I found during testing is that replacing an open-coded compare-exchange loop by the equivalent code using fetch_update in

https://github.com/adamreichold/lockfree-collector/commit/20443504d004f27f01d605ce69cd5b31fd9e9934

is rejected by Miri, c.f.

https://github.com/adamreichold/lockfree-collector/actions/runs/9141771033/job/25136576921

Is it possible the closure used by fetch_update confuses Miri or does it indeed add additional requirements that the unsafe code violates? Or did I make a mistake and the code is not in fact equivalent?

[^1]: famous last words

saethlin commented 1 month ago

Yes the closure adds additional requirements.

The problem here is field retagging. If you set MIRIFLAGS=-Zmiri-retag-fields=none you can see that the UB goes away. I am pretty sure I mentioned this sort of problem a while ago in a conversation about field retagging (I can't find that conversation now or I'd link it); in short people tend to add function calls to code pretty freely because they are a zero-cost abstraction, but if field retagging exists function calls can add surprise requirements to unsafe code.

I think the problem here is that the closure captures a &mut, then the closure is passed to the function, so with field retagging the fetch_update call creates a protector for the full duration of that fetch_update call. There is no such protector in the compare_exchange_weak version.

RalfJung commented 1 month ago

As an side, our handling of tags that refer to arguments seems to fall apart when the call is a closure.

RalfJung commented 1 month ago

So -- the hypothesis is that after the CAS succeeds but before fetch_update returns, last_next is still protected as it was an argument to fetch_update, and so if now another thread invalidates last_next then that is UB?

That sounds plausible. The easiest work-around then is to make last_next a raw pointer. Currently it is a reference and the lifetime of that reference is just a bit too long for this code to be sound.

adamreichold commented 1 month ago

Thank you both for looking into this!

That sounds plausible. The easiest work-around then is to make last_next a raw pointer. Currently it is a reference and the lifetime of that reference is just a bit too long for this code to be sound.

Indeed, using a raw pointer and a short lived reference only inside the closure like

https://github.com/adamreichold/lockfree-collector/commit/9fd7738b3d49f08024989a780335b17376c41e34

does resolve the issue

https://github.com/adamreichold/lockfree-collector/actions/runs/9148024927/job/25150080367

So if I understand you correctly, there is no bug here and maybe a bit of a sharp edge when using fetch_update and this issue should be closed?

I think for the collector, I'll stick to the open-coded loop as fetch_update is more general than necessary here in any case.

RalfJung commented 1 month ago

Thanks for trying that!

So if I understand you correctly, there is no bug here and maybe a bit of a sharp edge when using fetch_update and this issue should be closed?

Yeah, Miri is working as intended, so I will close the issue.

There's a t-opsem question here whether maybe closure fields should be wrapped in MaybeDangling to avoid this foot-gun, but that's a separate discussion.