thomcc / cobb

Apache License 2.0
7 stars 0 forks source link

Loom comparisons #1

Open carllerche opened 3 years ago

carllerche commented 3 years ago

Hi!

I stumbled upon your project and it seems cool. I saw a comparison w/ loom and h somead questions:

  1. Much more accurate model of hardware reordering, as we literally just let your hardware do the reordering.
  2. In loom atomic ops come with implicit SeqCst barriers, which means that a lot of bugs can't be found.

I don't understand this. It is true that loom is intrusive and cannot find bugs that are outside of instrumented code. However, loom itself is not based on SeqCst. Loom re-implements the C++ memory model and tracks data causality (happens-before), modification, verifying modification-order correctness (etc).. It does a systematic exploration of the state space as well, so can explore permutations that would be very rare in practice (unless running on clusters of hundreds of servers for weeks...). All of this is why the code is non-trivial :).

Additionally, loom can verify more than tsan, which is another reason I wrote it. For example, tsan is unable to verify fences, which results in many false positives when using Arc.

Anyway, this isn't to say that a concurrency fuzz testing tool such as cobb isn't valuable. I would appreciate clarifying the loom comparison.

thomcc commented 3 years ago

Hmm, yes, those are fair concerns. I'll update when I get a chance.

thomcc commented 3 years ago

However, loom itself is not based on SeqCst

The issue around SeqCst, is that memory that isn't inside either a loom atomic or a loom UnsafeCell seemed to have changes propagated as if the atomic operations were SeqCst, due to some uses of SeqCst inside loom itself (I can't dig into loom code right now to find where).

For example this matters in cases where the data is immutable (and thus not relevant for UnsafeCell), but eventually gets freed (which should be considered a mutation). It also matters for stuff that follows Rust's &mut and & aliasing rules, and thus wouldn't require an UnsafeCell, but wouldn't be tracked by loom.

I'm happy to remove this either way though, I didn't spend a particularly long time thinking about the text I wrote here.

carllerche commented 3 years ago

In general, you are correct, anything not instrumented w/ loom will not be checked. That is a fair assessment.

I'm not following your example though. If data is immutable but should eventually get freed and unsafe isn't needed, where would the concurrency bug show up? Could you provide a code example?

I don't mind a comparison w/ loom :) a tool that helps facilitate testing w/ *san is useful and explaining the difference between that & loom can help.

thomcc commented 3 years ago

I don't mind a comparison w/ loom :) a tool that helps facilitate testing w/ *san is useful and explaining the difference between that & loom can help.

Yeah, I'll try to rewrite it to be better. I'll flag you on the PR to get your feedback. I probably can't get to this until the weekend though.

I'm not following your example though. If data is immutable but should eventually get freed and unsafe isn't needed, where would the concurrency bug show up? Could you provide a code example?

Unsafe is needed, to manage the memory in that case. I'll try to give a more simplified example in a bit (if you need — that said it will, again, probably won't be until the weekend), but you can imagine it happening with Arc here: https://github.com/rust-lang/rust/blob/d474075a8f28ae9a410e95d849d009006db4b176/library/alloc/src/sync.rs#L1556-L1568

Even without the Arc holding something with interior mutability, I believe you actually still need the Acquire fence there to prevent the writes from inside the allocator's dealloc implementation from becoming visible to other threads, as the release fence on the fetch_sub (clearly) isn't enough. While loom might catch this with the the Mutex<T> case (I haven't tried but am willing to believe it does), I believe it wouldn't in the dealloc case (and I have tried, and it doesn't, which makes sense).

That said, the main reason I wrote cobb is because it's a bit of a pain to rewrite the code to be compatible with loom (I even have a few pending PRs to loom to help a few cases here 😅). I had a similar issue with relacy in the past in C++ too.


¹ Perhaps a fix of this would be for loom to be more "allocator aware"... And plausibly that would be a good idea anyway — I'm surprised it's not, and unsure how you catch ABA issues in concurrent stacks and the like (something like https://github.com/thomcc/cobb/blob/main/examples/stack.rs), but I trust there's something I'm missing there.

Regardless, I think it would still exist for cases where Rust you have a &T or &mut T, but it's not quite as easy to come up with an example... but basically, if you have a bug elsewhere that incorrectly bounds writes, so that some code thinks it has &T or &mut T, when writes from other threads could still modify the data behind that reference (which would be UB, of course).

Loom will catch this if you put the data behind an UnsafeCell, of course, but you might not even need that for correctness (for example, the Arc case).

But yeah, I'm rambling now, but hopefully that makes sense.