tokio-rs / loom

Concurrency permutation testing tool for Rust.
MIT License
2.09k stars 110 forks source link

Loom panics inside of panic (unwraps None in Drop impl of Arc) #173

Closed Restioson closed 2 years ago

Restioson commented 4 years ago

While trying to use loom on a project of mine, I got a message like the following:

running 1 test
thread panicked while panicking. aborting.
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/restioson/IdeaProjects/loom_reproduce_sigill/target/release/deps/loom_reproduce_sigill-e63fb4bdfa431379` (signal: 4, SIGILL: illegal instruction)

Note on the SIGILL: rust-lang/rust#52633

This behaviour is exhibited across release, debug, nightly, and stable. I am running Linux, specifically Pop!_OS 20.04 LTS x86_64, with a kernel version of 5.4.0-7634-generic.

I attempted to debug it using cargo-with and gdb (cargo with "rust-gdb" -- test). I found that the most useful breakpoint to set was std::panicking::panic_count::increment, as the backtraces from here were often helpful.

Attempted reproduction: This revealed that the first panic was line 14 of the example, in loom_reproduce_sigill::DoSomething::do_something, when a try_lock result is unwrapped. The second panic was in loom::sync::RwLock::write, at line 83 of rwlock.rs. This comes from the following expect: self.data.try_write().expect("loom::RwLock state corrupt"). This is called from the drop impl of Droppable. I found a note in https://github.com/tokio-rs/loom/issues/115 that sometimes panics in drops can cause this to occur. I think this may be the same issue, but the drop-panic comes from loom itself. The backtraces of the panics are in a gist here and MVP and reproduction is available here: https://github.com/Restioson/loom_reproduce_sigill/blob/master/src/lib.rs.

In the project where this was discovered, the first panic was the assertion that fails when a deadlock is found, and the second was where loom called set_active with a None value from the schedule. This is obviously very different so I'm probably barking up the wrong tree with the reproduction honestly. The original panics are here which may be of more use. Additionally, the code for it is here. TL;DR I think the reproduction effort failed since the panic is different :(. I can try later to minimise the original while keeping the same panics if that'd be helpful.

It seems like the second panic in the original project is also from a Drop impl (loom's Arc), although this time it is quite a while away from it, inside of loom::rt::thread::Set::active_mut, which is passed None, which it then unwraps. This might be fixable or not, I have no idea. What's interesting here, though, is that if you uncomment the commented lines in tests/loom.rs, it doesn't SIGILL anymore and panics once only. I thought this might be due to the early drop of rx2, but if you remove rx2 it still SIGILLs. You can also remove the drop of tx and as far as I can tell, the second panic remains the same.

taiki-e commented 4 years ago

I think SIGILL is unrelated to the loom. Double panic during testing causes SIGILL regardless of whether using loom. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=83549ecf4e233cf421c55a3d8c322190

thread panicked while panicking. aborting.
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/playground/target/debug/deps/playground-a93cb10783372eac` (signal: 4, SIGILL: illegal instruction)
taiki-e commented 4 years ago

Maybe related: https://github.com/rust-lang/rust/issues/52633

Restioson commented 4 years ago

Thank you for taking a look! I think it being related to rust-lang/rust#52633 is highly likely - that is the exact panic message I got too. I will remove SIGILL from the title then. Otherwise, the double panic in the real-world example would still be an issue, right? I just realised that it depends on changes to dependencies which are not upstreamed yet, so I've changed the dependencies to use my forks, in case anyone wants to run it on their machines (I can also send more info if needed).