hawkw / sharded-slab

a lock-free concurrent slab (experimental)
MIT License

panic in Config RefCount::decr #55

Open rbtcollins opened 3 years ago

rbtcollins commented 3 years ago

This is from a unit test in rustup. I can push the branch up easily enough, but basically: create several `Arc<pool>`, add a Vec to one, downgrade, hand to a worker thread, panic in the main thread when the test fails.

backtrace

```
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'diskio::test::test_complete_file_threaded' panicked at 'attempt to subtract with overflow', C:\Users\robertc\.cargo\registry\src\github.com-1ecc6299db9ec823\sharded-slab-0.1.1\src\page\slot.rs:718:26
stack backtrace:
   0: 0x7ff6bd1c9c5e - std::backtrace_rs::backtrace::dbghelp::trace
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1: 0x7ff6bd1c9c5e - std::backtrace_rs::backtrace::trace_unsynchronized
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2: 0x7ff6bd1c9c5e - std::sys_common::backtrace::_print_fmt
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\sys_common\backtrace.rs:67
   3: 0x7ff6bd1c9c5e - std::sys_common::backtrace::_print::{{impl}}::fmt
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\sys_common\backtrace.rs:46
   4: 0x7ff6bd1e790b - core::fmt::write
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\core\src\fmt\mod.rs:1096
   5: 0x7ff6bd1c3f88 - std::io::Write::write_fmt
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\io\mod.rs:1568
   6: 0x7ff6bd1cd13d - std::sys_common::backtrace::_print
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\sys_common\backtrace.rs:49
   7: 0x7ff6bd1cd13d - std::sys_common::backtrace::print
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\sys_common\backtrace.rs:36
   8: 0x7ff6bd1cd13d - std::panicking::default_hook::{{closure}}
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\panicking.rs:208
   9: 0x7ff6bd1ccc09 - std::panicking::default_hook
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\panicking.rs:225
  10: 0x7ff6bcc35b9d - alloc::boxed::{{impl}}::call,Fn>,alloc::alloc::Global>
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\boxed.rs:1535
  11: 0x7ff6bcb3f549 - rustup::currentprocess::with::{{closure}}::{{closure}}, anyhow::Error>>
          at C:\Users\robertc\Documents\src\rustup.rs\src\currentprocess.rs:145
  12: 0x7ff6bd1cda32 - std::panicking::rust_panic_with_hook
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\panicking.rs:595
  13: 0x7ff6bd1cd4f3 - std::panicking::begin_panic_handler::{{closure}}
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\panicking.rs:495
  14: 0x7ff6bd1ca5bf - std::sys_common::backtrace::__rust_end_short_backtrace
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\sys_common\backtrace.rs:141
  15: 0x7ff6bd1cd479 - std::panicking::begin_panic_handler
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\std\src\panicking.rs:493
  16: 0x7ff6bd1e5ed0 - core::panicking::panic_fmt
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\core\src\panicking.rs:92
  17: 0x7ff6bd1e5e1c - core::panicking::panic
          at /rustc/6a1835ad74247c069b0d24703c8267818487d7f5\/library\core\src\panicking.rs:50
  18: 0x7ff6bcb396ed - sharded_slab::page::slot::RefCount::decr
          at C:\Users\robertc\.cargo\registry\src\github.com-1ecc6299db9ec823\sharded-slab-0.1.1\src\page\slot.rs:718
  19: 0x7ff6bcb3aba0 - sharded_slab::page::slot::Slot, sharded_slab::cfg::DefaultConfig>::release,sharded_slab::cfg::DefaultConfig>
          at C:\Users\robertc\.cargo\registry\src\github.com-1ecc6299db9ec823\sharded-slab-0.1.1\src\page\slot.rs:506
  20: 0x7ff6bcb3ad4d - sharded_slab::page::slot::Guard, sharded_slab::cfg::DefaultConfig>::release,sharded_slab::cfg::DefaultConfig>
          at C:\Users\robertc\.cargo\registry\src\github.com-1ecc6299db9ec823\sharded-slab-0.1.1\src\page\slot.rs:604
  21: 0x7ff6bcbfde5c - sharded_slab::pool::{{impl}}::drop,sharded_slab::cfg::DefaultConfig>
          at C:\Users\robertc\.cargo\registry\src\github.com-1ecc6299db9ec823\sharded-slab-0.1.1\src\pool.rs:1131
  22: 0x7ff6bcbf7fcf - core::ptr::drop_in_place, sharded_slab::cfg::DefaultConfig>>
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ptr\mod.rs:179
  23: 0x7ff6bcbf654c - core::ptr::drop_in_place
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ptr\mod.rs:179
  24: 0x7ff6bcbf46c3 - core::ptr::drop_in_place
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ptr\mod.rs:179
  25: 0x7ff6bcbf4302 - core::ptr::drop_in_place
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ptr\mod.rs:179
  26: 0x7ff6bcbf425a - core::ptr::drop_in_place
          at C:\Users\robertc\.rustup\toolchains\beta-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\ptr\mod.rs:179
  27: 0x7ff6bcb50a18 - rustup::diskio::test::test_complete_file::{{closure}}
          at C:\Users\robertc\Documents\src\rustup.rs\src\diskio\test.rs:137
```

rbtcollins commented 3 years ago

Note that I've only seen this once, and only when the test had failed :)

hawkw commented 3 years ago

Hmm, interesting. The panic is occurring here: https://github.com/hawkw/sharded-slab/blob/cf2537f3a5bd6ea21e9e48ecc22f05522ca7821e/src/page/slot.rs#L718 where we are trying to decrement the number of outstanding references to a slot, but the number of references is zero.

I wonder if this is a race in the ref-counting logic, or if there's some kind of bug where a slot has already been cleared while a reference to it still exists.
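To make the failure mode concrete, here is a minimal sketch of how decrementing an unsigned count that is already zero produces this panic message in a build with overflow checks enabled. `RefCount`, `decr_unchecked`, and `decr_checked` below are hypothetical stand-ins for illustration only; the crate's real type and its `decr` may well look different.

```rust
/// Hypothetical, simplified reference count; NOT sharded-slab's real `RefCount`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RefCount {
    value: usize,
}

impl RefCount {
    /// A plain `value - 1`: with overflow checks on (the default for debug
    /// and test builds) this panics with "attempt to subtract with overflow"
    /// when `value` is 0.
    fn decr_unchecked(self) -> RefCount {
        RefCount { value: self.value - 1 }
    }

    /// Returns `None` instead of panicking when there is nothing to release.
    fn decr_checked(self) -> Option<RefCount> {
        self.value.checked_sub(1).map(|value| RefCount { value })
    }
}

fn main() {
    // The normal case: two outstanding references, one is released.
    let two = RefCount { value: 2 };
    assert_eq!(two.decr_unchecked(), RefCount { value: 1 });

    // The suspicious case: a release on a count that is already 0.
    let zero = RefCount { value: 0 };
    assert_eq!(zero.decr_checked(), None);
}
```

Either hypothesis above (a race, or a slot cleared while a guard still exists) would end up in the second case: a release arriving after the count has already reached zero.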

Since the tests are from rustup, I'm assuming they're open-source...can I take a look at the test that triggers this?

rbtcollins commented 3 years ago

This is close to the state I had the code in when I triggered it - I had the test failing though, I'm going to see if I can recreate it for you. https://github.com/rbtcollins/rustup.rs/pull/new/sharded-slab-55

rbtcollins commented 3 years ago

I can't seem to reproduce. We create several pools without customising the config, so we will have multiple references to the config object; and then this was unwinding a panic: I don't recall if the failing test was failing in a worker thread and propagating, or in the main thread.

hawkw commented 3 years ago

> We create several pools without customising the config, so we will have multiple references to the config object;

References to the config object shouldn't be an issue...this code runs when dropping a reference to an item in the pool...
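As a rough illustration of that point, the sketch below models a guard that releases one reference to a slot when it is dropped, using only the standard library. `Slot`, `Guard`, `new_slot`, and `outstanding` are hypothetical names, and this is not how sharded-slab is implemented; it only shows that the decrement is driven by dropping a guard for an item, independent of how many pools or config values exist.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical slot that tracks only its outstanding references.
struct Slot {
    refs: AtomicUsize,
}

/// Hypothetical guard representing one outstanding reference to a slot.
struct Guard {
    slot: Arc<Slot>,
}

impl Guard {
    /// Acquire a reference: bump the slot's count by one.
    fn new(slot: &Arc<Slot>) -> Guard {
        slot.refs.fetch_add(1, Ordering::AcqRel);
        Guard { slot: Arc::clone(slot) }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Release exactly one reference, refusing to go below zero.
        let prev = self
            .slot
            .refs
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
        debug_assert!(prev.is_ok(), "released a slot with no outstanding references");
    }
}

fn new_slot() -> Arc<Slot> {
    Arc::new(Slot { refs: AtomicUsize::new(0) })
}

fn outstanding(slot: &Arc<Slot>) -> usize {
    slot.refs.load(Ordering::Acquire)
}

fn main() {
    let slot = new_slot();
    let a = Guard::new(&slot);
    let b = Guard::new(&slot);
    assert_eq!(outstanding(&slot), 2);
    drop(a); // each drop releases one reference
    drop(b);
    assert_eq!(outstanding(&slot), 0);
}
```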