risinglightdb / risinglight

An educational OLAP database system.
Apache License 2.0
1.61k stars 214 forks source link

error: "make test" panic #566

Closed BaymaxHWY closed 2 years ago

BaymaxHWY commented 2 years ago
image

this is in macos, but in linux no this panic.

this backtrace in lldb

error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
* thread #2, name = 'run_all_test_files', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000100d03201 sqllogictest-c1220b23d556cde8`std::panicking::rust_panic_with_hook::hb5cf2b942ebf7bb4 at panicking.rs:700:9 [opt]
    frame #1: 0x0000000100d02e9e sqllogictest-c1220b23d556cde8`std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h5d41b409a1cb79f6 at panicking.rs:581:13 [opt]
    frame #2: 0x0000000100d00727 sqllogictest-c1220b23d556cde8`std::sys_common::backtrace::__rust_end_short_backtrace::h6b84dfe55d649877 at backtrace.rs:139:18 [opt]
    frame #3: 0x0000000100d02b8a sqllogictest-c1220b23d556cde8`rust_begin_unwind at panicking.rs:577:5 [opt]
    frame #4: 0x0000000100d451df sqllogictest-c1220b23d556cde8`core::panicking::panic_fmt::h6e35c6dca7bd2e85 at panicking.rs:110:14 [opt]
    frame #5: 0x0000000100d452c5 sqllogictest-c1220b23d556cde8`core::result::unwrap_failed::hdde797a6a1745330 at result.rs:1737:5 [opt]
    frame #6: 0x0000000100647b78 sqllogictest-c1220b23d556cde8`core::result::Result$LT$T$C$E$GT$::unwrap::h1303342e706e9d0e(self=Err) at result.rs:1065:23
    frame #7: 0x00000001000c90ea sqllogictest-c1220b23d556cde8`risinglight::storage::secondary::SecondaryStorage::shutdown::_$u7b$$u7b$closure$u7d$$u7d$::hbeb355a5de0e2197((null)=ResumeTy @ 0x00007000020e4128) at mod.rs:147:9
    frame #8: 0x00000001000d0ec7 sqllogictest-c1220b23d556cde8`_$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h88b823443d35f800(self=Pin<&mut core::future::from_generator::GenFuture<risinglight::storage::secondary::{impl#0}::shutdown::{async_fn#0}>> @ 0x00007000020e44b8, cx=0x00007000020e4eb8) at mod.rs:84:19
    frame #9: 0x000000010008644d sqllogictest-c1220b23d556cde8`risinglight::db::Database::shutdown::_$u7b$$u7b$closure$u7d$$u7d$::h2d2008e83d5db015((null)=ResumeTy @ 0x00007000020e4940) at db.rs:53:31
    frame #10: 0x00000001000d1207 sqllogictest-c1220b23d556cde8`_$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::ha5ba8cbe3cb443b0(self=Pin<&mut core::future::from_generator::GenFuture<risinglight::db::{impl#0}::shutdown::{async_fn#0}>> @ 0x00007000020e4b38, cx=0x00007000020e4eb8) at mod.rs:84:19
    frame #11: 0x00000001000d4521 sqllogictest-c1220b23d556cde8`tokio::park::thread::CachedParkThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::h9c6e796c5b4f4614 at thread.rs:263:54
    frame #12: 0x000000010009e6a1 sqllogictest-c1220b23d556cde8`tokio::coop::with_budget::_$u7b$$u7b$closure$u7d$$u7d$::h0fe8fd8f5b9be95b(cell=0x0000000104808800) at coop.rs:102:9
    frame #13: 0x00000001000d574a sqllogictest-c1220b23d556cde8`std::thread::local::LocalKey$LT$T$GT$::try_with::hb9f0c8a079da8334(self=0x0000000100e6b7a0, f={closure#0} @ 0x00007000020e4df0) at local.rs:413:16
    frame #14: 0x00000001000d4c14 sqllogictest-c1220b23d556cde8`std::thread::local::LocalKey$LT$T$GT$::with::hedfe7991d0ee6530(self=0x0000000100e6b7a0, f=<unavailable>) at local.rs:389:9
    frame #15: 0x00000001000d3e43 sqllogictest-c1220b23d556cde8`tokio::park::thread::CachedParkThread::block_on::hb4298bb691d4ef9f [inlined] tokio::coop::with_budget::h3bfcea150bbc5a03(budget=(__0 = core::option::Option<> @ 0x00007000020e5158), f={closure#0} @ 0x00007000020e5160) at coop.rs:95:5
    frame #16: 0x00000001000d3e1d sqllogictest-c1220b23d556cde8`tokio::park::thread::CachedParkThread::block_on::hb4298bb691d4ef9f [inlined] tokio::coop::budget::h484828e80d54d2ec(f={closure#0} @ 0x00007000020e5120) at coop.rs:72:5
    frame #17: 0x00000001000d3dd0 sqllogictest-c1220b23d556cde8`tokio::park::thread::CachedParkThread::block_on::hb4298bb691d4ef9f(self=0x00007000020e51a8, f=(__0 = risinglight::db::{impl#0}::shutdown::{async_fn#0} @ 0x00007000020e51b0)) at thread.rs:263:31
    frame #18: 0x00000001000b0b82 sqllogictest-c1220b23d556cde8`tokio::runtime::enter::Enter::block_on::h8c0fd55ffdc235ba(self=0x00007000020e52a0, f=(__0 = risinglight::db::{impl#0}::shutdown::{async_fn#0} @ 0x00007000020e5320)) at enter.rs:151:13
    frame #19: 0x00000001000b9237 sqllogictest-c1220b23d556cde8`tokio::runtime::thread_pool::ThreadPool::block_on::h58dd729f03c21afc(self=0x00007000020e6168, future=(__0 = risinglight::db::{impl#0}::shutdown::{async_fn#0} @ 0x00007000020e54f0)) at mod.rs:73:9
    frame #20: 0x0000000100013bdf sqllogictest-c1220b23d556cde8`tokio::runtime::Runtime::block_on::hb8958b35853d2fb8(self=0x00007000020e6160, future=(__0 = risinglight::db::{impl#0}::shutdown::{async_fn#0} @ 0x00007000020e5650)) at mod.rs:477:43
    frame #21: 0x000000010002cf90 sqllogictest-c1220b23d556cde8`_$LT$sqllogictest..DatabaseWrapper$u20$as$u20$core..ops..drop..Drop$GT$::drop::h187de75571327771(self=0x00007000020e6160) at sqllogictest.rs:87:9
    frame #22: 0x000000010002ba05 sqllogictest-c1220b23d556cde8`core::ptr::drop_in_place$LT$sqllogictest..DatabaseWrapper$GT$::h96d0390da3865fb4((null)=0x00007000020e6160) at mod.rs:188:1
    frame #23: 0x000000010002c4c5 sqllogictest-c1220b23d556cde8`core::ptr::drop_in_place$LT$sqllogictest..runner..Runner$LT$sqllogictest..DatabaseWrapper$GT$$GT$::h278a9721a0fb598e((null)=0x00007000020e6160) at mod.rs:188:1
    frame #24: 0x0000000100057736 sqllogictest-c1220b23d556cde8`sqllogictest::test_disk::h2e49a58218138541(name=(data_ptr = "aggregation.slt", length = 15)) at sqllogictest.rs:60:1
    frame #25: 0x0000000100058024 sqllogictest-c1220b23d556cde8`sqllogictest::run_all_test_files::hd71520ec42fbbe75 at sqllogictest.rs:28:21
    frame #26: 0x0000000100057031 sqllogictest-c1220b23d556cde8`sqllogictest::run_all_test_files::_$u7b$$u7b$closure$u7d$$u7d$::h2249cbc037a4cce7((null)=0x00007000020e6980) at sqllogictest.rs:12:1
    frame #27: 0x0000000100023bf1 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once::h1451b5fde5d634c1((null)={closure#0} @ 0x00007000020e6980, (null)=<unavailable>) at function.rs:227:5
    frame #28: 0x0000000100161236 sqllogictest-c1220b23d556cde8`test::__rust_begin_short_backtrace::hb4c6402b861fcbfc [inlined] core::ops::function::FnOnce::call_once::h5c1a4237aa7c6187 at function.rs:227:5 [opt]
    frame #29: 0x0000000100161234 sqllogictest-c1220b23d556cde8`test::__rust_begin_short_backtrace::hb4c6402b861fcbfc at lib.rs:574:5 [opt]
    frame #30: 0x00000001001600a2 sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h517023a14c7c0439 at boxed.rs:1854:9 [opt]
    frame #31: 0x000000010016009a sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::hb2ed8711b502df5f at unwind_safe.rs:271:9 [opt]
    frame #32: 0x000000010016009a sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] std::panicking::try::do_call::hea20152223efc300 at panicking.rs:485:40 [opt]
    frame #33: 0x000000010016009a sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] std::panicking::try::h76c50381bd628208 at panicking.rs:449:19 [opt]
    frame #34: 0x000000010016009a sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] std::panic::catch_unwind::hb53e31601f900e57 at panic.rs:136:14 [opt]
    frame #35: 0x000000010016009a sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 [inlined] test::run_test_in_process::h5165cfdcce5a67be at lib.rs:597:18 [opt]
    frame #36: 0x0000000100160083 sqllogictest-c1220b23d556cde8`test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h2e618a46a315d0f3 at lib.rs:491:39 [opt]
    frame #37: 0x000000010016934a sqllogictest-c1220b23d556cde8`std::sys_common::backtrace::__rust_begin_short_backtrace::h9992f7bce788a185 [inlined] test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::h613c32f956ac8a8c at lib.rs:518:37 [opt]
    frame #38: 0x0000000100169277 sqllogictest-c1220b23d556cde8`std::sys_common::backtrace::__rust_begin_short_backtrace::h9992f7bce788a185 at backtrace.rs:123:18 [opt]
    frame #39: 0x000000010013647c sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h91df1bd7214faf3a at mod.rs:477:17 [opt]
    frame #40: 0x0000000100136477 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::he52ef6a74a543637 at unwind_safe.rs:271:9 [opt]
    frame #41: 0x0000000100136477 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] std::panicking::try::do_call::hddc808a9bfed8b0d at panicking.rs:485:40 [opt]
    frame #42: 0x0000000100136477 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] std::panicking::try::hd10d15fb8ed0052f at panicking.rs:449:19 [opt]
    frame #43: 0x0000000100136477 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] std::panic::catch_unwind::h129b76650fd65d22 at panic.rs:136:14 [opt]
    frame #44: 0x0000000100136477 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b [inlined] std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::h1a604cf954f80307 at mod.rs:476:30 [opt]
    frame #45: 0x0000000100136426 sqllogictest-c1220b23d556cde8`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0989aa2a5960072b at function.rs:227:5 [opt]
    frame #46: 0x0000000100d0bf27 sqllogictest-c1220b23d556cde8`std::sys::unix::thread::Thread::new::thread_start::h683fadd6bcd2ca78 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::hee2214394afb1117 at boxed.rs:1854:9 [opt]
    frame #47: 0x0000000100d0bf21 sqllogictest-c1220b23d556cde8`std::sys::unix::thread::Thread::new::thread_start::h683fadd6bcd2ca78 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h07f084f7bbcc925e at boxed.rs:1854:9 [opt]
    frame #48: 0x0000000100d0bf1a sqllogictest-c1220b23d556cde8`std::sys::unix::thread::Thread::new::thread_start::h683fadd6bcd2ca78 at thread.rs:108:17 [opt]
    frame #49: 0x00007ff8170184f4 libsystem_pthread.dylib`_pthread_start + 125
    frame #50: 0x00007ff81701400f libsystem_pthread.dylib`thread_start + 15
skyzh commented 2 years ago

Thanks for your report! This seems dup w/ https://github.com/risinglightdb/risinglight/issues/563. I'm still investigating it...

BaymaxHWY commented 2 years ago

i try to debug this, seems to be because handler.0.take().unwrap().send(()) corresponding channel is early close? https://github.com/risinglightdb/risinglight/blob/f60f2ad8b4a8588fa748d1715c85a8740e0941c7/src/storage/secondary/mod.rs#L139-L150

skyzh commented 2 years ago

That might be one cause. What about changing all send(()).unwrap() to send(()).ok()? (For handler.1, I still want unwrap it).

BaymaxHWY commented 2 years ago

sorry, I made a mistake, this panic on this. https://github.com/risinglightdb/risinglight/blob/f60f2ad8b4a8588fa748d1715c85a8740e0941c7/src/storage/secondary/mod.rs#L145-L148 i try to change send(()).unwrap() to send(()).ok() in handler.0, but it also panic on handler.1

skyzh commented 2 years ago

Maybe this is not the root cause of panic.

The panic first happens at version_manager.rs:313:57, which caused the vacuum thread to stop working. After that, we can not send anything, and we can not join that future back.

I think it's better to investigate why https://github.com/risinglightdb/risinglight/issues/563 would happen.

BaymaxHWY commented 2 years ago

aggre, i just try to set a breakpointer at version_manager.rs:313:57 and continue trying to see what happened, but it is not panic happen(pass all sqllogictest test). Is there a data race(but there's a lock:). By the way, how do you debug such asynchronous concurrent programs?

skyzh commented 2 years ago

Ok, I found the problem.

When user call drop table, as https://github.com/risinglightdb/risinglight/pull/555 had done, we will delete the RowSets.

However, if there are still background compaction going on, compaction will also delete those RowSets.

Therefore, we should wait for all compactors to complete their jobs before we could delete RowSets when dropping tables...

skyzh commented 2 years ago

I'll propose a fix...

skyzh commented 2 years ago

By the way, how do you debug such asynchronous concurrent programs?

I printed logs on calls to commit_changes 🤪

skyzh commented 2 years ago

Digging into the issue further, I think we are missing a very important piece to make this really work -- our catalog is not MVCC, and is not managed by VersionManager. This means that we can never get a consistent snapshot along with RowSets in our system... When we are applying changes from the compactor, we cannot know whether a table does exist solely from the snapshot in VersionManager.

I'd like to workaround the problem by simply ignore duplicated deleted RowSets. Note that even if we applied the workaround, the new RowSets produced by the compactor will still be there and won't be deleted.

When we have made our catalog fully MVCC, I'd go back to implement the full drop table support.