Closed BaymaxHWY closed 2 years ago
Thanks for your report! This seems dup w/ https://github.com/risinglightdb/risinglight/issues/563. I'm still investigating it...
I tried to debug this. It seems to happen because the channel behind handler.0.take().unwrap().send(()) is closed early?
https://github.com/risinglightdb/risinglight/blob/f60f2ad8b4a8588fa748d1715c85a8740e0941c7/src/storage/secondary/mod.rs#L139-L150
That might be one cause. What about changing all send(()).unwrap() to send(()).ok()? (For handler.1, I still want to unwrap it.)
Sorry, I made a mistake. The panic actually happens here:
https://github.com/risinglightdb/risinglight/blob/f60f2ad8b4a8588fa748d1715c85a8740e0941c7/src/storage/secondary/mod.rs#L145-L148
I tried changing send(()).unwrap() to send(()).ok() for handler.0, but then it panics on handler.1 instead.
Maybe this is not the root cause of the panic.
The panic first happens at version_manager.rs:313:57, which causes the vacuum thread to stop working. After that, we cannot send anything to it, and we cannot join that future back.
I think it's better to investigate why https://github.com/risinglightdb/risinglight/issues/563 would happen.
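To illustrate why both handler.0 and handler.1 end up failing, here is a minimal sketch of the failure chain. It assumes std::thread and std::sync::mpsc as stand-ins for the async task and channel used in the real code, and the function name is hypothetical. Once the background worker panics, its receiver is dropped, so the stop signal cannot be sent, and joining the worker returns an error.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a worker that panics before it ever reads from the stop channel,
/// then reports what the shutdown path observes.
/// Returns (send_failed, join_failed).
/// Hypothetical helper: std types stand in for the async task and channel.
fn shutdown_after_worker_panic() -> (bool, bool) {
    let (stop_tx, stop_rx) = mpsc::channel::<()>();

    let worker = thread::spawn(move || {
        // The worker owns the receiver, so it is dropped when the panic unwinds.
        let _stop_rx = stop_rx;
        panic!("simulated panic in the background task");
    });

    // join() reports the worker's panic as Err rather than propagating it;
    // calling unwrap() on this result is what would panic a second time.
    let join_failed = worker.join().is_err();

    // The receiver died with the worker, so send() returns Err.
    // `.ok()` turns that into None, while `.unwrap()` would panic here.
    let send_failed = stop_tx.send(()).ok().is_none();

    (send_failed, join_failed)
}

fn main() {
    let (send_failed, join_failed) = shutdown_after_worker_panic();
    println!("send failed: {}, join failed: {}", send_failed, join_failed);
}
```

Replacing unwrap() with ok() on the send side only hides the first symptom. The join side still fails, which matches what was observed above.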
Agreed. I set a breakpoint at version_manager.rs:313:57 and kept continuing to see what happened, but the panic did not occur (all sqllogictest tests passed). Is there a data race (even though there's a lock)? By the way, how do you debug such asynchronous concurrent programs?
Ok, I found the problem.
When a user calls drop table, we delete the RowSets, as implemented in https://github.com/risinglightdb/risinglight/pull/555.
However, if background compaction is still in progress, the compactor will also delete those RowSets.
Therefore, we should wait for all compactors to complete their jobs before we could delete RowSets when dropping tables...
I'll propose a fix...
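As a rough sketch of the "wait for compactors first" idea, the example below joins an in-flight compaction before the table's RowSets are deleted. Everything here is hypothetical: the RowSets alias, spawn_compactor, and drop_table are illustrative names, and a HashSet of ids behind a mutex stands in for the real storage state.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a table's RowSets, keyed by id.
type RowSets = Arc<Mutex<HashSet<u32>>>;

/// Hypothetical compactor: merges RowSets 1 and 2 into RowSet 3 and
/// deletes its inputs. It asserts that nobody deleted them first.
fn spawn_compactor(rowsets: RowSets) -> JoinHandle<()> {
    thread::spawn(move || {
        let mut guard = rowsets.lock().unwrap();
        assert!(guard.remove(&1), "RowSet 1 was already deleted");
        assert!(guard.remove(&2), "RowSet 2 was already deleted");
        guard.insert(3);
    })
}

/// Drops the table only after the in-flight compaction has finished,
/// so every RowSet is deleted exactly once.
/// Returns the number of RowSets deleted by the drop itself.
fn drop_table(rowsets: &RowSets, compactor: JoinHandle<()>) -> usize {
    // Wait for the compactor before touching any RowSet.
    compactor.join().expect("compactor panicked");
    let mut guard = rowsets.lock().unwrap();
    let deleted = guard.len();
    guard.clear();
    deleted
}

fn main() {
    let rowsets: RowSets = Arc::new(Mutex::new(HashSet::from([1, 2])));
    let compactor = spawn_compactor(Arc::clone(&rowsets));
    let deleted = drop_table(&rowsets, compactor);
    println!("deleted {} RowSet(s) while dropping the table", deleted);
}
```

Because drop_table joins the compactor first, it only ever sees the compacted output (RowSet 3) and the compactor's own deletions never collide with it.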
By the way, how do you debug such asynchronous concurrent programs?
I printed logs on calls to commit_changes 🤪
Digging into the issue further, I think we are missing a very important piece needed to make this really work: our catalog is not MVCC and is not managed by VersionManager. This means that we can never get a consistent snapshot of the catalog along with the RowSets in our system... When we apply changes from the compactor, we cannot know whether a table still exists solely from the snapshot in VersionManager.
I'd like to work around the problem by simply ignoring duplicate RowSet deletions. Note that even with the workaround applied, the new RowSets produced by the compactor will still be there and won't be deleted.
Once we have made our catalog fully MVCC, I'll come back and implement full drop table support.
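The workaround described above can be sketched roughly as follows. This is a simplified model, not the actual VersionManager code: apply_compaction is a hypothetical function, and a plain HashSet of RowSet ids stands in for the real state. Deletions of RowSets that are already gone are skipped instead of being treated as a bug.

```rust
use std::collections::HashSet;

/// Applies a compaction result to the current set of RowSet ids.
/// Deletions of ids that are already gone are skipped rather than
/// treated as an error. Returns the number of skipped deletions.
/// Hypothetical function: a HashSet stands in for the real state.
fn apply_compaction(rowsets: &mut HashSet<u32>, deletes: &[u32], adds: &[u32]) -> usize {
    let mut skipped = 0;
    for id in deletes {
        // remove() returns false when the id was not present,
        // e.g. because drop table already deleted it.
        if !rowsets.remove(id) {
            skipped += 1;
        }
    }
    for id in adds {
        rowsets.insert(*id);
    }
    skipped
}

fn main() {
    let mut rowsets: HashSet<u32> = HashSet::from([1, 2]);

    // drop table deletes every RowSet of the table first...
    rowsets.clear();

    // ...then the compactor commits: delete inputs 1 and 2, add output 3.
    let skipped = apply_compaction(&mut rowsets, &[1, 2], &[3]);
    println!("skipped {} duplicate deletion(s)", skipped);
    println!("leftover RowSets: {:?}", rowsets);
}
```

The leftover RowSet 3 in this model corresponds to the limitation noted above: the compactor's output survives the drop because nothing tells the compactor that the table no longer exists.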
This happens on macOS; on Linux the panic does not occur.
This is the backtrace in lldb.