risinglightdb / risinglight

An educational OLAP database system.
Apache License 2.0
1.6k stars 214 forks source link

storage: vacuum stopped unexpectedly #400

Closed skyzh closed 1 year ago

skyzh commented 2 years ago

This will happen at the end of the second import operation.

cargo run --release -- -f tests/sql/tpch/create.sql
cargo run --release -- -f tests/sql/tpch/import.sql
cargo run --release -- -f tests/sql/tpch/import.sql

I suspect we should wait for vacuum and compactor to complete before stopping the database instance. (aka. graceful stop)

2022-02-02T06:04:19.324325Z  INFO risinglight::storage::secondary::version_manager: vacuum 3_6
thread 'tokio-runtime-worker' panicked at 'vacuum stopped unexpectly: TracedStorageError { source: Io(Custom { kind: Other, error: "background task failed" }), backtrace: Backtrace [{ fn: "std::backtrace_rs::backtrace::libunwind::trace", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/../../backtrace/src/backtrace/libunwind.rs", line: 93 }, { fn: "std::backtrace_rs::backtrace::trace_unsynchronized", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/../../backtrace/src/backtrace/mod.rs", line: 66 }, { fn: "std::backtrace::Backtrace::create", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/backtrace.rs", line: 328 }, { fn: "<risinglight::storage::error::TracedStorageError as core::convert::From<risinglight::storage::error::StorageError>>::from" }, { fn: "<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll" }, { fn: "tokio::runtime::task::harness::poll_future" }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll" }, { fn: "std::thread::local::LocalKey<T>::with" }, { fn: "tokio::runtime::thread_pool::worker::Context::run_task" }, { fn: "tokio::runtime::thread_pool::worker::Context::run" }, { fn: "tokio::macros::scoped_tls::ScopedKey<T>::set" }, { fn: "tokio::runtime::thread_pool::worker::run" }, { fn: "tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut" }, { fn: "tokio::runtime::task::harness::Harness<T,S>::poll" }, { fn: "tokio::runtime::blocking::pool::Inner::run" }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}" }, { fn: "<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/alloc/src/boxed.rs", line: 1854 }, { fn: "<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/alloc/src/boxed.rs", line: 1854 }, { fn: "std::sys::unix::thread::Thread::new::thread_start", file: "/rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/sys/unix/thread.rs", line: 108 }, { fn: "__pthread_deallocate" }] }', /Users/skyzh/Work/risinglight/src/storage/secondary/mod.rs:134:22
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:577:5
   1: core::panicking::panic_fmt
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/core/src/panicking.rs:110:14
   2: core::result::unwrap_failed
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/core/src/result.rs:1737:5
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: tokio::runtime::task::harness::poll_future
   5: tokio::runtime::task::harness::Harness<T,S>::poll
   6: std::thread::local::LocalKey<T>::with
   7: tokio::runtime::thread_pool::worker::Context::run_task
   8: tokio::runtime::thread_pool::worker::Context::run
   9: tokio::macros::scoped_tls::ScopedKey<T>::set
  10: tokio::runtime::thread_pool::worker::run
  11: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  12: tokio::runtime::task::harness::Harness<T,S>::poll
  13: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
skyzh commented 2 years ago

What's more, if we run import for the third time...

stack backtrace:
   0: rust_begin_unwind
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:577:5
   1: core::panicking::panic_fmt
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/core/src/panicking.rs:110:14
   2: core::result::unwrap_failed
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/core/src/result.rs:1737:5
   3: risinglight::executor::copy_from_file::CopyFromFileExecutor::read_file_blocking
   4: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
   5: tokio::runtime::task::harness::Harness<T,S>::poll
   6: tokio::runtime::blocking::pool::Inner::run
Error: execute error: storage error: IO error: Too many open files (os error 24)

Caused by:
    0: storage error: IO error: Too many open files (os error 24)
    1: IO error: Too many open files (os error 24)
    2: IO error: Too many open files (os error 24)
    3: Too many open files (os error 24)

Stack backtrace:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/backtrace.rs:328:13
   3: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   4: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   5: risinglight::main::{{closure}}
   6: tokio::park::thread::CachedParkThread::block_on
   7: tokio::runtime::thread_pool::ThreadPool::block_on
   8: risinglight::main
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: std::rt::lang_start::{{closure}}
  11: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/core/src/ops/function.rs:259:13
  12: std::panicking::try::do_call
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:485:40
  13: std::panicking::try
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:449:19
  14: std::panic::catch_unwind
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panic.rs:136:14
  15: std::rt::lang_start_internal::{{closure}}
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/rt.rs:128:48
  16: std::panicking::try::do_call
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:485:40
  17: std::panicking::try
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panicking.rs:449:19
  18: std::panic::catch_unwind
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/panic.rs:136:14
  19: std::rt::lang_start_internal
             at /rustc/5e57faa78aa7661c6000204591558f6665f11abc/library/std/src/rt.rs:128:20
  20: _main

Which absolutely should not happen -- macOS has 128 open file limit, but it isn't possible that we opened 128 files at this point.

arkbriar commented 2 years ago

What if the compactor/vacuum is iterating over rowsets? There will be ~200 files in the data dir after the first import and will be doubled after the second. It's possible to exceed the open file limit.

I think we should limit the files opend by compactor and vacuum.

skyzh commented 2 years ago

Yes for sure... Also one RowSet has multiple columns, so maybe ~10 RowSets would exceed 128 limit.

unconsolable commented 1 year ago

Tried create.sql and 2x import.sql, but not meet this error, maybe this issue has been fixed?

skyzh commented 1 year ago

Thanks for update! Let’s close this issue for now.