scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder
Other
1.24k stars 137 forks source link

"no garbage row for <id>" flush failure loops #63

Closed scottlamb closed 5 years ago

scottlamb commented 5 years ago

When I set up multiple sample file dirs, I started getting errors like the one below in which:

This happens when there's no flush between Writer::collect_garbage completing and starting again. Its call to delete_garbage doesn't immediately remove it from garbage (not until the next flush). So the next call tries to unlink it again (which fails) and then calls delete_garbage on it again (which will eventually result in the transaction failing because the second delete reports the row doesn't exist).

collect_garbage happens in the post-flush notification, so there usually is a flush between each call. But the notification is asynchronous via a channel so this isn't actually guaranteed when other syncers can also fire flushes at a time it's not expecting.

The solution is pretty simple: make delete_garbage immediately remove it from garbage. Currently there's a garbage (everything in the garbage row) and a to_gc (things that should be removed from it on the next transaction). Instead, do a garbage_needs_unlink and a garbage_unlinked which are mutually exclusive.

Nov 28 14:57:26 ...: sync-/var/lib/moonfire-nvr/sample-6tb moonfire_db::writer] 6: deleting 30070712 bytes in 1 recordings (18927767 bytes needed)
Nov 28 14:57:50 ...: sync-/var/lib/moonfire-nvr/sample-6tb moonfire_db::db] Flush (why: 90 sec after start of 1 minute 5 seconds garage-main recording): added 5 recordings, deleted 1, marked 0 files GCed.
Nov 28 14:57:50 ...: sync-/var/lib/moonfire-nvr/sample-8tb moonfire_db::db] Flush (why: 90 sec after start of 1 minute 20 seconds courtyard-sub recording): added 0 recordings, deleted 0, marked 0 files GCed.
Nov 28 14:57:50 ...: sync-/var/lib/moonfire-nvr/sample-6tb moonfire_db::writer] dir: recording 6/622053 already deleted!
Nov 28 14:57:52 ...: sync-/var/lib/moonfire-nvr/sample-8tb moonfire_db::writer] 11: deleting 5348864 bytes in 1 recordings (4447102 bytes needed)
Nov 28 14:57:53 ...: sync-/var/lib/moonfire-nvr/sample-8tb moonfire_db::writer] flush failure on save for reason 90 sec after start of 1 minute 32 seconds driveway-sub recording; will retry after PT60S: ErrorMessage { msg: "no garbage row for 6/622053" }
Nov 28 14:57:53 ...: stack backtrace:
Nov 28 14:57:53 ...:    0:   0x81398f - backtrace::backtrace::libunwind::trace::h3efbb541b1fd4a89
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.9/src/backtrace/libunwind.rs:53
Nov 28 14:57:53 ...:                  - backtrace::backtrace::trace::he86533e63137b32d
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.9/src/backtrace/mod.rs:42
Nov 28 14:57:53 ...:    1:   0x812f1f - backtrace::capture::Backtrace::new_unresolved::h1e0ea6bd8272aa64
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.9/src/capture.rs:88
Nov 28 14:57:53 ...:    2:   0x812747 - failure::backtrace::internal::InternalBacktrace::new::hf30e7136ac39a801
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/backtrace/internal.rs:44
Nov 28 14:57:53 ...:    3:   0x812a2b - failure::backtrace::Backtrace::new::hbe74d198a344a1b6
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/backtrace/mod.rs:111
Nov 28 14:57:53 ...:                  - <failure::backtrace::Backtrace as core::default::Default>::default::h74085fe416c7ca1f
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/backtrace/mod.rs:125
Nov 28 14:57:53 ...:    4:   0x812a43 - failure::backtrace::Backtrace::new::hbe74d198a344a1b6
Nov 28 14:57:53 ...:    5:   0x5b918b - <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from::hf5461434cf9cbad6
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/error/error_impl.rs:19
Nov 28 14:57:53 ...:                  - <failure::error::Error as core::convert::From<F>>::from::h9373489268132011
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/error/mod.rs:36
Nov 28 14:57:53 ...:                  - failure::error_message::err_msg::ha9648b9e3acef653
Nov 28 14:57:53 ...:                 at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.3/src/error_message.rs:12
Nov 28 14:57:53 ...:                  - moonfire_db::raw::mark_sample_files_deleted::hd912400c9861c3a1
Nov 28 14:57:53 ...:                 at db/raw.rs:297
Nov 28 14:57:53 ...:    6:   0x4b829b - moonfire_db::db::LockedDatabase::flush::h3dc2c842b26129c2
Nov 28 14:57:53 ...:                 at /home/slamb/git/moonfire-nvr/db/db.rs:850
Nov 28 14:57:53 ...:    7:   0x509c67 - <moonfire_db::db::DatabaseGuard<'db, C>>::flush::h7b1c01d01aa74d02
Nov 28 14:57:53 ...:                 at /home/slamb/git/moonfire-nvr/db/db.rs:1883
Nov 28 14:57:53 ...:                  - <moonfire_db::writer::Syncer<C, D>>::flush::h3b5933ac754e19e8
Nov 28 14:57:53 ...:                 at /home/slamb/git/moonfire-nvr/db/writer.rs:454
Nov 28 14:57:53 ...:                  - <moonfire_db::writer::Syncer<C, D>>::run::h65b427485c2a355b
Nov 28 14:57:53 ...:                 at /home/slamb/git/moonfire-nvr/db/writer.rs:365
Nov 28 14:57:53 ...:    8:   0x4a32bf - moonfire_db::writer::start_syncer::{{closure}}::h8d8d9817454357db
Nov 28 14:57:53 ...:                 at /home/slamb/git/moonfire-nvr/db/writer.rs:147
Nov 28 14:57:53 ...:                  - std::sys_common::backtrace::__rust_begin_short_backtrace::h478cfad5bec097cb
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/sys_common/backtrace.rs:136
Nov 28 14:57:53 ...:    9:   0x47a023 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::h0bcfd3639b5817f7
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/thread/mod.rs:477
Nov 28 14:57:53 ...:                  - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hf8f01a225dda73a6
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/panic.rs:319
Nov 28 14:57:53 ...:                  - std::panicking::try::do_call::h6661548dfb451858
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/panicking.rs:310
Nov 28 14:57:53 ...:   10:   0x8e220b - __rust_maybe_catch_panic
Nov 28 14:57:53 ...:                 at src/libpanic_unwind/lib.rs:102
Nov 28 14:57:53 ...:   11:   0x44fd3b - std::panicking::try::h5b49cbb90fd9a1f2
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/panicking.rs:289
Nov 28 14:57:53 ...:                  - std::panic::catch_unwind::hdb03e368bff69de4
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/panic.rs:398
Nov 28 14:57:53 ...:                  - std::thread::Builder::spawn_unchecked::{{closure}}::h5c6ba7cb08b38b14
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/libstd/thread/mod.rs:476
Nov 28 14:57:53 ...:                  - <F as alloc::boxed::FnBox<A>>::call_box::h97a0e387b9befaf7
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/liballoc/boxed.rs:673
Nov 28 14:57:53 ...:   12:   0x8dbe6f - <alloc::boxed::Box<(dyn alloc::boxed::FnBox<A, Output=R> + 'a)> as core::ops::function::FnOnce<A>>::call_once::hd1734190ba15b47e
Nov 28 14:57:53 ...:                 at /rustc/400c2bc5ed292f77c49693320f4eda37bb375e90/src/liballoc/boxed.rs:683
Nov 28 14:57:53 ...:                  - std::sys_common::thread::start_thread::h30eac6e77e888779
Nov 28 14:57:53 ...:                 at src/libstd/sys_common/thread.rs:24
Nov 28 14:57:53 ...:                  - std::sys::unix::thread::Thread::new::thread_start::h6b3c4b2ce6075f65
Nov 28 14:57:53 ...:                 at src/libstd/sys/unix/thread.rs:90