notify-rs / notify

🔭 Cross-platform filesystem notification library for Rust.
https://docs.rs/notify

Thread watching debounced events hanging #205

Open itkovian opened 5 years ago

itkovian commented 5 years ago

System details

I use notify in https://github.com/itkovian/sarchive/. The issue I opened for this is at https://github.com/itkovian/sarchive/issues/20.

What you did (as detailed as you can)

What you expected

The watcher should be able to keep track of all events, especially since the queue is big enough.

What happened

The debouncer thread and the watcher event loop (Threads 3 and 2 below) are both waiting and appear to be deadlocked. No further events are processed.

Thread 6 (Thread 0x7f545580a700 (LWP 4027)):
#0  0x00007f5455df9965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000055713a5ccd0d in crossbeam_utils::sync::parker::Inner::park::h6f7e9f05947e58c5 ()
#2  0x000055713a5cc9cb in crossbeam_utils::sync::parker::Parker::park::h1dacaa9c32ead107 ()
#3  0x000055713a553f5b in sarchive::lib::signal_handler_atomic::h2d8e9d3d751c5d1f ()
#4  0x000055713a5546f6 in crossbeam_utils::thread::ScopedThreadBuilder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::h86eeb285f945e829 ()
#5  0x000055713a5462eb in std::sys_common::backtrace::__rust_begin_short_backtrace::hdc70f0854b567d1a ()
#6  0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#7  0x000055713a54fe6d in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::h9a5ade7f14d85759 ()
#8  0x000055713a63e09e in call_once<(),()> () at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/boxed.rs:759
#9  start_thread () at src/libstd/sys_common/thread.rs:14
#10 thread_start () at src/libstd/sys/unix/thread.rs:81
#11 0x00007f5455df5dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5455908ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5455609700 (LWP 4028)):
#0  0x00007f5455df9965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000055713a62f353 in wait () at src/libstd/sys/unix/condvar.rs:70
#2  wait () at src/libstd/sys_common/condvar.rs:41
#3  wait<()> () at src/libstd/sync/condvar.rs:204
#4  park () at src/libstd/thread/mod.rs:908
#5  0x000055713a5cc4f3 in crossbeam_channel::context::Context::wait_until::h0bf53e5a4352da39 ()
#6  0x000055713a5cc1cc in crossbeam_channel::context::Context::with::_$u7b$$u7b$closure$u7d$$u7d$::hb09c5663e8cedc6d ()
#7  0x000055713a5cbcaf in _$LT$std..thread..local..LocalKey$LT$T$GT$$GT$::try_with::h2081487015026962 ()
#8  0x000055713a5caf21 in crossbeam_channel::select::run_select::he3afd68d128e33fc ()
#9  0x000055713a5cb706 in crossbeam_channel::select::Select::select::hced2ab4085c02c35 ()
#10 0x000055713a551776 in sarchive::lib::monitor::h8503efb39a9206f2 ()
#11 0x000055713a554365 in crossbeam_utils::thread::ScopedThreadBuilder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::h5c3cfc13ba01db7e ()
#12 0x000055713a5462eb in std::sys_common::backtrace::__rust_begin_short_backtrace::hdc70f0854b567d1a ()
#13 0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#14 0x000055713a54fb2d in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::h190f113dc2fb2a1b ()
#15 0x000055713a63e09e in call_once<(),()> () at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/boxed.rs:759
#16 start_thread () at src/libstd/sys_common/thread.rs:14
#17 thread_start () at src/libstd/sys/unix/thread.rs:81
#18 0x00007f5455df5dd5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f5455908ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5455408700 (LWP 4029)):
#0  0x00007f5455df9965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000055713a62f353 in wait () at src/libstd/sys/unix/condvar.rs:70
#2  wait () at src/libstd/sys_common/condvar.rs:41
#3  wait<()> () at src/libstd/sync/condvar.rs:204
#4  park () at src/libstd/thread/mod.rs:908
#5  0x000055713a5cc4f3 in crossbeam_channel::context::Context::wait_until::h0bf53e5a4352da39 ()
#6  0x000055713a5cc1cc in crossbeam_channel::context::Context::with::_$u7b$$u7b$closure$u7d$$u7d$::hb09c5663e8cedc6d ()
#7  0x000055713a5cbcaf in _$LT$std..thread..local..LocalKey$LT$T$GT$$GT$::try_with::h2081487015026962 ()
#8  0x000055713a5caf21 in crossbeam_channel::select::run_select::he3afd68d128e33fc ()
#9  0x000055713a5cb706 in crossbeam_channel::select::Select::select::hced2ab4085c02c35 ()
#10 0x000055713a5529c8 in sarchive::lib::process::h005bbd03ae18b9b3 ()
#11 0x000055713a55415d in crossbeam_utils::thread::ScopedThreadBuilder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::h1c9ff6e400f4e454 ()
#12 0x000055713a5462eb in std::sys_common::backtrace::__rust_begin_short_backtrace::hdc70f0854b567d1a ()
#13 0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#14 0x000055713a55000d in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hae33da4cd92a3d73 ()
#15 0x000055713a63e09e in call_once<(),()> () at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/boxed.rs:759
#16 start_thread () at src/libstd/sys_common/thread.rs:14
#17 thread_start () at src/libstd/sys/unix/thread.rs:81
#18 0x00007f5455df5dd5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f5455908ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5455207700 (LWP 4030)):
#0  0x00007f5455dfc4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f5455df7dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f5455df7c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000055713a5811f3 in notify::debounce::timer::ScheduleWorker::run::h05a242d775f47422 ()
#4  0x000055713a5946ff in std::sys_common::backtrace::__rust_begin_short_backtrace::he553ad6571cfadd7 ()
#5  0x000055713a597c1e in std::panicking::try::do_call::h816d9a307bc6821f ()
#6  0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#7  0x000055713a59fc2f in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::h2b7c3a08953572d0 ()
#8  0x000055713a63e09e in call_once<(),()> () at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/boxed.rs:759
#9  start_thread () at src/libstd/sys_common/thread.rs:14
#10 thread_start () at src/libstd/sys/unix/thread.rs:81
#11 0x00007f5455df5dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f5455908ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5455006700 (LWP 4031)):
#0  0x00007f5455dfc4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f5455df7dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f5455df7c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000055713a5829df in notify::debounce::timer::WatchTimer::ignore::h41e8c48798833732 ()
#4  0x000055713a589b0d in notify::debounce::Debounce::event::h457eb1b9afa24ac7 ()
#5  0x000055713a586f72 in notify::debounce::EventTx::send::h710e47f0ce4ad6c5 ()
#6  0x000055713a577568 in notify::inotify::EventLoop::event_loop_thread::h77a1f3739e8f1912 ()
#7  0x000055713a5946c6 in std::sys_common::backtrace::__rust_begin_short_backtrace::h5d4573f5b62c565e ()
#8  0x000055713a597c56 in std::panicking::try::do_call::h9da2af6e0d2c9e8c ()
#9  0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#10 0x000055713a59ffa8 in _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::heaa12d6560ac3ec4 ()
#11 0x000055713a63e09e in call_once<(),()> () at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/boxed.rs:759
#12 start_thread () at src/libstd/sys_common/thread.rs:14
#13 thread_start () at src/libstd/sys/unix/thread.rs:81
#14 0x00007f5455df5dd5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f5455908ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5456837940 (LWP 4026)):
#0  0x00007f5455df9965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000055713a5cd670 in crossbeam_utils::sync::wait_group::WaitGroup::wait::h99903d437b2ae8f1 ()
#2  0x000055713a55630b in crossbeam_utils::thread::scope::h4e5e6b620eff23ee ()
#3  0x000055713a54d761 in sarchive::main::hcf3a3ed84010428d ()
#4  0x000055713a54dbe3 in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h6aee53ed84e23fb3 ()
#5  0x000055713a63a263 in {{closure}} () at src/libstd/rt.rs:49
#6  do_call<closure,i32> () at src/libstd/panicking.rs:297
#7  0x000055713a63ea0a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:87
#8  0x000055713a63ae2d in try<i32,closure> () at src/libstd/panicking.rs:276
#9  catch_unwind<closure,i32> () at src/libstd/panic.rs:388
#10 lang_start_internal () at src/libstd/rt.rs:48
#11 0x000055713a54dbb2 in main ()

Relevant code:

pub fn monitor(path: &Path, s: &Sender<TorqueJobEntry>, sigchannel: &Receiver<bool>) -> notify::Result<()> {
    let (tx, rx) = unbounded();

    // create a platform-specific watcher
    let mut watcher: RecommendedWatcher = Watcher::new(tx, Duration::from_secs(2))?;

    info!("Watching path {:?}", &path);

    watcher.watch(&path, RecursiveMode::NonRecursive)?;
    loop {
        select! {
            recv(sigchannel) -> b => if let Ok(true) = b  {
                return Ok(());
            },
            recv(rx) -> event => { match event {
                Ok(e) => check_and_queue(s, e.unwrap())?,
                Err(e) => {
                    error!("Error on received event: {:?}", e);
                    break;
                }
            };}
        }
    }

    Ok(())
}
passcod commented 5 years ago

Can you use or test the latest 4.x release instead of what's in main, at least to figure out whether this is a new or an existing issue?

itkovian commented 5 years ago

I can confirm that the issue also exists in 4.0.12. I tried forcing 4.0.10, but I do not know why Cargo keeps picking 4.0.12.

itkovian commented 5 years ago

I tried changing the delay, but that does not fix anything. Raising the delay to 10s gives me the highest count of processed events so far (5055), but it also hangs eventually.

passcod commented 5 years ago

As an aside, just to confirm: when I read your downstream issue you were going with RawEvents anyway, right? So this is bothersome (and I'd like to get it fixed!) but not preventing you from your task, if I understand right?

passcod commented 5 years ago

You may need to specify "=4.0.10" to pin to a specific version (and apologies if you've tried that already)

itkovian commented 5 years ago

I'd rather not go with RawEvents, for the following reason. In one version of the tool, I'm looking to archive files that reside in a directory. Upon receipt of the CREATE event for the directory, I have no idea what state the files are in. I tried watching recursively, but that seemed to miss CREATE events on the files, though I have no immediate idea why. The debounced events only appear after the delay, if I understood correctly, so I can be reasonably sure that if the files exist, their contents are also complete. Later today, I'll try to pinpoint an earlier version where it still fails.

passcod commented 5 years ago

Notify on Linux doesn't currently issue creates for files inside newly-watched directories when doing a recursive descent. I think there's an issue open for that.

itkovian commented 5 years ago

Would it issue e.g. CLOSE_WRITE? That would also be OK ;) Or is the problem in the underlying inotify implementation in the kernel itself?

passcod commented 5 years ago

If files are created before the watch is placed, they won't emit events. What notify could do is readdir a directory it notices is new or gets a create event for, and issue creates for everything it finds inside, to simulate what people would expect to be the correct behaviour.
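
A user-side approximation of that idea is possible today. A minimal sketch against the 4.x debounced API; handle_new_file and on_event are hypothetical names standing in for whatever the application already does with a new file:

use std::fs;
use std::path::Path;

use notify::DebouncedEvent;

// Hypothetical handler standing in for the application's per-file work.
fn handle_new_file(path: &Path) {
    println!("new file: {}", path.display());
}

// On a Create event for a directory, list what already exists inside it,
// since inotify will not have reported those pre-existing files separately.
fn on_event(event: DebouncedEvent) -> std::io::Result<()> {
    if let DebouncedEvent::Create(path) = event {
        if path.is_dir() {
            for entry in fs::read_dir(&path)? {
                handle_new_file(&entry?.path());
            }
        } else {
            handle_new_file(&path);
        }
    }
    Ok(())
}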

passcod commented 5 years ago

(That is, the inotify watch depth is one. Given a directory, it will only watch itself and its immediate contents. Recursive watching on Linux is a Notify abstraction on top.)
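
For illustration, that abstraction amounts to walking the tree and placing a depth-one watch on every directory. A rough sketch, not notify's actual internal code, assuming notify 4's Error::Io variant for the I/O conversions:

use std::fs;
use std::path::Path;

use notify::{Error, RecursiveMode, Result, Watcher};

// Add a non-recursive (depth-one) watch for every directory under `dir`,
// which is roughly what recursive mode does for you on Linux.
fn watch_tree<W: Watcher>(watcher: &mut W, dir: &Path) -> Result<()> {
    watcher.watch(dir, RecursiveMode::NonRecursive)?;
    for entry in fs::read_dir(dir).map_err(Error::Io)? {
        let path = entry.map_err(Error::Io)?.path();
        if path.is_dir() {
            watch_tree(watcher, &path)?;
        }
    }
    Ok(())
}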

0xpr03 commented 4 years ago

I'm not sure, but I think I've got the same problem, using 5.0.0-pre.1. I watch foo/. I can add foo/A/AA/AA.txt, then delete A/ and add it again, and that works. But when I copy A/AA/AA.txt into foo/A, creating foo/A/A/AA/AA.txt, it hangs. So it seems like the update of the recursive inotify watch(?) causes a total hangup. After this, no event is ever received again.

0xpr03 commented 4 years ago

I switched to 4.0.13 and get the same behavior.

braincore commented 4 years ago

I've also observed this behavior for many months on 4.x.x and finally got to verifying that it only presents itself with the debouncer. I've switched to raw_watcher to get around this.
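
For reference, a minimal sketch of the raw_watcher route on the 4.x API; the watched path is a placeholder:

use std::sync::mpsc::channel;

use notify::{raw_watcher, RawEvent, RecursiveMode, Watcher};

fn main() -> notify::Result<()> {
    let (tx, rx) = channel();

    // Raw events come straight from the backend, bypassing the debounce
    // timer implicated in this hang.
    let mut watcher = raw_watcher(tx)?;
    watcher.watch("/some/watched/path", RecursiveMode::Recursive)?;

    for event in rx {
        match event {
            RawEvent { path: Some(path), op: Ok(op), .. } => println!("{:?}: {:?}", op, path),
            other => println!("raw event without path or op: {:?}", other),
        }
    }
    Ok(())
}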

mtak- commented 4 years ago

Version: 4.0.13

I have discovered the same issue on macOS (fsevents). It manifests semi-frequently when using the debouncer with a low delay (e.g. 0).
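
For context, a debounced 4.x watcher with a zero delay is set up roughly like this (a sketch only, not the reporter's exact code; the path is a placeholder):

use std::sync::mpsc::channel;
use std::time::Duration;

use notify::{watcher, RecursiveMode, Watcher};

fn main() -> notify::Result<()> {
    let (tx, rx) = channel();

    // Zero delay: the debounce timer fires essentially immediately,
    // which is the configuration that hits the hang most often here.
    let mut watcher = watcher(tx, Duration::from_secs(0))?;
    watcher.watch("/some/watched/path", RecursiveMode::Recursive)?;

    for event in rx {
        println!("{:?}", event);
    }
    Ok(())
}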

  thread #4
    frame #0: 0x00007fff6df2af4a libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff6dfe9a4a libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 96
    frame #2: 0x00007fff6dfe7769 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 222
    frame #3: 0x000000010a7c23e5 test-desktop`std::sync::mutex::Mutex$LT$T$GT$::lock::h85454f619cabf454(self=0x00007faea6728190) at mutex.rs:220:12 [opt]
    frame #4: 0x000000010a7c9772 test-desktop`notify::debounce::timer::LoggingMutex$LT$T$GT$::lock::hc385c4f438d70e45(self=0x00007faea6728190) at timer.rs:24:8 [opt]
    frame #5: 0x000000010a79f831 test-desktop`notify::debounce::timer::ScheduleWorker::fire_event::h7e0fa0746004f739(self=0x0000700006876bc0, ev=<unavailable>) at timer.rs:98:36 [opt]
    frame #6: 0x000000010a79f70e test-desktop`notify::debounce::timer::ScheduleWorker::fire_due_events::h68f460ada7f4030b(self=0x0000700006876bc0, now=<unavailable>) at timer.rs:85:16 [opt]
    frame #7: 0x000000010a79fc75 test-desktop`notify::debounce::timer::ScheduleWorker::run::h90d88ef0b0efc6b1(self=0x0000700006876bc0) at timer.rs:138:28 [opt]
    frame #8: 0x000000010a7c8e90 test-desktop`notify::debounce::timer::WatchTimer::new::_$u7b$$u7b$closure$u7d$$u7d$::hcb8abd07577d45f9 at timer.rs:186:12 [opt]
    frame #9: 0x000000010a7c8237 test-desktop`std::sys_common::backtrace::__rust_begin_short_backtrace::hbe01058a8f40b652(f=<unavailable>) at backtrace.rs:77:4 [opt]
    frame #10: 0x000000010a79cfe7 test-desktop`std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h0dbd0c1b33aa8f6a at mod.rs:470:16 [opt]
    frame #11: 0x000000010a79cf27 test-desktop`_$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h3145a96354848384(self=<unavailable>, _args=<unavailable>) at panic.rs:315:8 [opt]
    frame #12: 0x000000010a7c9017 test-desktop`std::panicking::try::do_call::hf5ead24062a0b84a(data=<unavailable>) at panicking.rs:296:39 [opt]
    frame #13: 0x000000010a8cf47f test-desktop`__rust_maybe_catch_panic at lib.rs:80:7 [opt]
    frame #14: 0x000000010a7c8f06 test-desktop`std::panicking::try::h1a92081908bf3b80(f=<unavailable>) at panicking.rs:275:12 [opt]
    frame #15: 0x000000010a79cf87 test-desktop`std::panic::catch_unwind::h362c35e8f3a05ee7(f=<unavailable>) at panic.rs:394:8 [opt]
    frame #16: 0x000000010a7afb31 test-desktop`std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::hcc81242fee004cf7 at mod.rs:469:29 [opt]
    frame #17: 0x000000010a8bc77e test-desktop`_$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hc8e54c901e83587c at boxed.rs:922:8 [opt]
    frame #18: 0x000000010a8ced0e test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 [inlined] _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hefb3e6f10e6b13a2 at boxed.rs:922:8 [opt]
    frame #19: 0x000000010a8ced02 test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 [inlined] std::sys_common::thread::start_thread::h588c9bae9dd3729b at thread.rs:13 [opt]
    frame #20: 0x000000010a8cec8e test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 at thread.rs:79 [opt]
    frame #21: 0x00007fff6dfebd76 libsystem_pthread.dylib`_pthread_start + 125
    frame #22: 0x00007fff6dfe85e3 libsystem_pthread.dylib`thread_start + 15
  thread #18
    frame #0: 0x00007fff6df2af4a libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff6dfe9a4a libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 96
    frame #2: 0x00007fff6dfe7769 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 222
    frame #3: 0x000000010a7c2385 test-desktop`std::sync::mutex::Mutex$LT$T$GT$::lock::h13745d6d73d39497(self=0x00007faea6731380) at mutex.rs:220:12 [opt]
    frame #4: 0x000000010a7c9692 test-desktop`notify::debounce::timer::LoggingMutex$LT$T$GT$::lock::h7d5e14a913f8e32b(self=0x00007faea6731380) at timer.rs:24:8 [opt]
    frame #5: 0x000000010a7a00a5 test-desktop`notify::debounce::timer::WatchTimer::ignore::hfa4765c6fbd481ee(self=<unavailable>, id=<unavailable>) at timer.rs:222:25 [opt]
    frame #6: 0x000000010a7b6168 test-desktop`notify::debounce::restart_timer::hf02fd7b09cfbb09d(timer_id=0x00007faea8b06340, path=PathBuf @ 0x0000700007c97820, timer=0x00007faea6736f48) at mod.rs:506:8 [opt]
    frame #7: 0x000000010a7b5c5a test-desktop`notify::debounce::Debounce::event::h304ff5a9c2a8fbb4(self=0x00007faea6736f18, path=PathBuf @ 0x0000700007c978e0, op=<unavailable>, cookie=<unavailable>) at mod.rs:465:32 [opt]
    frame #8: 0x000000010a7b474d test-desktop`notify::debounce::EventTx::send::h979500e8b59b8377(self=<unavailable>, event=<unavailable>) at mod.rs:46:24 [opt]
    frame #9: 0x000000010a7ae477 test-desktop`notify::fsevent::send_pending_rename_event::hf8882bfb28062e1b(event=Option<notify::RawEvent> @ 0x0000700007c97b40, event_tx=<unavailable>) at fsevent.rs:76:8 [opt]
    frame #10: 0x000000010a7aeec9 test-desktop`notify::fsevent::callback::h6f970d2ae789d1a1(stream_ref=<unavailable>, info=<unavailable>, num_events=12, event_paths=<unavailable>, event_flags=<unavailable>, event_ids=<unavailable>) at fsevent.rs:348:8 [opt]
    frame #11: 0x00007fff382098eb FSEvents`implementation_callback_rpc + 2911
    frame #12: 0x00007fff38208d13 FSEvents`_Xcallback_rpc + 233
    frame #13: 0x00007fff38208c0d FSEvents`FSEventsD2F_server + 55
    frame #14: 0x00007fff3820b68d FSEvents`FSEventsClientProcessMessageCallback + 46
    frame #15: 0x00007fff36a35f73 CoreFoundation`__CFMachPortPerform + 250
    frame #16: 0x00007fff36a35e6f CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 41
    frame #17: 0x00007fff36a35dbf CoreFoundation`__CFRunLoopDoSource1 + 541
    frame #18: 0x00007fff36a1e0dc CoreFoundation`__CFRunLoopRun + 2612
    frame #19: 0x00007fff36a1d423 CoreFoundation`CFRunLoopRunSpecific + 499
    frame #20: 0x00007fff36a1d1fa CoreFoundation`CFRunLoopRun + 40
    frame #21: 0x000000010a7c8e01 test-desktop`notify::fsevent::FsEventWatcher::run::_$u7b$$u7b$closure$u7d$$u7d$::hb7e959d675ad69ca at fsevent.rs:240:16 [opt]
    frame #22: 0x000000010a7c8265 test-desktop`std::sys_common::backtrace::__rust_begin_short_backtrace::hd22343935848eab5(f=<unavailable>) at backtrace.rs:77:4 [opt]
    frame #23: 0x000000010a79d015 test-desktop`std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h30b537835a8eb003 at mod.rs:470:16 [opt]
    frame #24: 0x000000010a79cf55 test-desktop`_$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h984237bcc127aa0f(self=<unavailable>, _args=<unavailable>) at panic.rs:315:8 [opt]
    frame #25: 0x000000010a7c8fd2 test-desktop`std::panicking::try::do_call::he72e0574bf681862(data=<unavailable>) at panicking.rs:296:39 [opt]
    frame #26: 0x000000010a8cf47f test-desktop`__rust_maybe_catch_panic at lib.rs:80:7 [opt]
    frame #27: 0x000000010a7c8f74 test-desktop`std::panicking::try::h2b3f0e92b094ef5d(f=<unavailable>) at panicking.rs:275:12 [opt]
    frame #28: 0x000000010a79cfb5 test-desktop`std::panic::catch_unwind::h7a7b604217f078c6(f=<unavailable>) at panic.rs:394:8 [opt]
    frame #29: 0x000000010a7afa3e test-desktop`std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::hc7744707bf6d169a at mod.rs:469:29 [opt]
    frame #30: 0x000000010a8bc77e test-desktop`_$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hc8e54c901e83587c at boxed.rs:922:8 [opt]
    frame #31: 0x000000010a8ced0e test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 [inlined] _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hefb3e6f10e6b13a2 at boxed.rs:922:8 [opt]
    frame #32: 0x000000010a8ced02 test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 [inlined] std::sys_common::thread::start_thread::h588c9bae9dd3729b at thread.rs:13 [opt]
    frame #33: 0x000000010a8cec8e test-desktop`std::sys::unix::thread::Thread::new::thread_start::h52cdb32292a4bc87 at thread.rs:79 [opt]
    frame #34: 0x00007fff6dfebd76 libsystem_pthread.dylib`_pthread_start + 125
    frame #35: 0x00007fff6dfe85e3 libsystem_pthread.dylib`thread_start + 15

I added some logging to confirm the deadlock:

ThreadId(18) about to lock: 0x7faea6728190
// ThreadId(18) about to lock: 0x7faea6731380
// ThreadId(18) about to unlock: 0x7faea6731380 (these two cancel out)
ThreadId(3)  about to lock: 0x7faea6731380
ThreadId(3)  about to lock: 0x7faea6728190
ThreadId(18) about to lock: 0x7faea6731380

Diving deeper: the operations-buffer lock and the events lock are held simultaneously, but grabbed in different orders by the two threads (a minimal sketch of this inversion follows the annotated frames below).

These lines from thread 4 (id 3):

// attempts to grab operations buffer lock
    frame #5: 0x000000010a79f831 test-desktop`notify::debounce::timer::ScheduleWorker::fire_event::h7e0fa0746004f739(self=0x0000700006876bc0, ev=<unavailable>) at timer.rs:98:36 [opt]
// successfully acquires events lock
    frame #6: 0x000000010a79f70e test-desktop`notify::debounce::timer::ScheduleWorker::fire_due_events::h68f460ada7f4030b(self=0x0000700006876bc0, now=<unavailable>) at timer.rs:85:16 [opt]

And these lines from thread 18 (id 18):

// attempts to grab events lock
    frame #5: 0x000000010a7a00a5 test-desktop`notify::debounce::timer::WatchTimer::ignore::hfa4765c6fbd481ee(self=<unavailable>, id=<unavailable>) at timer.rs:222:25 [opt]
    frame #6: 0x000000010a7b6168 test-desktop`notify::debounce::restart_timer::hf02fd7b09cfbb09d(timer_id=0x00007faea8b06340, path=PathBuf @ 0x0000700007c97820, timer=0x00007faea6736f48) at mod.rs:506:8 [opt]
// successfully acquires operations buffer lock
    frame #7: 0x000000010a7b5c5a test-desktop`notify::debounce::Debounce::event::h304ff5a9c2a8fbb4(self=0x00007faea6736f18, path=PathBuf @ 0x0000700007c978e0, op=<unavailable>, cookie=<unavailable>) at mod.rs:465:32 [opt]
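
In other words, a classic lock-order inversion. A self-contained sketch of the same shape, using plain std mutexes rather than notify's actual types:

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Two locks standing in for the debouncer's events lock and operations-buffer
// lock. Each thread takes one and then wants the other, in opposite orders,
// which is the shape of the two stacks above.
fn main() {
    let events = Arc::new(Mutex::new(()));
    let ops = Arc::new(Mutex::new(()));

    let (e1, o1) = (Arc::clone(&events), Arc::clone(&ops));
    let worker = thread::spawn(move || {
        let _ev = e1.lock().unwrap();          // like fire_due_events: events first
        thread::sleep(Duration::from_millis(50));
        let _op = o1.lock().unwrap();          // then fire_event wants the ops buffer
    });

    let (e2, o2) = (Arc::clone(&events), Arc::clone(&ops));
    let sender = thread::spawn(move || {
        let _op = o2.lock().unwrap();          // like Debounce::event: ops buffer first
        thread::sleep(Duration::from_millis(50));
        let _ev = e2.lock().unwrap();          // then WatchTimer::ignore wants events
    });

    // With this timing both joins block forever: each thread waits on the
    // mutex the other one is holding.
    worker.join().unwrap();
    sender.join().unwrap();
}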
mtak- commented 4 years ago

Possibly fixed by #210.

itkovian commented 4 years ago

For now, I went with 5.0.0-pre.2, which has no RawEvents, and used new_immediate to get a watcher. I did the delay manually, keeping track of timestamps at event receipt time and waiting a bit in case the file I expect to find in the directory is not there yet.

I am fine with that for the moment, and as I understand it, the debouncing part is up for a major rewrite at some point anyway.
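
A rough sketch of that manual-delay idea, independent of the exact notify 5 pre-release API: the channel of (Instant, PathBuf) pairs fed by the immediate watcher, the tick interval, and the settle time are all assumptions, not the reporter's code.

use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

use crossbeam_channel::{select, tick, Receiver};

// Manual "debounce": remember when each path was last seen and only hand it
// on once it has been quiet for `settle`. The events channel carries raw
// (timestamp, path) pairs produced by an immediate (undebounced) watcher.
fn debounce_loop(events: Receiver<(Instant, PathBuf)>, settle: Duration) {
    let mut last_seen: HashMap<PathBuf, Instant> = HashMap::new();
    let ticker = tick(Duration::from_millis(250));

    loop {
        select! {
            recv(events) -> msg => {
                match msg {
                    Ok((at, path)) => { last_seen.insert(path, at); }
                    Err(_) => return, // the watcher side hung up
                }
            },
            recv(ticker) -> _tick => {
                let now = Instant::now();
                let ready: Vec<PathBuf> = last_seen
                    .iter()
                    .filter(|&(_, &at)| now.duration_since(at) >= settle)
                    .map(|(path, _)| path.clone())
                    .collect();
                for path in ready {
                    last_seen.remove(&path);
                    println!("settled: {}", path.display()); // hand off for archiving here
                }
            },
        }
    }
}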

0xpr03 commented 4 years ago

Some deadlocks are now fixed by #210 in the 4.x branch. We'll need some input on whether that resolves the problems, and then up-port this to 5.x.

xudesheng commented 4 years ago

I encountered exactly the same issue on 5.0.0-pre.1. I do need debounced events, and I may not be able to move to 5.0.0-pre.2.

0xpr03 commented 3 years ago

Note that since 5.0.0-pre.2 none of the original debouncer code exists any more (and in fact there is no debouncer for the time being), so this is only v4-related for now. (I'm adding a v4 tag so we can track v5 issues separately.)