mheily / libkqueue

kqueue(2) compatibility library
Other
236 stars 77 forks source link

Don't call sigwaitinfo(2) with disabled pthread cancellation state #148

Closed mkhon closed 2 years ago

mkhon commented 2 years ago

pthread_join() is called with kq_mtx held, but wait_thread_loop() is never cancelled, so kq_mtx is never released.

An additional fix could be to never call potentionally forever-blocking functions (like pthread_join()) with any mutexes held.

This can actually reproduced when running under gdb 7:

KQ [16369]: libkqueue_init(): library initialization complete
KQ [16369]: kqueue(): [358]: unlocked &kq_mtx
KQ [16369]: kqueue(): [378]: waiting for &kq_mtx
KQ [16369]: kqueue(): [378]: locked &kq_mtx
KQ [16369]: linux_eventfd_init(): eventfd=10 - created
KQ [16369]: evfilt_proc_init(): [411]: waiting for &proc_init_mtx
KQ [16369]: evfilt_proc_init(): [411]: locked &proc_init_mtx
KQ [16369]: evfilt_proc_init(): creating wait thread
[New Thread 0x7ffff1e2e700 (LWP 16373)]
KQ [16373]: wait_thread_loop(): tid=16373 - waiter thread started
KQ [16373]: wait_thread_loop(): [311]: waiting for &proc_pid_index_mtx
KQ [16373]: wait_thread_loop(): [311]: locked &proc_pid_index_mtx
KQ [16369]: evfilt_proc_init(): [442]: unlocked &proc_init_mtx
KQ [16369]: filter_register_all(): complete
KQ [16373]: wait_thread_loop(): [355]: unlocked &proc_pid_index_mtx
KQ [16373]: wait_thread_loop(): waiting for SIGCHLD
[New Thread 0x7ffff162d700 (LWP 16374)]
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16374]: monitoring_thread_loop(): tid=16374 - monitoring thread started
KQ [16369]: linux_kqueue_init(): kq=0xee52e0 - fd=9 use_count=1
KQ [16369]: linux_kqueue_init(): kq=0xee52e0 - monitoring fd=8 for closure
KQ [16369]: kqueue(): kq=0xee52e0 - alloced with fd=9
KQ [16369]: map_insert(): idx=9 - inserted ptr=0xee52e0 into map
KQ [16369]: kqueue(): [403]: unlocked &kq_mtx
KQ [16369]: kevent(): [378]: waiting for &kq_mtx
KQ [16369]: kevent(): [378]: locked &kq_mtx
KQ [16369]: kevent(): [392]: waiting for &(kq)->kq_mtx
KQ [16369]: kevent(): [392]: locked &(kq)->kq_mtx
KQ [16369]: kevent(): [404]: unlocked &kq_mtx
KQ [16369]: kevent(): --- START kevent 1 --- (nchanges = 1 nevents = 1)
KQ [16369]: kevent_copyin(): nchanges=1 nevents=1
KQ [16369]: kevent_copyin_one(): src={ ident=0, filter=-12 ((null)), flags=0x0001 (EV_ADD), fflags=0x0002 (NOTE_VERSION_STR NOTE_THREAD_SAFE NOTE_DEBUG_PREFIX NOTE_DEBUG_FUNC), data=0, udata=(nil) }
KQ [16369]: kevent_copyin_one(): kn=0xee7a70 - created knote { ident=0, filter=-12 ((null)), flags=0x0001 (EV_ADD), fflags=0x0002 (NOTE_VERSION_STR NOTE_THREAD_SAFE NOTE_DEBUG_PREFIX NOTE_DEBUG_FUNC), data=0, udata=(nil) }
KQ [16369]: kevent(): (1) kevent_copyin rv=1
KQ [16369]: kevent(): (1) returning 1 events
KQ [16369]: kevent(): (1) eventlist[0] = { ident=0, filter=-12 ((null)), flags=0x0041 (EV_ADD EV_RECEIPT), fflags=0x0002 (NOTE_VERSION_STR NOTE_THREAD_SAFE NOTE_DEBUG_PREFIX NOTE_DEBUG_FUNC), data=0, udata=0x7ffff6320900 }
KQ [16369]: kevent(): [487]: unlocked &(kq)->kq_mtx
KQ [16369]: kevent(): --- END kevent 1 ret 1 ---
KQ [16374]: monitoring_thread_loop(): [300]: waiting for &kq_mtx
KQ [16374]: monitoring_thread_loop(): [300]: locked &kq_mtx
KQ [16374]: monitoring_thread_loop(): fd=9 - freeing kqueue due to fd closure (signal) for sfd=8 
KQ [16374]: monitoring_thread_kqueue_cleanup(): kq=0xee52e0 - fd=9 use_count=0 cleaning up...
KQ [16374]: kqueue_free(): kq=0xee52e0 - freeing
KQ [16374]: map_remove(): idx=9 - removed ptr=0xee52e0 from map
KQ [16374]: kqueue_free(): [309]: waiting for &(kq)->kq_mtx
KQ [16374]: kqueue_free(): [309]: locked &(kq)->kq_mtx
KQ [16374]: evfilt_proc_destroy(): [454]: waiting for &proc_init_mtx
KQ [16374]: evfilt_proc_destroy(): [454]: locked &proc_init_mtx
KQ [16374]: evfilt_proc_destroy(): tid=16373 - cancelling
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16373]: wait_thread_loop(): sigwaitinfo(2): Interrupted system call
KQ [16369]: kqueue(): [356]: waiting for &kq_mtx
(gdb) thread apply all bt

Thread 3 (Thread 0x7ffff162d700 (LWP 16374)):
#0  0x00007ffff5138017 in pthread_join (threadid=140737251567360, thread_return=0x7ffff15ffe40)
    at pthread_join.c:90
#1  0x00007ffff631eaaa in evfilt_proc_destroy (filt=0xee5618)
    at /usr/src/debug/libkqueue-2.6.1/src_0/posix/proc.c:473
#2  0x00007ffff630be26 in filter_unregister_all (kq=0xee52e0)
    at /usr/src/debug/libkqueue-2.6.1/src_0/common/filter.c:172
#3  0x00007ffff6310de6 in kqueue_free (kq=0xee52e0)
    at /usr/src/debug/libkqueue-2.6.1/src_0/common/kqueue.c:310
#4  0x00007ffff63125d9 in monitoring_thread_kqueue_cleanup (signal_fd=8)
    at /usr/src/debug/libkqueue-2.6.1/src_0/linux/platform.c:223
#5  0x00007ffff6312aaa in monitoring_thread_loop (arg=0x0)
    at /usr/src/debug/libkqueue-2.6.1/src_0/linux/platform.c:306
#6  0x00007ffff5136ea5 in start_thread (arg=0x7ffff162d700) at pthread_create.c:307
#7  0x00007ffff49e8b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7ffff1e2e700 (LWP 16373)):
#0  0x00007ffff492158a in do_sigwaitinfo (info=0x7ffff1e00fe0, set=0x7ffff1e00f60)
    at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:54
#1  __GI___sigwaitinfo (set=0x7ffff1e00f60, info=0x7ffff1e00fe0)
    at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:79
#2  0x00007ffff631e43b in wait_thread_loop (arg=0x0) at /usr/src/debug/libkqueue-2.6.1/src_0/posix/proc.c:359
#3  0x00007ffff5136ea5 in start_thread (arg=0x7ffff1e2e700) at pthread_create.c:307
#4  0x00007ffff49e8b0d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7fdf580 (LWP 16369)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff5138e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007ffff5138d68 in __GI___pthread_mutex_lock (mutex=0x7ffff65262c0 <kq_mtx>)
    at ../nptl/pthread_mutex_lock.c:78
#3  0x00007ffff6310f87 in kqueue () at /usr/src/debug/libkqueue-2.6.1/src_0/common/kqueue.c:356
#4  0x00007ffff7b2cf1f in fr_event_list_alloc (ctx=0x0, status=0x7ffff73c6a60 <_loop_status>, 
    status_uctx=0x0) at src/lib/util/event.c:2644
#5  0x00007ffff73c6b92 in main_loop_init () at src/lib/server/main_loop.c:253
#6  0x000000000040588f in main (argc=2, argv=0x7fffffffe518) at src/bin/radiusd.c:796
(gdb)