jackaudio / jack2


jack_shm_lock_registry fails... can cause JACK to be inoperable #214

Open mspanc opened 8 years ago

mspanc commented 8 years ago

I am using jackd at the same version as shipped with Ubuntu 16.04 (1ed50c9), with the patch from https://github.com/jackaudio/jack2/issues/212 applied. I am setting max clients to 2000 and I have increased MAX_SHM_ID from 256 to 2048.

At some point I noticed that my JACK clients get error 17 when connecting two ports.

I found that the jackd process was still running but not responding to anything.

I checked the logs and found something like:

jack_shm_lock_registry fails...

Full log: jackd.log.zip

Backtrace of jackd process:

Thread 4 (Thread 0x754a17758700 (LWP 813)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000754a1738053c in Jack::JackPosixProcessSync::Wait (this=this@entry=0x2356fc8) at ../posix/JackPosixProcessSync.cpp:81
#2  0x0000754a17377818 in Jack::JackMessageBuffer::Execute (this=0x234ed90) at ../common/JackMessageBuffer.cpp:104
#3  0x0000754a1737f530 in Jack::JackPosixThread::ThreadHandler (arg=0x2356fa8) at ../posix/JackPosixThread.cpp:59
#4  0x0000754a167be6fa in start_thread (arg=0x754a17758700) at pthread_create.c:333
#5  0x0000754a16adab5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x754a1058e700 (LWP 814)):
#0  0x0000754a16a9f8dd in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000754a16ad14d4 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2  0x0000754a173818e5 in JackSleep (usec=<optimized out>) at ../linux/JackLinuxTime.c:158
#3  0x0000754a1065b62d in Jack::JackDummyDriver::Process (this=0x2318900) at ../common/JackDummyDriver.h:51
#4  0x0000754a1738e73a in Jack::JackThreadedDriver::Execute (this=<optimized out>) at ../common/JackThreadedDriver.cpp:244
#5  0x0000754a1737f530 in Jack::JackPosixThread::ThreadHandler (arg=0x2315af8) at ../posix/JackPosixThread.cpp:59
#6  0x0000754a167be6fa in start_thread (arg=0x754a1058e700) at pthread_create.c:333
#7  0x0000754a16adab5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x754a1049c700 (LWP 815)):
#0  0x0000754a16acee8d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000754a173a01cd in poll (__timeout=10000, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  Jack::JackSocketServerChannel::Execute (this=0x754a1585f048) at ../posix/JackSocketServerChannel.cpp:225
#3  0x0000754a1737f530 in Jack::JackPosixThread::ThreadHandler (arg=0x754a1585f160) at ../posix/JackPosixThread.cpp:59
#4  0x0000754a167be6fa in start_thread (arg=0x754a1049c700) at pthread_create.c:333
#5  0x0000754a16adab5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x754a177e4740 (LWP 611)):
#0  do_sigwait (sig=0x7f5992f984a4, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __GI___sigwait (set=set@entry=0x754a175d1b60 <sigmask>, sig=sig@entry=0x7f5992f984a4) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x0000754a17394492 in jackctl_wait_signals (sigmask=sigmask@entry=0x754a175d1b60 <sigmask>) at ../common/JackControlAPI.cpp:687
#3  0x0000000000402218 in main (argc=14, argv=0x7f5992f98a48) at ../common/Jackdmp.cpp:623

# ls -l /dev/shm/jack*
ls: cannot access '/dev/shm/jack*': No such file or directory
$ jack_lsp 
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for 4294967295, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for 4294967295, skipping unlock
JACK server not running

Even if it hits some limit and the error is not recoverable, it would be much better if jackd crashed so that a supervisor could restart it.

elan commented 2 years ago

FWIW, I occasionally see a very similar issue on ARM64 (Raspberry Pi 3, 4, and NanoPi R4S). Usually the log looks like this:

Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: JackAudioDriver::ProcessGraphAsyncMaster: Process error
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: JackEngine::XRun: client = network-sender-a2ffd535-c723-4bd5-9797-680cb661d12f was not finished, state = Triggered
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]:
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: JackEngine::XRun: client = network-sender-a2ffd535-c723-4bd5-9797-680cb661d12f was not finished, state = Triggered
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: JackAudioDriver::ProcessGraphAsyncMaster: Process error
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]:
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: JACK semaphore error: semop (Invalid argument)
Sep 17 09:47:23 nanopi-r4s node[1995]: Process[jackd]: jack_shm_lock_registry fails...

And then JACK stops working.

I've been through the code and the relevant man pages quite a few times, and I can't see any obvious reason why the semop (Invalid argument) error would be triggered.

Would love to hear any ideas...

elan commented 2 years ago

For what it's worth, I figured this one out.

On some Linux installs, the RemoveIPC setting in /etc/systemd/logind.conf makes systemd-logind remove a user's IPC objects (System V semaphores and shared memory, as well as POSIX shared memory under /dev/shm) once that user has fully "logged out". If you're running JACK as a user service, this can have the effect of ripping the semaphores out from under it, e.g. after a user cron run.

Work around it by setting RemoveIPC=no in the [Login] section of that file.