jreusch-barco opened this issue 8 years ago
This "kill" code was not very good from the beginning. Don't be shy to write a better implementation.
On 21 Nov 2016, at 18:31, jreusch-barco notifications@github.com wrote:
Hi, I encountered a strange problem: on a jackd instance running as master (with the ALSA backend), some netjack2 client handles weren't closed (i.e. destructed) properly.
On the master, the JackNetMasterInterface::Recv(...) method calls JackNetMasterInterface::FatalRecvError(), which itself calls JackNetMasterInterface::Exit(). This sends a UDP packet to the same process(?) with the KILL_MASTER flag set. Am I right in assuming such packets only get sent by the jackd master instance to itself?
In another thread of the same process, the JackNetMasterManager::Run loop sees the KILL_MASTER packet and deletes the handle to the lost netjack2 connection... This seems quite odd, but the bad thing is: if this single UDP packet gets lost, the jackd process ends up with a stale netjack2 handle, is unusable from this point on, and does not recover. I know netjack2 generally doesn't forgive packet loss very well, but should one single packet really have such a vast impact? (Xruns are OK, but rendering the whole process useless is quite another class of problem.)
So, is there any obvious reason I'm missing why it was solved this way? Couldn't it be done in a more reliable way? What could that be? Something like a cleanup queue which gets processed at the appropriate place? This could also be the first step towards removing this "temporary" ( ;) ) ugly hack with the call to pthread_exit ;)
Thanks! Jan
Hi Stéphane,
I would really like to do so! But I could also use some "hints" ;)
Why does the JackNetSlaveInterface::FatalRecvError() code throw a JackNetException while the master code doesn't? Are my assumptions about the KILL_MASTER signal right?
On 21 Nov 2016, at 18:31, jreusch-barco notifications@github.com wrote:
Hi, I encountered a strange problem: on a jackd instance running as master (with the ALSA backend), some netjack2 client handles weren't closed (i.e. destructed) properly.
On the master, the JackNetMasterInterface::Recv(...) method calls JackNetMasterInterface::FatalRecvError(), which itself calls JackNetMasterInterface::Exit(). This sends a UDP packet to the same process(?) with the KILL_MASTER flag set. Am I right in assuming such packets only get sent by the jackd master instance to itself?
Yes.
In another thread of the same process, the JackNetMasterManager::Run loop sees the KILL_MASTER packet and deletes the handle to the lost netjack2 connection... This seems quite odd, but the bad thing is: if this single UDP packet gets lost,
How could a packet be lost within the same process?
the jackd process ends up with a stale netjack2 handle, is unusable from this point on, and does not recover.
Have you ever seen the problem occur?
I know netjack2 generally doesn't forgive packet loss very well, but should one single packet really have such a vast impact? (Xruns are OK, but rendering the whole process useless is quite another class of problem.)
So, is there any obvious reason I'm missing why it was solved this way? Couldn't it be done in a more reliable way? What could that be? Something like a cleanup queue which gets processed at the appropriate place? This could also be the first step towards removing this "temporary" ( ;) ) ugly hack with the call to pthread_exit ;)
Well, it is a bit ugly, but the point was: the thread calling JackNetMasterInterface::Recv needs to notify the thread running JackNetMasterManager::Run when a Recv fails. Using a special packet seemed to be an easy way to do that. The ThreadExit logic could possibly be changed by throwing an exception that would be caught by the receive thread in order to quit properly.
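Something along these lines, as a minimal sketch (the class and member names below are made up for illustration; only the idea of throwing a JackNetException-style exception is from this discussion): the Recv path throws, the owning thread catches and merely flags the connection, and the manager removes it later instead of reacting to a loopback packet.

```cpp
#include <atomic>
#include <stdexcept>

// Stand-in for JackNetException; everything else here is illustrative,
// not the actual netjack2 classes.
struct NetRecvException : std::runtime_error {
    NetRecvException() : std::runtime_error("fatal recv error") {}
};

class MasterConnectionSketch {
public:
    std::atomic<bool> fDead{false};

    // Receive path: on a fatal error, throw instead of sending a KILL_MASTER
    // packet back to the manager's own socket.
    int Recv() {
        int res = DoRecv();                 // placeholder for the real socket recv
        if (res <= 0) {
            throw NetRecvException();
        }
        return res;
    }

    // Driven once per cycle by the thread that owns this connection.
    void RunOneCycle() {
        try {
            Recv();
            // ... decode and deliver audio/MIDI buffers ...
        } catch (NetRecvException&) {
            fDead = true;                   // just flag it; the manager loop reaps it later
        }
    }

private:
    int DoRecv() { return -1; }             // stub so the sketch is self-contained
};
```

That would keep the failure notification entirely in-process, instead of routing it through a UDP packet that could, at least in theory, be dropped.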
hi
the jackd process ends up with a stale netjack2 handle, is unusable from this point on, and does not recover.
Have you ever seen the problem occur?
Yep, that's why I dug into this issue :) It happened randomly on some machines when netjack2 clients got rebooted. It was rare, but since it rendered the jackd master process useless, it was quite severe.
Well, it is a bit ugly, but the point was: the thread calling JackNetMasterInterface::Recv needs to notify the thread running JackNetMasterManager::Run when a Recv fails. Using a special packet seemed to be an easy way to do that.
As stated, I would like/propose to change this and will post a patch here soon. :)
The ThreadExit logic could possibly be changed by throwing an exception that would be caught by the receive thread in order to quit properly.
This is still unclear to me and I need to dig deeper into it. The slave methods already solve it this way by throwing an exception; there is no call to pthread_kill. It is still unclear to me why the master methods do it this way...
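For reference, a condensed sketch of the two error paths as they are described in this thread (stub types and bodies only, not the real JACK code):

```cpp
#include <pthread.h>
#include <stdexcept>

// Stubs so the sketch is self-contained; the real code uses JackNetException
// and a session packet carrying the KILL_MASTER flag.
struct NetRecvException : std::runtime_error {
    NetRecvException() : std::runtime_error("fatal recv error") {}
};
void SendKillMasterPacket() { /* would send a UDP packet with the KILL_MASTER flag */ }

// Slave-style path, as described above: throw and let the receive thread
// catch the exception and quit cleanly.
void FatalRecvErrorSlaveStyle() {
    throw NetRecvException();
}

// Master-style path, as described above: self-notify the manager thread with a
// loopback UDP packet, then terminate the calling thread outright.
void FatalRecvErrorMasterStyle() {
    SendKillMasterPacket();
    pthread_exit(nullptr);      // the "temporary" hack discussed in this thread
}
```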
This is still unclear to me and I need to dig deeper into it. The slave methods already solve it this way by throwing an exception; there is no call to pthread_kill. It is still unclear to me why the master methods do it this way...
ok, got it ;)
hi there
I now have an alpha version of a patch for this. It still needs some rework & definitely some discussion :)
For now I just hooked into the recv loop of the JackNetMasterManager::Run method, which on my system runs every two seconds due to the timeout given to the recv method.
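To illustrate the idea as a simplified sketch (placeholder class and member names, not the code from the actual patch): the receive path only flags a lost connection, and the Run loop reaps flagged connections every time it wakes up, whether a packet arrived or the recv just timed out.

```cpp
#include <list>
#include <mutex>

// Hypothetical connection type; the real loop lives in JackNetMasterManager::Run
// and also handles session packets.
struct ConnectionSketch {
    bool fDead = false;     // set by the receive path when a fatal error occurs
};

class ManagerSketch {
public:
    void Run() {
        while (fRunning) {
            // recv with a timeout, so the loop wakes up at least every 2 seconds
            int res = RecvWithTimeout(2 /* seconds */);
            if (res > 0) {
                HandleSessionPacket();
            }
            // Cleanup pass: runs after every packet and on every timeout.
            ReapDeadConnections();
        }
    }

private:
    void ReapDeadConnections() {
        std::lock_guard<std::mutex> lock(fMutex);
        for (auto it = fConnections.begin(); it != fConnections.end();) {
            if ((*it)->fDead) {
                delete *it;                         // close/destruct the lost connection
                it = fConnections.erase(it);
            } else {
                ++it;
            }
        }
    }

    int RecvWithTimeout(int) { return 0; }          // stub: 0 means "timed out"
    void HandleSessionPacket() {}                   // stub

    bool fRunning = true;
    std::mutex fMutex;
    std::list<ConnectionSketch*> fConnections;
};
```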
The problem I have: the jackd process still becomes unusable after some time, but instead of a few minutes it now takes up to 12 hours.
I would be glad for some help figuring out whether I caused this with my patch or whether it looks like a separate issue. The point where it crashes always starts with a log entry like:
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::TimedWait name = jack_sem.1000_default_jackslave already deallocated!!
and from this point on is followed only by this error:
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:50 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:51 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:51 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
Dec 2 08:38:51 jackmaster jackctl[9858]: JackPosixSemaphore::Signal name = jack_sem.1000_default_jackslave already deallocated!!
This also occurs if I don't "exit" with an exception and instead leave the call to ThreadExit in place (which gives me the impression that changing the return value of the process method from 0 to -1 in case of an exception isn't really the problem).
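For context, the return-value change mentioned above amounts to something like this (a self-contained sketch with made-up names, not the actual Process() implementation):

```cpp
#include <stdexcept>

// Stand-ins only, so the sketch compiles on its own.
struct NetRecvException : std::runtime_error {
    NetRecvException() : std::runtime_error("fatal recv error") {}
};
struct ConnectionSketch {
    bool fDead = false;
    void ExchangeBuffers() { /* would Recv()/Send(); may throw on a fatal error */ }
};

// Catch the exception in the per-cycle process call and report -1 instead of
// terminating the thread.
int ProcessSketch(ConnectionSketch& conn) {
    try {
        conn.ExchangeBuffers();
        return 0;            // normal cycle
    } catch (NetRecvException&) {
        conn.fDead = true;   // leave the actual teardown to the manager loop
        return -1;           // failed cycle; the caller decides what to do with this connection
    }
}
```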
Edit: here is the current version of the patch: patchv1.txt. Commit on GitHub: c79111ba3c47a7a4ec836cb77b316b5733a08377