monome / norns

norns is many sound instruments.
http://monome.org
GNU General Public License v3.0

metro_stop error from pthread #769

Open markwheeler opened 5 years ago

markwheeler commented 5 years ago

I often see this error in maiden when I short press key 1 to return to the menu from within a script. I think it only happens the first time I return to the menu. Not sure if it's related to how a script uses metros(?). Seen on v2 beta 190320.

metro_stop(): pthread_cancel() failed; error: 
specified thread does not exist

Have seen this when running awake, ui demo, etc.

tehn commented 5 years ago

this message has been present since v1. i haven't been able to locate the exact cause yet. no ill effects seen however.

ranch-verdin commented 5 years ago

Think I have seen a problem correlated with the error message in question.

Observed while running some variant of this script (https://gist.github.com/ranch-verdin/7c641533185167eee0d14ca264c3692a) together with 'sgynth the DSP host' running 'sgynth the groovebox'.

Sometimes one or more of the metros failed to start when reloading the script, and think that behaviour was correlated with the same error message. Sorry this is all a bit vague, I will try to reduce to a minimal test case at some point...

ngwese commented 5 years ago

i triggered this error last night while doing some basic script dev. i didn't try to isolate a repro case, but i believe it started after i ran a script which started a metro outside of init and lacked code to stop the metro or clean up in general. after running that script i started a new script, and each time i ran the new script i'd see that pthread error. that got me thinking the menu/script cleanup logic was trying to clean up metro state left over from the first errant script.

catfact commented 4 years ago

i have still seen this message from time to time, but no luck with a repro case.

i've tried a number of pathological things in lua, including creating metros outside of init() that aren't explicitly cleaned up by the script, with no luck.

to me, it doesn't appear to be anything with the lua layer, but something really sporadic in the C layer.

seems implausible to me that it would be related to a "double stop" message from lua, since the C layer checks for this: https://github.com/monome/norns/blob/master/matron/src/metro.c#L101
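For readers who have not opened the linked file, the kind of "double stop" guard described above can be sketched roughly as follows. This is a minimal illustration, not the norns source: every name here (`sketch_metro`, `sketch_stop`, `cancel_calls`, and so on) is hypothetical, and the actual thread cancellation is replaced by a counter so that the guard itself is easy to see.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the norns metro.c code. */
enum sketch_status { SKETCH_STOPPED = 0, SKETCH_RUNNING = 1 };

struct sketch_metro {
    pthread_mutex_t lock;      /* guards status and cancel_calls */
    enum sketch_status status;
    int cancel_calls;          /* how often the cancel path was really taken */
};

void sketch_init(struct sketch_metro *m) {
    assert(m != NULL);
    pthread_mutex_init(&m->lock, NULL);
    m->status = SKETCH_STOPPED;
    m->cancel_calls = 0;
}

/* Stand-in for "start": only flips the status flag under the lock. */
void sketch_mark_running(struct sketch_metro *m) {
    pthread_mutex_lock(&m->lock);
    m->status = SKETCH_RUNNING;
    pthread_mutex_unlock(&m->lock);
}

/* Returns true if this call stopped the metro, false if it was a no-op. */
bool sketch_stop(struct sketch_metro *m) {
    bool did_stop = false;
    pthread_mutex_lock(&m->lock);
    if (m->status == SKETCH_RUNNING) {
        /* A real implementation would cancel the timer thread here. */
        m->cancel_calls++;
        m->status = SKETCH_STOPPED;
        did_stop = true;
    }
    pthread_mutex_unlock(&m->lock);
    return did_stop;
}
```

With a guard of this shape, a second stop request from the scripting layer never reaches the cancel path, which is why a plain double stop looks like an unlikely source of the message.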

there are 2 places in the C metro module that print this error: here, if the thread doesn't exist right after pthread_create() in metro_init(): https://github.com/monome/norns/blob/master/matron/src/metro.c#L162

and here, likewise in metro_cancel (which i think must be where we are actually seeing it) https://github.com/monome/norns/blob/master/matron/src/metro.c#L252

metro_cancel is a static function, called on stop (obviously) and start (less obviously, to cancel and recreate the thread if it is already running.)

it's the second case that seems like a likely culprit to me, and has the smell of a race condition. the calls are behind mutexes, but there must be a concurrency logic error somewhere.
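To make the "cancel on start" path concrete, here is a minimal sketch of one defensive pattern: cancel, then join the old thread before creating a new one, all under the same mutex, and report a non-zero `pthread_cancel()` result via `strerror()`. It is an illustration under assumptions, not the norns implementation and not a confirmed fix for this issue; the names `sketch_timer`, `sketch_start`, `sketch_stop` and `generation` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical names throughout; this is NOT the norns metro.c code. */
struct sketch_timer {
    pthread_mutex_t lock; /* guards every field below */
    pthread_t tid;
    bool running;
    int generation;       /* number of threads created so far */
};

void sketch_timer_init(struct sketch_timer *t) {
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    t->running = false;
    t->generation = 0;
}

/* Timer body: sleeps forever in 10 ms steps until it is cancelled. */
static void *sketch_loop(void *arg) {
    (void)arg;
    struct timespec ts = {0, 10 * 1000 * 1000};
    for (;;) {
        nanosleep(&ts, NULL); /* nanosleep() is a cancellation point */
    }
    return NULL;
}

/* Caller must hold t->lock. Returns the pthread_cancel() result,
 * or 0 when there was nothing to cancel. */
static int sketch_cancel_locked(struct sketch_timer *t) {
    if (!t->running) {
        return 0;
    }
    int ret = pthread_cancel(t->tid);
    if (ret != 0) {
        fprintf(stderr, "sketch: pthread_cancel() failed: %s\n", strerror(ret));
    }
    /* The thread is joinable and has not been joined yet, so joining
     * is valid here and guarantees it is gone before any restart. */
    pthread_join(t->tid, NULL);
    t->running = false;
    return ret;
}

/* Start, or restart if already running. Returns 0 on success. */
int sketch_start(struct sketch_timer *t) {
    pthread_mutex_lock(&t->lock);
    sketch_cancel_locked(t);
    int ret = pthread_create(&t->tid, NULL, sketch_loop, NULL);
    if (ret == 0) {
        t->running = true;
        t->generation++;
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}

/* Stop; safe to call repeatedly. Returns the cancel result. */
int sketch_stop(struct sketch_timer *t) {
    pthread_mutex_lock(&t->lock);
    int ret = sketch_cancel_locked(t);
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```

The design choice worth noting is that the `running` flag and the thread ID change only while the lock is held, and the old thread is always joined before its ID is overwritten, so a stale ID is never passed to `pthread_cancel()`. Whether the real module leaves a window where that can happen is exactly the open question in this thread.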

simonvanderveldt commented 4 years ago

I also run into this every now and then. I'll try to pay more attention when it happens to see if I can reproduce it.