monome / norns

norns is many sound instruments.
http://monome.org
GNU General Public License v3.0
614 stars 144 forks source link

engines using polls fail on boot, when loaded from saved script #1158

Open catfact opened 3 years ago

catfact commented 3 years ago

https://llllllll.co/t/mangl/21066/284?u=zebra

[quote] just had another freeze up while booting into mangl. maiden was flooded with endless

warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll
warning: norns.poll callback couldn't find poll

and all buttons and encoders were non-functional. i was able to ssh in and reboot, at which point everything seemed fine. [/quote]

the warning makes it clear that there is a race condition somewhere and the poll data structure is not populated as it should be after engine load and before script init. (when mangl calls :start() on phase/level polls.)

i'd look at sequence between SC and oracle. Crone.setEngine calls reportCommands, then reportPolls.

in oracle.c, the handler sequence and test_engine_load_done should be ensuring all reports are complete before raising EVENT_ENGINE_LOADED and kicking off script init.


here's the entry point for lua, in startup: https://github.com/monome/norns/blob/main/lua/core/startup.lua#L65-L72

this looks a little suspicious in that the engine report should really happen before trying to launch scripts. but i don't immediately see a reason this would break polls.

dndrks commented 3 years ago

if it helps for testing, executing ;restart in the matron REPL with mangl loaded can reliably cause continuous callback poll messages as norns reboots to a cleared state. reloading mangl stops the messages. restarting supercollider also stops the messages.

catfact commented 3 years ago

i don't know what restart; in matron REPL is supposed to accomplish from a user-story perspective, or why we would recommend using it.

it's a little hard to follow (for me!), but seems to me like maiden just restarts the associated systemd service. if that service is matron alone, then i would expect polls to break along with several other things probably.

dndrks commented 3 years ago

why we would recommend using it

ah, i think it's just been generally accepted as a quick alternative to full-blown SLEEP when experiencing system lockups. didn't mean to muddy the waters, this was just the only way to reliably repro the poll error streaming on my unit.

tehn commented 3 years ago

this looks a little suspicious in that the engine report should really happen before trying to launch scripts. but i don't immediately see a reason this would break polls.

that is confusing for sure--- it seems the script load shouldn't work at all if the engine table doesn't know about its engines.


more generally i think the poll failure might be a symptom of a different problem. ;restart in maiden likely causes the symptom, but that's not the same as the core problem.

generally ;restart (in maiden) is helpful to me while writing lua core code that doesn't mess with SC (exactly because it only restarts matron, reloading the core lua libs). if anything with engines is happening it certainly makes more sense to do full SYSTEM > RESET, which is basically equivalent to a full SLEEP

the problem here, of course, is that if matron is jammed, you can't use the menu to restart--- hence ;restart via maiden.

perhaps we simply need another maiden command ;reset which does the exact same thing as the menu item (ie, full stack)

we added a bunch of stuff to clear the default state, which some people have reported as annoying, but is obviously very helpful for getting out of a failed state

catfact commented 3 years ago

yes of course, the polls issue is independent. i'm simply saying that restarting matron is not exacty useful as a repro case - though it is still a useful clue so thanks for that.

the proximate issue is that the .register method in poll.lua has not yet been called when the poll callback is fired, even though this is supposed to happen as part of the engine-load sequence:

/report/polls/entry handler in oracle.c is what ultimately causes Poll.register to be called in lua, which has evidently not happened yet when the script starts running.

the whole set of handlers in oracle.c are supposed to wait until both sequences are complete before raising EVENT_ENGINE_LOADED.

and of course the lua side should be waiting for that event before firing script init.

so it seems to me that one of these things is happening:

  1. somehow, EVENT_ENGINE_LOADED is being raised early
  2. somehow, script init is being called prematurely, before the engine-loaded event is handled by lua.

both possibilities seem straightforward to debug for, but i think case 2 is more likely.


as we've discussed, the dynamic reporting system is ultimately not really necessary, and could be replaced with metadata files read directly from lua. this works best if the metadata is autogenerated, so that code generation facility should be baked into the engine classes on SC side.


i think we already have other GH issues we can reopen for improving the reset/restart options. but as it stands, i'd say that restart; in matron REPL for recovery reasons, should always be followed by SLEEP or some other full reboot.

catfact commented 3 years ago

it seems the script load shouldn't work at all if the engine table doesn't know about its engines.

there is actually no reason this will break. Engine.load(name, cb) simply sends the requested name and fires the callback on completion. if the named engine doesn't exist, nothing happens, but the engine registry in lua isn't actually checked at this point.

what is definitely broken right now is the ability for a script to get a list of available engines, if that script is being run at launch. (Engine.names will be empty.) but not many scripts require this facility. (maybe none.)