Closed catfact closed 4 years ago
oh another thought, sorry
we tell users with locked-up norns that the rude button is a method of last resort (correct), and imply that matron repl ;restart is preferred. two issues with that:
SYSTEM > RESET should be used right away. if we want to remove this qualification, we could put a probably-minor effort towards a more intelligent re-connection apparatus.
restart request. and it's not uncommon for the maiden UI to be flooded and unresponsive at that point too.
thanks for bringing this up.
SYSTEM > RESET should do a "clean" no-script restart, ie, delete the system.pset etc and reset to known working values. the current behaviour is misleading and doesn't solve all problematic use cases.
is it frequent that people somehow lock up their onboard UI? (ie knobs/screen?)
yes, it is easy to lock up the UI with scripting errors or due to system bugs. for example if we print some error message at 1000hz then the main loop is borked.
hence the proposal to have a clean reset executed directly from GPIO module, which seems easy ( i can PR it if you like)
Some quick questions
issuing ;restart to the matron repl in maiden. this will cause a lot of runtime problems, since matron needs metadata about running engines/polls.
Why is this? Does this mean that if I restart matron I get into a broken state? It doesn't just get the required info when it starts?
Do we need four different types/ways (2-5 in the list) to reset stuff?
hence the proposal to have a clean reset executed directly from GPIO module, which seems easy ( i can PR it if you like)
We could also use a watchdog for this, although that's of course quite invisible to the user/not user initiated.
here's my proposal:
;restart via maiden will catch. if not, that's unfortunately what the hardware reset is for, and i am skeptical that watchdog is a good idea here--- false positives are way more problematic than an occasional hard shutdown, which should seriously only happen if someone is developing a script, so they should be aware of what they're doing. the fact that people think hard shutdowns fix anything is a huge misunderstanding that must be addressed... and obviously having point 1 above reset correctly will solve most people's issues.
;restart in maiden should perhaps always reset both matron and crone and supercollider to get the system in a known state. this of course may change when/if we push around the supercollider handshake requirement (for later discussion). i generally find the quick matron reset helpful when developing the core lua menu stuff, but that's a minor case.
ugh. on another note, undocumented feature: K1 held + SELECT from the menu clears the current script. though like SYS > RESET it's not accessible if the UI is locked up
partial fix by https://github.com/monome/norns/pull/1015
TODO: documentation
still open to conversation about watchdog/GPIO/etc
a hard reset (white button) will be followed by a boot in the "not clean" state, which means a script will not be loaded, so the user will not be locked out. hence i don't think it's necessary to have a GPIO detection method for clean boot. if we want a dirty boot to self-delete config data i'm fine with that, but it doesn't seem necessary, as the user could then execute SYSTEM > RESET, though that would only reset the levels, vports, etc.
I think it would be nice to ensure that after startup everything is always guaranteed to be in a working state, no additional actions needed.
From @catfact's description I gather we already made sure everything was in the correct state when this happened, based on the clean_shutdown flag he mentioned. Or is that not correct?
IMHO it would make sense if this would do the same thing as SYSTEM > RESET (maybe not restarting the services, depending on when/where in the boot process we do this).
Also, being able to reset (or restart) outside of SYSTEM > RESET and the hardware reset button means that we should be able to shut down properly in more cases. Resets with the hardware reset button should only be necessary if the whole system (ie the kernel/Linux) hangs, not when the components of the norns stack hang; this prevents potential issues with disk/filesystem corruption/broken journals.
And @tehn you're right, watchdogs can be tricky and false positives are something we definitely don't want. Just wanted to mention it as an option. Personally I like the suggestion of having a complex key combination, but I also see the issue with that combo then not being available for scripts to use, so I figured a watchdog might be a possible alternative solution.
honestly there isn't much to break at this point:
neither of these are a Fix. they are a quick solution to user error, which is why this hasn't gotten much attention up until now. the clean_shutdown flag is really just to prevent re-loading a failed script, and that totally works 100% even prior to today's fix.
the case of users truly stuck is uncommon and should only happen if someone is writing a script and gets in trouble... which hopefully can be solved via maiden resets/etc.
generally there should not be a common case where normal use of a script should cause a full lockup for someone, but of course this is possible.
i also don't think we realistically have seen repeatable lockups of the kernel or underlying norns components--- just the lua environment. so any fix should mostly address that. key combination is an option.
Why is this? Does this mean that if I restart matron I get into a broken state? It doesn't just get the required info when it starts?
well, it doesn't signal any change to the sclang process. so the running engine can be doing stuff, and in the case of engine polls it happens to be sending OSC that matron now doesn't understand.
we can patch this of course. i think reset_audio should just broadcast OSC to any/all connected process, engine interfaces should just handle this by shutting down current engine, and we should always broadcast this on startup just in case.
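To make the broadcast idea concrete, here is a minimal Python sketch of sending an argument-less OSC message to a list of listeners over UDP. The address `/reset_audio` and the ports in `LISTENERS` are placeholders chosen for illustration, not the values norns actually uses, and the real implementation would live in the norns stack rather than in Python.

```python
import socket


def osc_string(s: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    raw = s.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address: str) -> bytes:
    """Build an OSC message with no arguments (address plus an empty type tag string)."""
    return osc_string(address) + osc_string(",")


# Assumptions for illustration only: neither the address nor the ports come from this thread.
RESET_ADDRESS = "/reset_audio"
LISTENERS = [("127.0.0.1", 57120), ("127.0.0.1", 9999)]


def broadcast_reset(listeners=LISTENERS, address=RESET_ADDRESS) -> int:
    """Send the reset message to every listener; returns how many packets were sent."""
    packet = osc_message(address)
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for host, port in listeners:
            sock.sendto(packet, (host, port))
            sent += 1
    return sent
```

Because UDP is connectionless, sending on startup "just in case" costs nothing if no engine is listening; a receiver that does get the message would be expected to shut down its current engine.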
but here's a simplifying suggestion:
make a ;restart in either REPL on maiden cause both services to restart, and maybe the whole norns- service stack.
in fact: maybe rename ;restart to ;reset, and make it do the exact same thing as SYSTEM>RESET.
wouldn't that clarify things?
my assumption is that this:
SYSTEM > RESET should delete all system settings (levels, vport assignments, current script) which will get the software in a "clean" state (these settings are pretty minimal so it's not a hassle)
can be accomplished just by resetting all the systemd services, (causing a dirty-boot), and ensuring that handling a dirty-boot means deleting vports/psets. right?
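The dirty-boot handling described here could look roughly like the following Python sketch. It assumes the clean_shutdown flag is a marker file sitting next to system.pset and system.state in one data directory; that layout, and the function names `handle_boot` and `mark_clean_shutdown`, are hypothetical and only meant to show the logic.

```python
from pathlib import Path

# Assumption: the flag and the settings live as plain files in one directory.
FLAG = "clean_shutdown"
STATE_FILES = ("system.pset", "system.state")


def handle_boot(data_dir: Path) -> bool:
    """Return True if the last script may be resumed, False after a dirty boot."""
    flag = data_dir / FLAG
    clean = flag.exists()
    if clean:
        # Consume the flag so that a later crash is detected as a dirty boot.
        flag.unlink()
    else:
        # Dirty boot: drop saved settings so the system starts from known values.
        for name in STATE_FILES:
            (data_dir / name).unlink(missing_ok=True)  # missing_ok needs Python 3.8+
    return clean


def mark_clean_shutdown(data_dir: Path) -> None:
    """Called only on an orderly shutdown (SLEEP in the terms of this thread)."""
    (data_dir / FLAG).touch()
```

With this shape, any restart path that skips `mark_clean_shutdown` (a service restart, the white button) ends up in the same cleanup branch on the next boot.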
generally there should not be a common case where normal use of a script should cause a full lockup for someone, but of course this is possible.
it happens. the norns API is very complicated and we can't pretend to have tested every possible script interaction for errors in the system stack on every update. so we really can't assume that it's user error or even a scripting error when the UI hangs for whatever reason.
the main reason i'm bringing up a separate hardware-driven reset, is: i have some suspicion that the white button is really very bad.
i spent a little time trying to research the susceptibility of XNAND flash to permanent damage from power loss during write. i didn't do enough to say for sure and it's a complicated topic, but i think it's a risk. (basically has a chance of creating a permanently-bad sector.)
even if it "only" corrupts the filesystem, in this environment that can mean hardware damage.
anyways, of course it's completely up to you whether a key-combo to restart is acceptable. it could be a really long hold (30s) or even something like a triple-hold plus a sequence of encoder turns. the purpose of such a thing would be to really do as much as possible to give people as few excuses as possible for using the white button under any circumstance.
in fact, here's an additional suggestion:
;reset can reset both processes and do journalctl and dmesg or whatever to assemble an error report. is that crazy?
i actually think a long hold (10s) in a particular sequence (k3,k2,k1 or something) would be a good idea. granted this would be managed by matron, so it's less immune than maiden's ;reset... but i still think it's a good idea.
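As a rough illustration of the long-hold idea, here is a small Python state machine that fires only when the keys were pressed in a given order and then all held for a set time. The class name `HoldSequence`, the default order (3, 2, 1) and the 10 second threshold follow the suggestion above but are otherwise assumptions; the real handling would sit in matron's key event code.

```python
class HoldSequence:
    """Detect keys pressed in a fixed order and then held together for a while."""

    def __init__(self, order=(3, 2, 1), hold_seconds=10.0):
        self.order = tuple(order)
        self.hold_seconds = hold_seconds
        self.held = []            # keys currently down, in the order they were pressed
        self.complete_at = None   # time at which the full sequence became held

    def key(self, n, down, now):
        """Feed one key event: key number, pressed or released, timestamp in seconds."""
        if down:
            if n not in self.held:
                self.held.append(n)
        elif n in self.held:
            self.held.remove(n)
        if tuple(self.held) == self.order:
            if self.complete_at is None:
                self.complete_at = now
        else:
            # Any release or wrong ordering cancels the pending reset.
            self.complete_at = None

    def poll(self, now):
        """True once the full sequence has been held for at least hold_seconds."""
        return self.complete_at is not None and now - self.complete_at >= self.hold_seconds
```

`poll` has to be called from something that keeps running while the Lua side is stuck, which is the "less immune" caveat mentioned above.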
logging, sure. would be possibly sensible to dump a log to ~/dust/data so it's view-able in maiden. i'm unsure which logs exactly we'd want.
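A minimal Python sketch of such a log dump, under stated assumptions: it collects `journalctl` output for the three services named in this thread plus `dmesg`, and writes one text file into ~/dust/data. Which logs are actually wanted is still open, so the command list and the `reset-report-...` file name are guesses.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Service names as listed in this thread; the ".service" suffix is added below.
UNITS = ("norns-matron", "norns-crone", "norns-sclang")


def run(cmd):
    """Run a command and return its combined output, or a note if it could not run."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              text=True, timeout=30)
        return done.stdout
    except (OSError, subprocess.TimeoutExpired) as err:
        return f"(could not run: {err})\n"


def write_report(out_dir=Path.home() / "dust" / "data", runner=run):
    """Write journal and kernel logs to a timestamped file and return its path."""
    # -b limits the journal to the current boot, -u selects one unit.
    commands = [["journalctl", "--no-pager", "-b", "-u", f"{u}.service"] for u in UNITS]
    commands.append(["dmesg"])
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"reset-report-{datetime.now():%Y%m%d-%H%M%S}.txt"  # hypothetical name
    with path.open("w") as f:
        for cmd in commands:
            f.write(f"$ {' '.join(cmd)}\n{runner(cmd)}\n")
    return path
```

The `runner` parameter only exists so the function can be exercised without systemd; on a device it would be left at its default.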
here's my proposal:
system.pset and system.state, skip script resume
maiden: add ;restart in either window which does systemd restart norns-* (this will cause a dirty load of matron)
note that if the norns is shut down (or matron restarted) in any way other than SLEEP it will boot up the next time dirty.
matron: add lower level reset function (which basically issues systemd commands)
Would this be different from systemd restart norns-* as listed in point 4?
maiden: add ;restart in either window which does systemd restart norns-* (this will cause a dirty load of matron)
Could the dirty startup, which will trigger item 3 if I understand it correctly, somehow interfere with/be annoying when developing Lua (like the menu code you mentioned) or SC code from maiden?
P.S. We could probably use systemd's PartOf dependency so you only need to restart one item (probably norns.target)
PartOf=
Configures dependencies similar to Requires=, but limited to stopping and restarting of units. When systemd stops or restarts the units listed here, the action is propagated to this unit. Note that this is a one-way dependency — changes to this unit do not affect the listed units.
Would this be different from systemd restart norns-* as listed in point 4?
bullet 1 isn't really needed. just issuing an os command is sufficient.
i was not suggesting removing maiden's ;restart, ;start and ;stop for matron. so that could still be used for development.
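For reference, "just issuing an os command" could be as small as the Python sketch below. It assumes the three unit names mentioned in this thread and uses `systemctl`, the command-line tool for systemd; the function names are hypothetical, and restarting system units normally needs root privileges.

```python
import subprocess

# Unit names as given in this thread; adjust if the installed units differ.
UNITS = ("norns-matron", "norns-crone", "norns-sclang")


def restart_command(units=UNITS):
    """Build the systemctl invocation that restarts all listed services."""
    return ["systemctl", "restart", *[f"{u}.service" for u in units]]


def restart_stack(units=UNITS, runner=subprocess.run):
    """Restart the units; returns True if systemctl exited with status 0."""
    result = runner(restart_command(units), check=False)
    return result.returncode == 0
```

Since nothing here writes the clean_shutdown flag, the next matron launch would be a dirty one, as described above.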
recently there were some issues reported on the forum related to the way program lifecycle management works on norns (or doesn't work.)
it made me realize that the use cases and the final design concept are fuzzy to me, and i guess to most users.
here are a few things that can be used to reset parts of the norns stack:
sudo shutdown
norns-matron, norns-crone, norns-sclang
issuing ;restart to the matron repl in maiden.
norns-matron service (i think!)
;restart to the sc repl in maiden.
norns-sclang service (i think!)
_norns.reset_audio in lua, and on down) are still available:
Interpreter.recompile, which also relaunches the scsynth process.
clean_shutdown flag, meaning next launch will not run any script, restore mix levels, or (importantly) restore device mapping. this is still the only way for a typical (non-commandline) user to get out of many bugged situations. (e.g. a bug in the default script, bug in our handling of vports with nonexistent devices, anything that locks up the UI.)
in other words, i guess i think we need to clearly inform users about which method to use for what circumstance (troubleshooting, recovering, updating etc.), and we may want to refine the offerings. for example:
hardware/gpio module and launch a special timeout thread.
systemd, without saving any application state or writing clean_shutdown.
what do yall think?