monome / crow

Crow speaks and listens and remembers bits of text. A scriptable USB-CV-II machine
GNU General Public License v3.0

Watchdog timer for frozen/broken Lua scripts #230

Closed: trentgill closed this issue 4 years ago

trentgill commented 4 years ago

Use the watchdog timer to see if main's while(1) loop has stopped running.

At present, if the user creates a script that contains an infinite loop in Lua, that code never exits. The system won't necessarily become unresponsive, but the USB port can't accept messages, and the user must hard-restart.

If the user has uploaded a script with an infinite loop, the USB connection will never come up, leading the user to think their crow is broken or that they've made an incorrect connection.

A short timeout (like 100 ms) should be enough, but we could raise it to 1-2 seconds to allow for very computationally expensive init routines(?).

If the timer hits zero, we should 1) tear down the Lua instance, 2) clear the user script, 3) restart crow.
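
A minimal sketch of the countdown-and-recover idea described above, written as a plain C model that runs on a desktop rather than against a real watchdog peripheral. Every name here (`watchdog_t`, `watchdog_kick`, `watchdog_tick`, the flag fields) is hypothetical and not taken from the crow source. The three recovery steps are only recorded as flags, and on real hardware the watchdog peripheral would typically reset the chip itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a watchdog; none of these names come from crow. */
typedef struct {
    uint32_t timeout_ms;    /* reload value, e.g. somewhere in 100..2000 ms */
    uint32_t remaining_ms;  /* time left before the watchdog expires */
    bool lua_torn_down;     /* step 1: Lua instance torn down */
    bool script_cleared;    /* step 2: user script cleared */
    bool restarted;         /* step 3: device restarted */
} watchdog_t;

void watchdog_init(watchdog_t *wd, uint32_t timeout_ms) {
    assert(timeout_ms > 0);
    wd->timeout_ms = timeout_ms;
    wd->remaining_ms = timeout_ms;
    wd->lua_torn_down = false;
    wd->script_cleared = false;
    wd->restarted = false;
}

/* Called once per pass of main's while(1) loop to prove it is still running. */
void watchdog_kick(watchdog_t *wd) {
    wd->remaining_ms = wd->timeout_ms;
}

/* Called from a periodic timer tick. Returns true once the watchdog has
 * expired, after recording the three recovery steps from the proposal. */
bool watchdog_tick(watchdog_t *wd, uint32_t elapsed_ms) {
    if (elapsed_ms < wd->remaining_ms) {
        wd->remaining_ms -= elapsed_ms;
        return false;
    }
    wd->remaining_ms = 0;
    wd->lua_torn_down = true;   /* 1) tear down the Lua instance */
    wd->script_cleared = true;  /* 2) clear the user script */
    wd->restarted = true;       /* 3) restart */
    return true;
}
```

In this model the main loop calls `watchdog_kick` on every pass; a script stuck in an infinite loop stops the kicks, so a later `watchdog_tick` eventually reports expiry.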

trentgill commented 4 years ago

@tehn curious if you have thoughts / experience here from teletype? Perhaps @samdoshi or someone else involved on that side would? I haven't done this type of thing before.

samdoshi commented 4 years ago

I'm not a Crow expert, though looking through the code I can see the heritage going back to the Aleph codebase, so bits are somewhat familiar...

"2) clear the user script"

Does that mean the user will need to re-upload their script? If so, I'd go to extreme lengths to avoid that. The 'what-if' scenario is a script that only borks 1 in 10,000 times...

On the Teletype, it was only the SCRIPT op that had issues with infinite loops, and the eventual solution was to push a frame with local data onto a stack each time the op was called, with a limit of how large the stack could get.
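
A rough sketch of that bounded-stack idea in C, assuming a toy model in which each script can call at most one other script. The names (`exec_state_t`, `run_script`, `EXEC_DEPTH_MAX`) and the limit of 8 are illustrative assumptions, not the actual Teletype implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define EXEC_DEPTH_MAX 8  /* assumed limit, not Teletype's real value */
#define SCRIPT_COUNT   4  /* assumed number of scripts in this toy model */

/* One frame of per-call local data, pushed each time a script is invoked. */
typedef struct {
    int script;
    int local;
} exec_frame_t;

typedef struct {
    exec_frame_t frames[EXEC_DEPTH_MAX];
    int depth;                   /* number of frames currently on the stack */
    int calls_to[SCRIPT_COUNT];  /* script each script invokes, or -1 for none */
    int executed;                /* how many script bodies actually ran */
} exec_state_t;

/* Runs `script`, following its nested call if it has one. Returns false if
 * the call chain had to be cut off because the frame stack was full. */
bool run_script(exec_state_t *es, int script) {
    assert(script >= 0 && script < SCRIPT_COUNT);
    if (es->depth >= EXEC_DEPTH_MAX) {
        return false;            /* stack full: refuse to recurse further */
    }
    exec_frame_t *frame = &es->frames[es->depth++];
    frame->script = script;
    frame->local = 0;
    es->executed++;

    bool ok = true;
    int next = es->calls_to[script];
    if (next >= 0) {
        ok = run_script(es, next);
    }
    es->depth--;                 /* pop this call's frame on the way out */
    return ok;
}
```

A script that calls itself runs `EXEC_DEPTH_MAX` times and then unwinds with `false`, rather than looping forever; the script itself stays loaded.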

Anyway, this is all fundamentally the Halting Problem, and therefore :boom:.

From the looks of it, there are only a finite number of places that user Lua code is called (init, event hooks, REPL, ...), right? Can you set a guard up around those locations to add your watchdog?

Some interesting things from a bit of Googling:

lua_sethook: it looks like you can call back into C on every line, every N instructions, etc. That might be a nice way to avoid a watchdog entirely.

Once you've figured out you're in an error state (i.e. script has run for too much time / too many operations), you need a clean-ish way to terminate the script... the first two options here seem interesting.
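
The guard-plus-hook idea can be sketched as a self-contained C model, under the assumption that user code is charged one unit of budget per hook call. This does not call the Lua API: `guard_t`, `guard_hook` and `guarded_call` are made-up names. In real Lua the analogous pieces would be `lua_sethook` with `LUA_MASKCOUNT` for the periodic callback, and an error raised from the hook (for example with `luaL_error`) that unwinds to an enclosing `lua_pcall`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical guard state: an escape point plus a remaining operation budget. */
typedef struct {
    jmp_buf escape;
    long budget;
} guard_t;

typedef void (*user_fn)(guard_t *g);

/* Stand-in for a count hook: the "interpreter" calls this periodically.
 * When the budget runs out, jump back to the guard, abandoning the user code. */
void guard_hook(guard_t *g) {
    if (--g->budget <= 0) {
        longjmp(g->escape, 1);
    }
}

/* Guard placed around one entry point into user code (init, event hook, REPL).
 * Returns true if the code finished within budget, false if it was aborted. */
bool guarded_call(user_fn fn, long budget) {
    guard_t g;
    assert(budget > 0);
    g.budget = budget;
    if (setjmp(g.escape) != 0) {
        return false;  /* arrived here via longjmp: budget exhausted */
    }
    fn(&g);
    return true;
}

/* Example user code: a short, finite routine. */
void well_behaved(guard_t *g) {
    for (int i = 0; i < 10; i++) {
        guard_hook(g);
    }
}

/* Example user code: an infinite loop that only the guard can stop. */
void runaway(guard_t *g) {
    for (;;) {
        guard_hook(g);
    }
}
```

Because the abort lands back in the guard, the caller can report an error and keep the stored script, which speaks to the concern about forcing a re-upload.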

Anyway, hope that's helpful, otherwise please ignore!

csboling commented 4 years ago

Fixed by https://github.com/monome/crow/pull/288