TerryE closed this 5 years ago.
Excellent concept, fully support that :+1:
One footnote here. After a side conversation with @jmattsson, I've just realised that the use of luaX_ on the esp32 codebase was introduced by @jpeletier with his MQTT port, but unfortunately it breaks the Lua internal naming conventions, as luaX_ is already allocated to llex.c. We will stick to luaN_ and possibly add luaW_ for the NodeMCU additions to core Lua VM functionality. I will back out the luaX_ references when I add lua53 to the dev-esp32 branch.
I've had a few other commitments over these last few weeks, so progress on this has been slow, but I now consider this chunk of work stable.
I am visiting family for the next couple of days, so I will raise the PR itself on Thursday. Given that this is an architectural alignment for lua53, I think we will need to leave it as an unmerged PR for a few weeks; it makes sense to do the next master drop before merging it.
@marcelstoer, are you comfortable with this?
Implemented in #2836
The NodeMCU architecture in essence.
NodeMCU works broadly the same way as Node.js (here is a good overview). On the ESP variants (RTOS for the ESP32, or the non-OS SDK for the ESP8266), a Lua application is composed of a set of tasks organised and scheduled through a single-threaded event loop scheduler. Each task typically has a thin C initiator, which calls a Lua function that may in turn call other Lua functions, and the whole then runs to completion. The event scheduler then starts the next task that is ready to run, FIFO within priority. The whole framework rests on the rule that individual tasks run to completion and are not interrupted by other tasks, so the system as a whole can be implemented in a single processing thread. For this to work, tasks should be short, sharp and non-blocking. Each task is typically initiated by an external event: a timer has fired, a GPIO has been set, a network packet has arrived; these are therefore referred to as callbacks in SDK terminology. Because the Lua VM only executes one task at a time, we don't need mutexes or other fancy task synchronisation mechanisms. Multitasking is cooperative: a task yields control by terminating.
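A minimal C model of the run-to-completion scheduling described above may help. It is a sketch under stated assumptions, not the NodeMCU or SDK implementation: the names sched_post(), sched_run_one() and sched_run_all(), the three priority levels and the fixed queue length are all hypothetical. Tasks are dispatched highest priority first and FIFO within a priority, and each one runs to completion before the next starts, so no locking is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a single-threaded event loop: tasks are queued per
   priority, dispatched highest priority first and FIFO within a priority,
   and each task runs to completion before the next one starts. */

#define NPRIO 3   /* assumed number of priority levels (2 = highest) */
#define QLEN 16   /* assumed capacity of each priority queue */

typedef void (*task_fn)(int param);

typedef struct {
  task_fn fn[QLEN];
  int param[QLEN];
  size_t head;    /* index of the oldest queued task */
  size_t count;   /* number of queued tasks */
} task_queue;

static task_queue queues[NPRIO];

/* Queue a task; returns 0 on success, -1 on a bad priority or full queue. */
int sched_post(int prio, task_fn fn, int param) {
  assert(fn != NULL);
  if (prio < 0 || prio >= NPRIO) return -1;
  task_queue *q = &queues[prio];
  if (q->count == QLEN) return -1;
  size_t tail = (q->head + q->count) % QLEN;
  q->fn[tail] = fn;
  q->param[tail] = param;
  q->count++;
  return 0;
}

/* Run the next ready task; returns 1 if one ran, 0 if nothing was queued. */
int sched_run_one(void) {
  for (int p = NPRIO - 1; p >= 0; p--) {
    task_queue *q = &queues[p];
    if (q->count == 0) continue;
    task_fn fn = q->fn[q->head];
    int param = q->param[q->head];
    q->head = (q->head + 1) % QLEN;   /* dequeue before running */
    q->count--;
    fn(param);                        /* runs to completion, never pre-empted */
    return 1;
  }
  return 0;
}

/* Drain the queues; a real loop would then sleep until the next event. */
void sched_run_all(void) {
  while (sched_run_one()) {
  }
}

/* Example task used in the usage below: records the order tasks ran in. */
static int trace[32];
static int ntrace = 0;

void record_task(int param) {
  if (ntrace < 32) trace[ntrace++] = param;
}
```

Posting tasks 1 and 3 at priority 0 and tasks 2 and 4 at priority 2 runs them in the order 2, 4, 1, 3. Posting from inside a running task just appends to a queue, so the new task starts only after the current one has returned.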
In a typical well-written ESP Lua application, most tasks are short and execute within a few milliseconds, so the ESP processor can complete hundreds of tasks a second with minimal overhead, making it and NodeMCU really well suited to embedded IoT applications.
So a typical implementation pattern for a task comprises:

An initiator coded in C, which is scheduled in response to an external event. For example, a network socket event such as receiving a TCP packet invokes the routine net_recv_cb(), which decodes the event and decides which Lua action function to execute.

There is typically a 1-1 association with a Lua-callable booking function, which can book such events and associate the correct Lua function with the event occurring; in this case net_on('receive', func).

Because each task exits from a Lua VM perspective (that is, the Lua call stack unrolls entirely), the only Lua variables that are preserved from task to task are those stored in the Lua environment (_G) and in the Lua Registry, or their direct children. The Lua GC will collect all local variables created and released during the task execution. Because Lua task functions must persist from task to task, they are all stored in the Lua Registry and referenced by an integer handle. The booking function uses the luaL_ref() API to allocate this registry slot and obtain the handle; the event routine then retrieves the task function using this handle and calls luaL_unref() to return the used slot to the pool, before executing a lua_call() to run the Lua task.

This is a pretty fixed implementation pattern, but we haven't encapsulated it in a higher-level API, so there are subtle differences in how it is coded from task to task. Not good.
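The booking / initiator / registry-handle pattern can be modelled in plain C as follows. This is a hypothetical sketch, not Lua's or NodeMCU's code: reg_ref() and reg_unref() stand in for luaL_ref() and luaL_unref() (reusing released slots through a free list), a C function pointer stands in for the Lua callback, and sock_on_receive() / sock_recv_event() are illustrative stand-ins for net_on('receive', func) and net_recv_cb(). It follows the one-shot sequence described above; a callback that fires repeatedly would keep its handle rather than release it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the callback-handle pattern. A C function pointer
   stands in for a Lua function held in the Lua Registry. */

#define REG_SIZE 8
#define NOREF (-2)   /* same value as Lua's LUA_NOREF */

typedef void (*callback_fn)(const char *data);

static callback_fn registry[REG_SIZE];
static int next_free[REG_SIZE];   /* free-list links between released slots */
static int free_head = -1;        /* most recently released slot, or -1 */
static int reg_top = 0;           /* slots [0, reg_top) have been handed out */

/* Store cb and return an integer handle (in the spirit of luaL_ref). */
int reg_ref(callback_fn cb) {
  int h;
  assert(cb != NULL);
  if (free_head >= 0) {             /* reuse a released slot first */
    h = free_head;
    free_head = next_free[h];
  } else if (reg_top < REG_SIZE) {  /* otherwise take a fresh slot */
    h = reg_top++;
  } else {
    return NOREF;                   /* registry full */
  }
  registry[h] = cb;
  return h;
}

/* Return a slot to the pool (in the spirit of luaL_unref). */
void reg_unref(int h) {
  if (h < 0 || h >= reg_top || registry[h] == NULL) return; /* NOREF, bad or already free */
  registry[h] = NULL;
  next_free[h] = free_head;
  free_head = h;
}

/* Hypothetical per-socket userdata holding the booked callback handle. */
typedef struct {
  int cb_recv_ref;   /* initialise to NOREF */
} socket_ud;

/* Booking function, cf. net_on('receive', func). */
void sock_on_receive(socket_ud *ud, callback_fn cb) {
  reg_unref(ud->cb_recv_ref);       /* release any earlier booking */
  ud->cb_recv_ref = reg_ref(cb);
}

/* C initiator, cf. net_recv_cb(): fetch the callback by handle, release the
   slot, then run the callback to completion. */
void sock_recv_event(socket_ud *ud, const char *data) {
  int h = ud->cb_recv_ref;
  if (h < 0) return;                /* nothing booked */
  callback_fn cb = registry[h];
  reg_unref(h);
  ud->cb_recv_ref = NOREF;
  cb(data);
}

/* Example callback used in the usage below. */
static const char *last_seen = NULL;

void remember_cb(const char *data) { last_seen = data; }
```

After one book / fire cycle, slot 0 is back in the pool, so the next reg_ref() call returns handle 0 again. Every module repeating these few steps by hand is exactly where the small per-module differences creep in.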
Whilst NodeMCU as a whole makes very effective use of this framework through its modules library, ironically the core Lua VM does not. This is possibly because the Lua port was done first to bootstrap the implementation. A good example of where we could use this effectively follows:
Lua error handling and Panics
NodeMCU implements the standard Lua error handling model. In this model, any call level can establish an error handler as part of calling a sub-function. If errors are thrown in this sub-function, they are caught by the error handler. If an error is thrown and not caught by an error handler, it is caught at the top level by what is known as the Lua panic handler; on NodeMCU this emits a terse error message to UART0 before rebooting the ESP. This makes panic errors very difficult to diagnose.
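To make the two error paths concrete, here is a self-contained C model built on setjmp()/longjmp(), the mechanism the standard Lua VM uses for errors when compiled as C. It is a sketch with hypothetical names: protected_call() plays the role of lua_pcall(), throw_error() of lua_error(), and safe_call() with set_atpanic() approximates the node.atpanic() / nodemcu_call() proposal further down, except that it calls the handler directly with the error message instead of posting a task with a full traceback.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of Lua-style error handling using setjmp/longjmp. */

typedef void (*task_fn)(void);
typedef void (*panic_fn)(const char *msg);

static jmp_buf *current_handler = NULL; /* innermost active protected call */
static const char *error_msg = NULL;    /* message of the last thrown error */
static int reboot_requested = 0;        /* stands in for restarting the ESP */

/* Raise an error (cf. lua_error): jump to the innermost protected call. */
void throw_error(const char *msg) {
  error_msg = msg;
  if (current_handler == NULL) {        /* unprotected: nothing can catch it */
    fprintf(stderr, "unprotected error: %s\n", msg);
    abort();
  }
  longjmp(*current_handler, 1);
}

/* Run fn in protected mode (cf. lua_pcall). Returns 0 on success, 1 on error. */
int protected_call(task_fn fn) {
  jmp_buf env;
  jmp_buf *prev = current_handler;      /* not modified after setjmp */
  current_handler = &env;
  if (setjmp(env) == 0) {
    fn();
    current_handler = prev;
    return 0;
  }
  current_handler = prev;               /* error: restore the outer handler */
  return 1;
}

/* Default handler: today's behaviour, a terse message then a reboot. */
static void default_panic(const char *msg) {
  fprintf(stderr, "PANIC: %s\n", msg);
  reboot_requested = 1;
}

static panic_fn atpanic = default_panic;

/* Hypothetical analogue of node.atpanic(): NULL restores the default. */
void set_atpanic(panic_fn fn) { atpanic = (fn != NULL) ? fn : default_panic; }

/* Hypothetical analogue of nodemcu_call(): always returns; an uncaught
   error is handed to the atpanic handler instead of killing the VM. */
int safe_call(task_fn fn) {
  if (protected_call(fn) != 0) {
    atpanic(error_msg);
    return 1;
  }
  return 0;
}

/* Example tasks and handler used in the usage below. */
static int steps = 0;
static const char *logged = NULL;

void failing_task(void) {
  steps++;
  throw_error("attempt to index a nil value");
  steps += 100;                         /* never reached */
}

void guarded_task(void) {
  steps++;
  if (protected_call(failing_task) != 0) /* caught inside the task */
    steps += 10;
}

void log_panic(const char *msg) { logged = msg; }
```

An error caught by a protected call inside the task never reaches the handler; only an uncaught one does, and in this model only the default handler requests a reboot.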
There is absolutely no reason for panics to be handled this way. If we look at a typical pattern for calling a task function:
Here we are calling the function with the handle ud->client.cb_sent_ref, passing the userdata ud->client.cb_sent_ref as context. If this cb_sent_ref routine throws an error, then this will panic and reboot the ESP. Why do this? If we replace this with a pattern:
We not only save on code space, but also get panic handling with a full error traceback 'for free'. There are 71 such fragments in the modules directory, so doing this is a pretty straightforward batch edit. We would need one extra node call, node.atpanic(function), which establishes a non-default panic handler. The nodemcu_call() would be something along the lines of:
Now the call always returns, whether or not the function throws an error. However, if it does, then nodemcu_traceback() gathers a full error traceback and does a task post to the registered atpanic routine, with the traceback as a string argument. The default atpanic routine would print this full traceback and restart the CPU, whereas a production application might log the error over the network.

Other possible uses of tasking within the Lua VM / NodeMCU runtime.
It is moot whether we should regard such features as Lua components (i.e. with a lua_ prefix and part of the lua file hierarchy) or as NodeMCU ones (i.e. with a nodemcu_ prefix and part of the platform or similar file hierarchy). My view is that these extensions are intimately tied into the Lua VM, and we already have a Lua module for the NodeMCU extensions; this uses the luaF_ prefix and is in lflash.c, but there is sound sense in lumping all of these extras together and calling this file lnodemcu.c instead.
Anyway, as well as error handling, other places where I am planning to use this tasking model include:
node.output() spooling
Well, any considered responses?
PS: or follow the dev-esp32 lead and use luaX_ for this, and keep luaN_ for LFS.