nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License
7.66k stars 3.12k forks source link

New 1.5 feature -- task hook for modules. #871

Closed TerryE closed 8 years ago

TerryE commented 8 years ago

This is a suggestion for a newe internal feature within the firmware that will facilitate issues discussed in #845 #851 and #870 amongst others.

The SDK implements a task post and catch system as an alternative to the other system callbacks. In many ways with is somewhat like a one-shot tmr.alarm() with a delay of zero. This help address an is that is common with non-preemptive framework such as is in the SDK, in that application tasks need regularly to yield control back to the SDK to service higher priority activities otherwise bad things start to happen. We have a task hook for the UART and the node.input; we need to add one for the gpio; another for net and now another wifi. I would prefer to do this in such a way that we don't need a material restructure of the core, but also one which means that libraries can declare a task handler hook without need to update central files.

My suggestion is that any library can declare within its module map an entry of the format:

  { LSTRKEY( "\0__task" ), LFUNCVAL( lgpio_task_handler ) },

I've pinched the leading null concept from PHP. This essentially makes it impractical to see or to call the __task entry at a Lua application level, but enables a simple central API for scheduling a task execution, and then dispatching to it. The module maps will be walked during startup to build a list of monikers for tasks that are registered (actually the address of the handler). This will be used by core user_main.c:task_lua() routine to post captured tasks.

There would would be one added call and one macro:

Note that the lua_post macro is a simple wrapper around system_os_post() and make no demands on the lua system, Lua State, etc. It only takes a single parameter but there is nothing stopping the module passing an address if anything else is needed. If modules need to implement initialisation post processing or an action callback after an ISR interrupt then they can use this feature to implement it.

Comments?

jmattsson commented 8 years ago

Do you propose to also expose this into the lua environment? Like a node.defer(function() print("a wee bit later") end) or some such? Would there be any useful value in having that?

nickandrew commented 8 years ago

I think it's very important that Lua code should be able to create a new task so it can yield control to the SDK and continue when the SDK has finished housekeeping. I looked at coroutines but I don't think they work that way.

In my temperature application, the Lua code has to iterate through a (possibly very large) set of devices to read their temperatures, each of which takes time and cumulatively could go over the 10ms deadline. It makes sense to have Lua post a task to the SDK and then yield control to the SDK. This task would read the next device in the list, post the next task to be done and then yield.

Also on a related matter, I have needed to call a C function from a GPIO interrupt so I have hacked that into the gpio.c module. In gpio_intr_callback I added another function pointer array, and if the pointer is set for a particular pin, that function is called instead of any potential lua function. It works well enough for my purposes, but it isn't up to PR standards. So I hope in your GPIO interrupt cleanup you will ensure that C functions can be called.

TerryE commented 8 years ago

@nickandrew , I'll look at your code, but this is reasonably hairy Lua core, so I will want to get it right and no doubt Johny and you will keep me right. As I said qpoi, wfif (should), uart, node.input, and startup all use this. I want to enable its use with any module or C code without having to tweak the core each time.

As to Johny's point, exposing it to the Lua application level is probably a good move. At the moment you have to do something like tmr.alarm(6,1,0, continue_cb). It would help making it easier to use coroutines to write serial application logic. (Though I am now sue this is a good idea as its pretty advanced shit getting your head around how the Lua coroutine model interacts with the SDK).

One specific point: we have only one a uint32_t (or (void *)) parameter. Should I enable multiple task entries per module (e.g. allow "\0__taskN"), or force that level of dispatch onto the module? It's a trivial addition. Also note that I am proposing that these ROMtable entries -- that is they are statically declared in the module.

nickandrew commented 8 years ago

@TerryE Take a look at https://github.com/nickandrew/nodemcu-firmware/commits/gpio-trig which is cleaned up a little more than 9 hours ago.

The idea is that a the interface to register for a callback on a GPIO interrupt is a C function, which will be called with the supplied (void *) argument. If Lua wants to register a gpio.trig() function then that's done with gpio_lua_callback. I have done some testing. Of course we don't want the ISR function to call directly into Lua, I'm just replicating the current behaviour.

I expect the (void *) argument can be L, or is this unnecessary?

My device driver which uses this code works reliably, but I can crash the NodeMCU by stressing the interrupt - kinda like Philip's screwdriver test - except Lua is not involved. If my ISR code outputs to the UART then it's fairly easy to cause a crash. Even when I remove all UART output I can make it crash after a minute or two.

TerryE commented 8 years ago

I will do, but my point here is that multiple modules require the task_post hook, so I would prefer to use a single standard mechanism rather than one for node, another for uart, another for gpio, etc.

The argument should definitely not be L or anything Lua related, because however we extend the use of this tasking interface, it has to conform to ISR requirements and the ISRs should have no knowledge of Lua. The reason for this is separation: if we expose Lua context to ISRs then someone might start exploiting this, and Lua is non-reentrant -- this is one of our causes of runtime instability.

I feel that a possible aspect of using the UART is that it is generating interrupts and I suspect that the main issue that we've got is nested interrupts -- that is a second putative interrupt failing because a first has disabled interrupts too long, or the first interrupt routine has re-enabled interrupts but is running time critical code which itself gets interrupted. It might also be that UART O/P is synchronous, and doing any UART O/P near any ISR or time critical code will break Espressif's own guidelines.

nickandrew commented 8 years ago

The SDK indicates that hardware timer interrupts with "NMI source" can interrupt other ISRs. But I'm not using PWM so I don't know where the interrupts could be being nested.

TerryE commented 8 years ago

@jmattsson Johny, I've got the basic version working with the uart and node.c as per this branch e604437 on my fork. (There's a -Wl,--cref in it which should be there, but I will fix that up before doing the PR). This version is for review and comment only.

It's a bitch developing this sort of change without a debugger isn't it? It's very difficult to do the old fashion inline printing and incremental add of functionality because this is part of the core startup and the firmware just fails to start at all if you've made a boo-boo. In the end I created a set of dummySDK stubs so that I could first debug in gdb on the host VM and then is took me 30 mins to twig that I was copying my corrected task.h over my sftp mapped share into app/task instead of app/include/task so was still picking up the old incorrect one -- Durrrhhh. Anyway for anyone interest in you the tasking architecture works see the task.c header documentation. A lot of this needs to go into an advanced FAQ topic.

jmattsson commented 8 years ago

Cool, I dropped a few comments in there!

For early startup debugging you can typically get away with

uart_div_modify(0,52000000/115200);
ets_printf("Hey, this is visible before system_set_os_print(0) has run!");

Of course, when you're developing an exception handler to support non-32bit loads, printing strings doesn't work so well... ;)

The documentation in task.c should quite possibly go into new doc (e.g. in docs/en/dev/tasks.md) for easy viewing.

TerryE commented 8 years ago

I can add a task.md to docs/en. @marcelstoer Marcel, should we do as Johny suggests and add a dev subdir for this type of developer guide?

Johny raised a valid point on his inline comments. We have three available task levels for application use: low, medium, high, which (in line with J's suggestion) we allocate as follows:

Though we need to remember that the priority only impacts the ordering of queued tasks. Once a task is executing, it is not preempted and runs to completion.

pjsg commented 8 years ago

Terry -- when your __task callback is called, do you pass in a safe lua_State object?

I'm less happy about the priorities. The writer of the GPIO module cannot know what the application developer is going to do with the lua gpio callback. It might take hundreds of ms to execute. From the experiments on timing that I did, I was surprised how long things took. In particular, to refresh my 128x64 OLED display took over 400ms.

I suspect that the priorities are really about the order of execution of callbacks and nothing to do with the amount of time that a callback will take to execute. I don't want the application developer to have to choose the priority of the gpio or timer callback. I could argue for a model which says that callbacks should happen in the order in which they are generated. I.e. all LUA code should be run out of a single priority level. The other levels can be used by C code (or LUA internals) for other purposes.

Also, I would like to be able to have more than one callback per module. The code that I want to try and build around interrupting long running lua programs, posting a message, returning to the sdk and then coming straight back -- this will probably be part of the node module, but it probably deserves its own callback hook.

TerryE commented 8 years ago

I scrapped the __task idea -- too cumbersome. There's one API call which you can use as many times as you want, typically in your fred_init() function that your module declaration.

NODEMCU_MODULE(NODE, "module", fred_map, fred_init);

So if you just wanted to tdo a second phase of initialisation you might just have:

task_post_medium(task_get_id(fred_init_2),0);

or if you also want a hook to use in a driver then you might have a globally declared static, say, fred_isr_signal which you want to call fred_trigger() so you'd just do this in your init():

fred_isr_signal = task_get_id(fred_trigger);

and the ISR would include the task_post_medium(fred_isr_signal, param) macro. The priority is up to the poster. The param is a uint32 which you can overload anyway you want.

Hope that this is clear. Have a look at my task.c which explains all of this.

marcelstoer commented 8 years ago

This is a side track...

Marcel, should we do as Johny suggests and add a dev subdir for this type of developer guide?

We could do that but I'd prefer having a DEVELOPMENT.md in the repository at root level. It's quite common (even expected for some) that you have a triumvirate of README.md, CONTRIBUTING.md and DEVELOPMENT.md for GitHub repositories.

Besides, as long as we don't have collapsible navigation items in our docs theme (see https://github.com/mkdocs/mkdocs/issues/588) it's helpful if our navigation doesn't get any longer as really necessary :unamused:. dev isn't intended for a wide audience I understand. So, occupying precious screen real estate in our public documentation wouldn't be my first choice.

TerryE commented 8 years ago

OK with the suggestion of having an overall root DEVELOPMENT.md, but this stuff is pretty esoteric. It is really too large to be a section in a single document. Maybe we could move it to a FAQ topic area, and provide a short summary in the FAQ and a link to the detail, but this would still be best in a subfolder dev, faq, whatever.

jmattsson commented 8 years ago

OK with the suggestion of having an overall root DEVELOPMENT.md, but this stuff is pretty esoteric.

Aye, I too would prefer to not end up with another massive document - the sheer size of the old wiki md was one of the deterrents (aside from a lousy editor). I don't suppose you could swing some fancy javascript collapsing-support, Marcel? ;) ... hmm, an idle thought - does mkdocs handle the case where a .md is linked to from a file mentioned in mkdocs.yml, but doesn't have an explicit entry there? Could we sneak a non-expanded doc tree in that way?

pjsg commented 8 years ago

I like the simplicity of the task interface. However, the comment that says that running an ISR out of flash is (sort of) OK is false. The problem (as far as I can tell) isn't timing but when the ISR interrupts the paging in of a lower level task from flash. The ISR brings something else in from flash, but in the course of so doing adjusts the implicit flash pointer. This means that the rest of the lower level task pagein gets data from the wrong place. You don't need to be using the SPIFFS code to cause a problem.

TerryE commented 8 years ago

For the avoidance of doubt running ISRs out of flash is death, just that you might fluke death for a short period. Don't access flash at all, IMO.

TerryE commented 8 years ago

I'm less happy about the priorities. The writer of the GPIO module cannot know what the application developer is going to do with the lua gpio callback. It might take hundreds of ms to execute.

Philip @pjsg, I've been mulling this one over. In general GPIO triggers are time-critical so it makes sense to deliver then at high priority. If the programmer wants to drop to a background task then we can do this. I still have to retro-engineer the relative priorities of the other callbacks, such as timer ones.

pjsg commented 8 years ago

@TerryE When you call the registered task callback, could you please pass in a valid lua_State pointer. I'm now getting nervous about the modules that save a copy of the lua_State pointer and expect it to remain valid forever. In particular, the GC can kill threads (aka lua_State objects) if there are no lua references to it left.

The task system might have its own thread, or it might share with the initial thread (which probably cannot go away...)

jmattsson commented 8 years ago

Any reason why you can't simply lua_getstate()? The task system itself doesn't (and imo shouldn't) know anything about Lua-land.

pjsg commented 8 years ago

Absolutely nothing -- I had forgotten about that call (and I was looking in the main lua docs...)

I'll crawl back into my hole for a while....

nickandrew commented 8 years ago

Putting my 2c in here, I want my C code to be able to register a C callback function which will be executed as part of the ISR, not in a later task. The callback function will of course run short and sweet, and not call into Lua.

My device driver currently does this:

  gpio_set_callback(ir.pin, ir_gpio_cb, (void *) &ir);
  platform_gpio_intr_init(ir.pin, GPIO_PIN_INTR_NEGEDGE);

i.e. "Register ISR function ir_gpio_cb() (ICACHE_RAM_ATTR is required) which will be called with argument &ir on negative edge of pin ir.pin". Those two calls could better be combined into one. So I'm just after a similar simple interface to Terry's work.

My driver calls into Lua from a timer callback which fires after there's been no IR data for a few ms. It delivers a series of complete received bytes (usually 4 at a time) to Lua. It works splendidly, and because Lua's only called at the end of a sequence, there's no time-sensitive Lua code.

pjsg commented 8 years ago

Curiously enough, the reason that I volunteered to redo the gpio module in the context of the task stuff, was that I also wanted to be able to have a gpio isr hook. I think that this is independent of the task stuff.

TerryE commented 8 years ago

PR submitted