nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License

Emergency GC in Lua 5.3 #3152

Open TerryE opened 4 years ago

TerryE commented 4 years ago

Background

Lua51 includes the eLua emergency GC, plus the various rarely used EGC tuning parameters. The default setting is node.egc.ALWAYS, which triggers a full GC before every memory allocation, so the VM spends maybe 90% of its time doing full GC sweeps. Standard Lua 5.3 has adopted the eLua EGC but without the EGC tuning parameters. See #2783 and #3078 as well.

Proposed EGC implementation

NodeMCU will extend the Lua EGC with the functional equivalent of the Lua51 ON_MEM_LIMIT setting with a negative parameter, that is, to trigger the EGC when less than a preset amount of free heap is left. Comparing the heap needed with the heap available is a very cheap operation, as the free-heap value is maintained in a RAM control block by the SDK/ROM memory allocator and system_get_free_heap_size() is just a getter that provides read access to it.
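A minimal sketch of that cheap check, assuming it sits in Lua's reallocation path (in the spirit of luaM_realloc_()). free_heap_size() is a host-side stand-in for the SDK's system_get_free_heap_size(), and EGC_FREEBOARD, run_full_gc() and lazy_realloc() are hypothetical names, not NodeMCU code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical freeboard: bytes of heap reserved for non-Lua callers. */
#define EGC_FREEBOARD 4096u

/* Host stand-in for system_get_free_heap_size(); on the ESP8266 this is
 * just a read of the allocator's control block, hence cheap. */
static size_t simulated_free_heap = 20000;
static size_t free_heap_size(void) { return simulated_free_heap; }

/* Hypothetical hook for a full Lua GC; here it only counts calls and
 * pretends that 8 KB of garbage was returned to the heap. */
static unsigned gc_runs = 0;
static void run_full_gc(void) {
  gc_runs++;
  simulated_free_heap += 8192;
}

/* Nonzero if allocating nsize bytes would leave less than the freeboard. */
static int needs_egc(size_t nsize) {
  size_t free_heap = free_heap_size();
  return free_heap < nsize || free_heap - nsize < EGC_FREEBOARD;
}

/* Lazy reallocation: GC only when the request would eat into the
 * freeboard, rather than before every allocation. */
static void *lazy_realloc(void *block, size_t nsize) {
  if (nsize == 0) {
    free(block);
    return NULL;
  }
  if (needs_egc(nsize))
    run_full_gc();
  return realloc(block, nsize);
}
```

With 20000 bytes free, a 1000-byte request goes straight through, while any request that would leave fewer than 4096 bytes free triggers a GC first.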

Adopting a lazy approach to GC means that the runtime spends far less time in the GC, and a typical Lua application might run perhaps 5× faster. The risk is that many libraries (e.g. LwIP and mbedTLS) call the SDK allocator directly, so a burst of allocation demand from one of these libraries can make the application fail with an E:M error where it would otherwise have succeeded if the Lua GC had used an "always GC before allocation" strategy.

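Tidying up this miscellany of ways of accessing the memory allocator is going to be a total pain and not easily resourced. However, since we already use a #define chain to map all of these variants back to the core pvPortAllocXXXX(), there is nothing to stop us adding a separate allocator intercept, so we can adopt a dual-stage lazy GC strategy: the Lua allocator triggers the EGC once free heap falls below the preset threshold, and the intercept runs a full Lua GC and retries whenever a non-Lua allocation fails.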

The potential issue with this is that a full GC sweep might be triggered during an allocation request in time-sensitive code, but in general communications stacks such as LwIP do the sensible thing and move allocation requests ahead of time-critical code. Also, the potential hiccup of a GC sweep is far preferable to restarting after memory exhaustion.
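A sketch of the second-stage intercept, under loud assumptions: sdk_alloc() is a host-side stand-in for the core pvPortAllocXXXX() allocator, lua_gc_hook() stands in for a full Lua GC, and the heap is a toy model. Note that the GC runs inside the failing call, which is exactly the time-sensitivity risk described in the paragraph above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Host model of a small heap shared by Lua and the SDK libraries. */
static size_t heap_free = 1000;   /* bytes the SDK allocator can still hand out */
static size_t lua_garbage = 0;    /* dead Lua objects a GC sweep would reclaim */
static unsigned gc_on_failure = 0;

/* Stand-in for the core allocator: fails when over budget.
 * (The model never returns freed blocks to heap_free.) */
static void *sdk_alloc(size_t n) {
  if (n > heap_free) return NULL;
  void *p = malloc(n);
  if (p != NULL) heap_free -= n;
  return p;
}

/* Hypothetical hook: a full Lua GC hands its garbage back to the heap. */
static void lua_gc_hook(void) {
  gc_on_failure++;
  heap_free += lua_garbage;
  lua_garbage = 0;
}

/* Intercept that LwIP/mbedTLS allocations would reach via the #define
 * chain: on failure, run the Lua GC once and retry. */
static void *intercepted_alloc(size_t n) {
  void *p = sdk_alloc(n);
  if (p == NULL) {
    lua_gc_hook();
    p = sdk_alloc(n);
  }
  return p;
}
```

Retrying only once keeps the worst case bounded: if a full GC cannot free enough, the caller still sees a NULL and has to handle it.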

TerryE commented 4 years ago

@nwf @HHHartmann @pjsg, you three are the most likely committers to eyeball this. Feedback and discussion welcome. :smile:

nwf commented 4 years ago

ON_MEM_LIMIT is the default configuration for my NodeMCU projects these days. I think it makes sense as the platform default due to the much better performance it offers.

Unfortunately, LwIP and MbedTLS are, IMNSHO, poorly designed libraries, in that they allocate state internally rather than expecting the caller to provide it. (BearSSL is one of the very few that get this right, just to point out that it is possible to do something as complex as TLS this way.) The good news is that they tend to allocate up front, so if initialization or connection has succeeded, things are probably OK. A call to collectgarbage() at an opportune moment also probably goes a long way.

I'd like our C code and APIs to be much more resilient to allocation failures than they are, so while I agree that instrumenting malloc with "on fail, call Lua GC and retry" is a sensible behaviour, I would prefer it to be optional. Personally I'd rather it not be the default so that robustness issues get shaken out faster, but I understand that this perspective is unlikely to make me many friends.

TerryE commented 4 years ago

@nwf, the catch-22 here is that running Lua applications generates a high churn of GCObjects, including strings, tables, etc., and these need to be collected. Unlike a reference-counting system, which can do this automatically and incrementally, the Lua GC is simpler and perhaps more robust, but its main runtime cost is having to mark and sweep all GCObjects in order to collect any. It costs pretty much the same to mark, sweep and collect one dead object as it does to collect 10.
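A toy model of this cost argument (not lgc.c: real Lua 5.3 uses incremental tri-colour marking and per-type traversal). Marking touches every reachable object and sweeping touches every allocated one, so the work per cycle depends on the heap size rather than on how much garbage there is. Obj, mark(), full_gc() and build() are illustrative names only.

```c
#include <stddef.h>

#define NOBJ 100

/* Toy object with at most one outgoing reference. */
typedef struct Obj {
  int marked;
  int alive;          /* still allocated */
  struct Obj *ref;    /* outgoing reference, or NULL */
} Obj;

static Obj heap[NOBJ];

/* Mark phase: follow references from a root, counting objects visited. */
static size_t mark(Obj *o) {
  size_t visited = 0;
  while (o != NULL && !o->marked) {
    o->marked = 1;
    visited++;
    o = o->ref;
  }
  return visited;
}

/* One full cycle: mark from the roots, then sweep every allocated object.
 * Returns the work done (objects touched); *freed gets the number collected. */
static size_t full_gc(Obj **roots, size_t nroots, size_t *freed) {
  size_t work = 0;
  *freed = 0;
  for (size_t i = 0; i < nroots; i++) work += mark(roots[i]);
  for (size_t i = 0; i < NOBJ; i++) {
    if (!heap[i].alive) continue;
    work++;                                   /* sweep touches every object */
    if (heap[i].marked) heap[i].marked = 0;   /* survivor: reset for next cycle */
    else { heap[i].alive = 0; (*freed)++; }   /* garbage: collect it */
  }
  return work;
}

/* The first `live` objects form a chain reachable from heap[0];
 * the rest are unreferenced garbage. */
static void build(size_t live) {
  for (size_t i = 0; i < NOBJ; i++) {
    heap[i].alive = 1;
    heap[i].marked = 0;
    heap[i].ref = (i + 1 < live) ? &heap[i + 1] : NULL;
  }
}
```

With 100 objects, collecting 1 dead object costs 199 units of work and collecting 10 costs 190, so each collected object is roughly ten times cheaper when the GC is allowed to run lazily.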

Running a full GC before every allocation is the default Lua 5.1 behaviour, and it means that we don't have stale but uncollected resources taking up RAM, but the runtime is perhaps 10× slower than when this stale data is left uncollected. Having a few KB of uncollected data isn't an issue if we've still got 10+ KB of free heap, but it does become one as RAM gets tight.

Maybe we need to use the mode option to the Lua 5.3 version of lua_setegcmode() to toggle whether the last-chance GC is carried out or not.

pjsg commented 4 years ago

Do you think it would be possible to explicitly call collectgarbage in some of the paths that lead into LwIP/mbedTLS? That's a simple addition to the net and tls modules.

TerryE commented 4 years ago

Quite possibly, but another thing that I would like to do first is to maintain some simple stats (possibly enabled by a #define setting): how often GC was called explicitly, how often it was called after the G(L)->gcmemfreeboard trigger and how often on malloc failure, plus the average free heap before a GC and the amount returned by it. This will tell us whether we really need to do this.
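One way these statistics could be kept, as a sketch: GC_STATS_ENABLED, gc_cause, gc_stats_record() and the averaging helpers are all hypothetical names, with GC_FREEBOARD standing for the G(L)->gcmemfreeboard trigger. When the switch is set to 0, the recording call compiles away to nothing.

```c
#include <stddef.h>

/* Hypothetical build switch, not an existing NodeMCU define. */
#define GC_STATS_ENABLED 1

#if GC_STATS_ENABLED
typedef enum { GC_EXPLICIT, GC_FREEBOARD, GC_MALLOC_FAIL, GC_NCAUSES } gc_cause;

typedef struct {
  unsigned long count[GC_NCAUSES];   /* how often each trigger fired */
  unsigned long total;               /* total GC runs */
  unsigned long long heap_before;    /* sum of free heap before each GC */
  unsigned long long heap_returned;  /* sum of bytes returned by each GC */
} gc_stats_t;

static gc_stats_t gc_stats;

/* Call around each full GC with the free heap before and after it. */
static void gc_stats_record(gc_cause why, size_t free_before, size_t free_after) {
  gc_stats.count[why]++;
  gc_stats.total++;
  gc_stats.heap_before += free_before;
  gc_stats.heap_returned += free_after > free_before ? free_after - free_before : 0;
}

static unsigned long gc_stats_avg_before(void) {
  return gc_stats.total ? (unsigned long)(gc_stats.heap_before / gc_stats.total) : 0;
}

static unsigned long gc_stats_avg_returned(void) {
  return gc_stats.total ? (unsigned long)(gc_stats.heap_returned / gc_stats.total) : 0;
}
#else
#define gc_stats_record(why, before, after) ((void)0)
#endif
```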

TerryE commented 4 years ago

Note that this is only partly addressed in PR #3193, so I am leaving this open for now.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

TerryE commented 3 years ago

Still active. @jmattsson, you might want to track this one.

jmattsson commented 3 years ago

Thanks, will do then :)

TerryE commented 3 years ago

@jmattsson I've just dumped a work in progress to Notes on the Lua Garbage Collector in the NodeMCU implementation, explaining how the Lua GC actually works. I've still got to add a section on the Emergency LGC and how we should trigger it on a free-heap threshold rather than on allocation failure, because the latter doesn't leave any freeboard for device-driver and other service mallocs. As far as I can see, this is about the only tweak needed.
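The freeboard argument can be checked with a small host-side simulation. All sizes here are invented for illustration (HEAP_FREE_START, LUA_CHUNK and FREEBOARD are hypothetical). Lua keeps allocating short-lived 500-byte objects, and the function reports the lowest free heap seen between allocations, which is the worst case that a LwIP or mbedTLS burst bypassing the Lua allocator would have to live with.

```c
#include <stddef.h>

#define HEAP_FREE_START 30000u  /* free heap once Lua's live data is loaded */
#define LUA_CHUNK         500u  /* size of each short-lived Lua allocation */
#define FREEBOARD        8192u  /* hypothetical headroom for drivers/stacks */

typedef enum { TRIGGER_ON_THRESHOLD, TRIGGER_ON_FAILURE } egc_policy;

/* Simulates `steps` Lua allocations that each become garbage straight away
 * and returns the lowest free heap seen between them. */
static size_t min_free_heap(egc_policy policy, unsigned steps) {
  size_t free_heap = HEAP_FREE_START, garbage = 0, lowest = free_heap;
  for (unsigned i = 0; i < steps; i++) {
    int gc = (policy == TRIGGER_ON_THRESHOLD)
                 ? free_heap < LUA_CHUNK + FREEBOARD   /* would breach freeboard */
                 : free_heap < LUA_CHUNK;              /* allocation would fail */
    if (gc) { free_heap += garbage; garbage = 0; }    /* full GC reclaims garbage */
    free_heap -= LUA_CHUNK;                            /* the Lua allocation */
    garbage += LUA_CHUNK;                              /* ...which soon dies */
    if (free_heap < lowest) lowest = free_heap;
  }
  return lowest;
}
```

Over 200 allocations the threshold policy never lets free heap drop below 8500 bytes, while the failure-triggered policy runs it all the way down to 0 before collecting.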

I've done some forensics on an instrumented version with luac -e db.lua, where db.lua contains a one-liner debug.debug() to give me a NodeMCU command interface on my Chromebook. Any on-ESP work will have to wait until I get back to the UK on Thursday.

@nwf, you just might be interested in having a browse as well.

I recommend a setpause value of 100, which runs the stepper continuously in the background, and a setstepmul of 200, provided we do, say, a collectgarbage('step', 4) as the epilogue to every callback.
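To make these numbers concrete, here is the arithmetic behind them, simplified from setpause() and getdebt() in stock Lua 5.3.5's lgc.c. NodeMCU's copy may differ, and the overflow guards are omitted. With a pause of 100, the threshold for the next cycle equals the live-data estimate, so the collector is back in debt almost immediately; stock Lua 5.3's default pause of 200 waits until the heap has roughly doubled. collectgarbage('step', 4) adds 4 × 1024 bytes to the GC debt, and a stepmul of 200 cancels STEPMULADJ, so the work done is roughly that debt.

```c
/* Constants as defined in stock Lua 5.3.5 lgc.c. */
#define PAUSEADJ    100
#define STEPMULADJ  200

/* Simplified setpause(): total heap at which the next GC cycle starts,
 * given the estimated live data after the previous cycle. */
static long next_cycle_threshold(long estimate_bytes, int pause) {
  return (estimate_bytes / PAUSEADJ) * pause;
}

/* Simplified getdebt(): units of GC work (roughly bytes traversed or
 * swept) scheduled for a given debt. */
static long step_work(long debt_bytes, int stepmul) {
  if (debt_bytes <= 0) return 0;
  return (debt_bytes / STEPMULADJ + 1) * stepmul;
}
```

For example, with 20000 bytes of live data, a pause of 100 restarts at 20000 bytes and a pause of 200 at 40000, while step_work(4 * 1024, 200) schedules 4200 units, about one 4 KB step per callback.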

I've also got some more detailed notes on the internals of lgc.c but I am not sure whether anyone will read them.