nodemcu / nodemcu-firmware

Lua based interactive firmware for ESP8266, ESP8285 and ESP32
https://nodemcu.readthedocs.io
MIT License

Policy of supporting Lua in ROM #2068

Closed TerryE closed 6 years ago

TerryE commented 7 years ago

Although this issue links to earlier discussions in #1289 and #1661, I see this as a policy issue mainly for the committers, so can you all read this and give your comments so we can move forward on the basis of some form of consensus?

Of the ~45 KB of RAM available on the ESP8266, typically half or more holds compiled Lua code and constant data rather than true R/W data. The facility to move Lua binary code into Flash will more than double the effective RAM available to programmers.

Do we add support for running Lua directly out of Flash?

If so do we add it to the current dev branch soon?

Background

A hierarchy of function prototypes and their associated vectors (constants, instructions, meta data for debug) is loaded into RAM when any Lua source or lc file is loaded into memory. Because in the Lua architecture each Proto hierarchy can be bound to multiple closures (this closure creation is only done by executing the CLOSURE instruction at runtime), such hierarchies are intrinsically read-only and therefore in principle ROMable.

The main complication here is that, like all other Lua resources, Proto hierarchies are garbage-collectable (and advanced Lua programmers exploit this collection). So IMO, the difficulties arise when devising the details of how any compiled Lua in ROM interacts nicely and stably with the GC. It's fairly straightforward to implement a scheme which works most of the time, but we need one which works all of the time in a well-determined manner if we proceed with this.

I haven't worked out a robust way of doing an incremental storage system, as Phil discusses in #128, and IMO this will be hard to realise. What I have worked out how to do is essentially a "freeze into flash, then reboot" approach.

Basic approach

This process is simple and robust, but the Lua RTS is built around the assumption that collectable objects don't move their location and that strings are interned. It will be impossible to return control to the invoking Lua after a successful load, and difficult to return control after a failed one, which is why this "reload flash and immediately reboot" option is the most robust.

This system would enable Lua programmers to be able to compile and execute significantly larger Lua programs within the ESP resources.

There are some extra wrinkles for the Lua 5.3 environment but I will park these for now. So comments so far?

TerryE commented 6 years ago

I will give an example of the sort of wrinkle that I am debugging.

Lua stores strings in the strings table as a TString header immediately followed by the string literal. eLua introduces the concept of ROM string constants: if the string being interned is a ROM string and is longer than 3 bytes, then instead of storing the string itself, it appends a pointer to the string. This saves ALIGN(strlen) - 4 bytes of RAM for each string optimised this way, albeit at the cost of extra testing on every string access.
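As a rough illustration of the saving quoted above, the sketch below evaluates ALIGN(strlen) - 4 for a few sample strings. It assumes that ALIGN rounds up to a 4-byte word boundary; the function names `align` and `rom_string_saving` are invented for the example and are not part of eLua or NodeMCU.

```lua
-- Rough model of the RAM saved per interned ROM string, using the formula
-- quoted in the thread: ALIGN(strlen) - 4 bytes for strings longer than 3 bytes.
-- Assumption: ALIGN rounds up to a 4-byte word boundary.
local WORD = 4

local function align(n)
  return math.floor((n + WORD - 1) / WORD) * WORD
end

local function rom_string_saving(len)
  if len <= 3 then return 0 end   -- short strings are still stored inline
  return align(len) - WORD        -- inline bytes replaced by one pointer
end

for _, s in ipairs({ "abc", "node", "package.loaders" }) do
  print(s, #s, rom_string_saving(#s))
end
```

Note that under this formula a 4-byte string saves nothing, so the benefit only shows up for longer strings.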

There is a +/- debate as to whether this is worthwhile for LFS strings, since the string is already in Flash. OK it slightly improves flash usage, but at a cost of extra cache faults to access the string if needed: inline storage performs better.

However, there is also an unintended side effect. The LFS store can persist between different S/W builds so long as the size doesn't move the page boundaries, but all such eLua address indirections might well no longer point to the correct strings in the new build, so you must rebuild the flash after downloading a new build. Bugger.

Not an issue that an end user will face, but a bear trap during development. So do I:

At the moment, I am doing the last, but it did give rise to some head scratching until I realised what the issue was.

I have added/extended the following functions:

Another issue that I need to address is the set of Lua loaders used to require modules. NodeMCU keeps to the standard list (see loadlib.c) even though 3 aren't used / don't work:

  1. loader_preload. This looks up the module in package.preload and loads it. eLua/NodeMCU should use this to do its ROM modules resolution, but instead ignores this and has a patch in the require function itself to first check for the ROM table.
  2. loader_Lua. This is the standard Lua module loader which uses the package.path to search for the file corresponding to the module.
  3. loader_C. On std Lua, this uses the OS hooks to load a C shareable library dynamically using package.cpath. On NodeMCU, dynamic overlays don't work so this code just fails.
  4. loader_Croot. A variant of (3). Ditto on the failure.

The require function tries each loader in turn and if one is successful, it then returns the Closure (a.k.a. function in Lua). Incidentally, I use package.loaders[2] to load my Lua functions from SPIFFS as this handles the lc/lua precedence for me. The main advantage of this system is that both package.path and package.loaders can be updated by the application.

I think that the best approach is to replace (1) with a loader_flash and remove (3) and (4), so that require can transparently load modules either from flash or SPIFFS. Using separate loaders allows the application to configure the precedence. I can't change the ROM library lookup patch because this implements a non-standard behaviour of not adding the module to package.loaded, whereas using a loader would add the table and thus break backwards compatibility.
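To make the precedence point concrete, here is a simplified model of how require walks a list of loaders. It is a sketch only, not the code in loadlib.c: `model_require`, `flash_store` and `spiffs_store` are hypothetical stand-ins, and the two stores are plain tables rather than real flash or SPIFFS lookups.

```lua
-- Simplified model of require: try each loader in turn, use the first one
-- that returns a function, and cache the result in a "loaded" table.
local function model_require(name, loaders, loaded)
  if loaded[name] ~= nil then return loaded[name] end
  local errors = {}
  for _, loader in ipairs(loaders) do
    local result = loader(name)
    if type(result) == "function" then
      local value = result(name)
      if value == nil then value = true end
      loaded[name] = value
      return value
    elseif type(result) == "string" then
      errors[#errors + 1] = result      -- collect "not found" messages
    end
  end
  error("module '" .. name .. "' not found:" .. table.concat(errors))
end

-- Hypothetical stand-ins: each maps a module name to a function that
-- builds the module.
local flash_store  = { mod = function() return "mod from flash" end }
local spiffs_store = { mod = function() return "mod from SPIFFS" end }

local function make_loader(store, label)
  return function(name)
    return store[name] or ("\n\tno " .. label .. " module '" .. name .. "'")
  end
end

local loader_flash  = make_loader(flash_store, "flash")
local loader_spiffs = make_loader(spiffs_store, "SPIFFS")

-- The order of the list decides which copy of the module wins.
print(model_require("mod", { loader_flash, loader_spiffs }, {}))
print(model_require("mod", { loader_spiffs, loader_flash }, {}))
```

Because the loader list is an ordinary table, an application that wants the file system to win during development only has to put that loader first.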

Comments please.

I still have some odd gremlins with GC paths during the store rebuild that I am chasing down. Once I've done this, I will raise a PR.

jmattsson commented 6 years ago

Regarding the bear trap, I think it might be better to at least start off with having the strings inline. Unless I'm mistaken, you already have plenty of gremlins to deal with, so whacking one off the table for the time being would seem to be helpful. This is an optimisation we could look at including further down the track, when the dust has settled a bit.

As to the loaders, yes, what you suggest sounds sane and proper.

TerryE commented 6 years ago

This is a bit like praying in that I have a problem and putting it into words helps. The act of posting it here also helps, but I doubt that anyone will answer with a miracle.

The issue that I have is that I made some change and the GC is corrupting my RAM, and I am trying to diagnose the wheres and whys. So I am now in the remote debugger within the bowels of the Lua VM. The symptom that I've diagnosed so far is that the Lua Closure for init.lua is getting stomped on after the VM executes the first instruction in the code, which is a GETGLOBAL 0 -1 ; node, though this doesn't cause an exception until the next GETGLOBAL at instruction six.

Well, not so much the Closure itself: it is the contents of its env field that get overwritten. (The cl->env points to the Lua Table for its Lua environment, which is the main global table _G at this stage.) It is not the pointer, but the actual Table structure contents. Working this all out is a pain because of largely undocumented remote debugger constraints:

Anyway, it looks like one of the memory locations is getting stomped before my Flash code starts to run. It sounded like a double free or the like, so I moved a cut-down version of the debug realloc (from the Lua test suite) into the NodeMCU code in lauxlib.c. This bookends each Lua memory block with a 0x55555555 marker and a check length. It issues a break 0,0 to throw you into the debugger if anything is wrong. The watch on this location when I add it isn't firing. Need to think about this, but I am tired and I have to get up in 6 hrs to catch a plane.

Even so, the extra level of diagnostics seems worthwhile for locating this sort of nasty problem. Literally, upwards and onwards.

TerryE commented 6 years ago

Sorry guys, I've been a bit distracted since I've got back to the UK. I'm commissioning some elements of the Home Automation system in our new build, and need to bring the heating and Direct H/W online -- both non-standard since the house is a passive house, plus lots of other jobs. So this work has been on the back-burner a bit.

As to the bug, what is happening is that there seems to be some subtle interaction between the GC, the eLua emergency GC patches and the flash "don't try to mark ROM" ones, so the white bit marker gets out of sync with the GC sweep and the GC starts to free resources still in use. I need to add a bit of instrumentation to work out what is failing and why.

dtran123 commented 6 years ago

No sweat :) We appreciate your work. This piece is a game changer for me...more RAM opens up new possibilities for sure and help with secured connections. Just putting a little note to show that we care about your efforts here. High on my radar is this (PR #2068) and PR #1707. Those are two key PRs to be resolved for me to invest more time on the ESP8266.

georeb commented 6 years ago

So, so great to hear that this is progressing!! Thank you very much @TerryE for your continued support!

We all completely understand Terry that you are under no obligation to meet deadlines or schedules and that this is a 'done in your spare time' kinda thing, however, for those of us that are eagerly awaiting a dev version of this, do you have a rough estimate of when you think you'll be able to release something? Not looking for a commitment in any shape or form, just a realistic prediction of when we are likely to be able to start using NodeMCU again! :)

I understand you "don't need the dosh" but if a donation would sweeten the deal, then please let us all know. Unfortunately money is all I can offer, I wish it was technical support, but this is all a little beyond me!

Hope it's going well!

TerryE commented 6 years ago

No money. Just priorities, I'm sorry to say. I am up to my eyeballs in Lua and Node Red commissioning my home automation system for my new house. I'll take a break soon and spend a half a day getting to the bottom of this GC issue.

As soon as I have a stable build, I will push a commit to my github fork.

georeb commented 6 years ago

Hi @TerryE - Any update on this please?

I am desperate to get my hands on a version of NodeMCU that allows me to comfortably connect securely and also have enough heap for other stuff too!

Again, I know you have no obligation here, but do you have a rough idea of when you'll be able to find some time to complete this?

I am currently looking to invest in a developer to get a usable version up and running and wanted to see what stage you were at first, before I engage them...?

Many thanks :)

TerryE commented 6 years ago

Hi, @georeb. I've just moved into the house that I've been building for the last few years and am typing this in my office on the first floor (in England the ground floor = zeroth). I hope to work on this over the holiday break and get a version out for evaluation. As to your investing in a developer to do this, my advice is: don't bother. This is complex stuff because you've got standard Lua, the eLua hacks, and all of the ESP issues interplaying. The learning curve is huge.

georeb commented 6 years ago

Any progress over the Christmas break @TerryE ?

I understand that it'll be a learning curve employing a developer, but I don't have much choice. You are unfortunately the bottleneck and as I cannot interest you in payment, I have to pay someone that will. As always, I understand that you have no obligation; however I (and others) have been waiting 5 months for this now and I have to do something before it all gets superseded by something else :/

If others want to chip in to help out with developer costs, please get in touch.

TerryE commented 6 years ago

@georeb, we sold our old house and moved into the one we built on the 19th Dec and I've just been getting the HA system to the point where it will run the house's heating and environmental controls. After working 7 days a week on the new build, we've got to the point where my wife and I both have time available for our interests.

By all means employ a programmer to do this work, but don't underestimate the learning curve. You will probably waste your money as I will beat her or him to this deliverable.

georeb commented 6 years ago

Congrats on moving into your new build! :) How far off the deliverable would you say you are @TerryE ?

georeb commented 6 years ago

Any update please? @TerryE

TerryE commented 6 years ago

Yup. Long story short, I ran into a bit of a show stopper that has forced me to change my implementation strategy. The problem wasn't that the approach doesn't work but more of a scaling issue because of how the GC (and the EGC modifications) interact with the build process. The EGC includes extra GC pause / restart directives around some operations, and nested pause / restarts aren't honoured, so these would always restart the GC. The Lua GC will aggressively scan all collectables and mark any that aren't in Lua scope for GC.
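The nesting problem can be shown with a toy model. The sketch below is not the EGC code: `new_flag_gc` models a single pause flag, where any restart turns collection back on, and `new_counted_gc` is a hypothetical counter-based alternative shown only for contrast. It is not the route taken in this thread, which avoids changing the GC.

```lua
-- Model only: a boolean pause flag versus a nesting counter.
local function new_flag_gc()
  local gc = { running = true }
  function gc.pause() gc.running = false end
  function gc.restart() gc.running = true end
  return gc
end

local function new_counted_gc()
  local gc = { depth = 0, running = true }
  function gc.pause()
    gc.depth = gc.depth + 1
    gc.running = false
  end
  function gc.restart()
    gc.depth = math.max(gc.depth - 1, 0)
    gc.running = (gc.depth == 0)
  end
  return gc
end

-- An outer build step pauses the GC, then calls an inner operation that
-- does its own pause / restart pair. Returns whether the GC is running
-- again while the outer step is still in progress.
local function build_with(gc)
  gc.pause()                        -- outer: do not collect during the build
  gc.pause()                        -- inner operation pauses ...
  gc.restart()                      -- ... and restarts
  local running_mid_build = gc.running
  gc.restart()                      -- outer restart
  return running_mid_build
end

print("flag-based GC running mid-build:   ", build_with(new_flag_gc()))
print("counter-based GC running mid-build:", build_with(new_counted_gc()))
```

With the flag model the inner restart re-enables collection while the outer step still expects it to be paused, which is the situation in which half-assembled objects can be collected.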

What this means is that the build process doesn't scale robustly, and above a certain size of flash image the GC could come in and collect elements that I was assembling for the flash. Getting around this by referencing them was creating extra overheads which hit the scaling issue even more. Either that or start making fundamental changes to the GC, which I just am not willing to do.

So this issue was about robustly building a flash image on device, rather than executing it on the node once built.

My alternative approach is to move the flash image building into the luac.cross build, so that doing a cross luac on a collection of files with the correct switch builds a PIC flash image, which you can copy into the SPIFFS on the target and execute a single API call to reload the LFS with this image and restart the node.

The only complication is that the host environment must be a little-endian architecture such as Intel or ARM, but the code has to cope with both 32-bit and 64-bit host environments.

I am junking the current eLua-based cross-lua.lua approach, and the standard make now builds luac.cross as well (as I do with 5.3), so long as the host includes the standard build-essential toolchain.

The luac side is working. I am in the middle of stripping out the rebuild stuff from the ESP end and adding the small PIC loader. Another few days of dev work.

georeb commented 6 years ago

You're right, sounds extremely complicated @TerryE !! So, this is sort of good news then? It's hard to tell, not being technical!

Are we close?! :)

TerryE commented 6 years ago

OK, it looks as if I have ironed out most of the issues and can put together an evaluation PR. I just need to check that my build without all of the debug hooks works as anticipated. We will clearly need a tweak of the API stuff and I still have some bits to add. But the highlights so far are:

There's still a TODO list, for example:

I've just been playing with a test LFS which has 7 function files loaded, has 135 string constants in the ROM table, 22 are in the RAM string table and there is over 39Kb heap still available for the App, so this is all looking promising.

I've also fixed a bug in the remote debugger and become adept at using this. I've also added some gdb macros which will help library developers examine the Lua stack, and I need to write all of this up in the developer guide sometime.

pjsg commented 6 years ago

How does the following get built?

local M = {}
M.add1 = function(x) return x + 1 end
return M

I'm hoping that it will be possible to write a wrapper for require that searches the file system first and, if not found, uses the version in the node.flash. (The rationale for that order is that it allows easy development by only having to upload the one file that you want to change.)

TerryE commented 6 years ago

I'm hoping that it will be possible to write a wrapper for require

Phillip, there's no need. Read up on package.loaders. The require loader passes the module name to each in turn, and this handler then either

The package.loaders table is in RAM so the application can reorder the handlers or replace/add one. (search for lua_CFunction loaders in app/lua/loadlib.c). We only use the second, loader_Lua in NodeMCU, so you can replace any of the other 3 with your own Lua function:

local index = node.flash.index
local function loader_flash(module)
  local r = index(module)              -- index is this closure's one upvalue
  return type(r) == 'function' and r   -- false otherwise, so require moves on
end
if index then package.loaders[2] = loader_flash end

If you have some init module in flash then you can stick this fragment in it, then the only RAM overhead is the loader_flash LClosure with its one upval.

As far as how it gets built, you can either just stick the modules in fs/lua and do a make, or you can do your own process. I am going to update my own provisioning system to be LFS aware, so this will all be seamless for me.

TerryE commented 6 years ago

Another trick is that I include a dummy module preload which is just a single lua line:

-- preload a bunch of strings into the ROstrt and avoid the RAM overhead.
-- use debug.getstrings('RAM') to work out which you might want to add 
-- for your application
local preload = "?.lc;?.lua", "@init.lua" -- , ... extend as you need

or add more preload = .... if you have lots of strings that you want to preload into ROM. This creates a dummy module with just a load of LOADK instructions and a constant list of all of these strings, which luac.cross will then preload in the ROstrt, so you won't chew up your RAMstrt and have all of the associated GC overhead. You never need to call this; just including it in the compile is enough.

OK you are wasting n × (TValue + Instruction) in the LFS to do this, but with up to 256Kb available and it never being called, do you care?
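For a feel of the numbers, the cost n × (TValue + Instruction) can be evaluated directly. The sketch below is arithmetic only. The 4-byte Instruction matches the 32-bit instruction word of standard Lua, but the 12-byte TValue is an assumption made for this example and depends on the build; `preload_cost` is an invented name.

```lua
-- Cost in LFS bytes of preloading n strings: one constant slot (TValue)
-- plus one LOADK instruction per string.
local function preload_cost(n, tvalue_size, instruction_size)
  return n * (tvalue_size + instruction_size)
end

local TVALUE_SIZE      = 12          -- ASSUMPTION: build-dependent
local INSTRUCTION_SIZE = 4           -- 32-bit Lua VM instruction
local LFS_BYTES        = 256 * 1024  -- "up to 256Kb available"

local cost = preload_cost(200, TVALUE_SIZE, INSTRUCTION_SIZE)
print(cost, string.format("%.2f%% of the LFS region", 100 * cost / LFS_BYTES))
```

Under these assumed sizes, even a couple of hundred preloaded strings use only a small fraction of the flash region.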

I was thinking about reverse engineering the compiler to preload all of the common strings used during compilation to drop the compilation overhead.

pjsg commented 6 years ago

@TerryE Makes sense. Looking forward to seeing this in action!

TerryE commented 6 years ago

Incidentally, one of the best tricks to do with the debugger is to add a macro for lua_assert which does a debugger break and then enable this for your test code. The Lua API macros use lua_assert a lot to do validation so this will pick up a lot of consistency errors. You can also make heavy use of lua_assert in your own code. If not enabled then this all gets optimised away / removed by the GCC code generator at -O2. The real PITA with using the debugger is that you lose the ability to input strings through the UART input, so you need to use a telnet stub for interactive testing.

I am thinking of having a variant assert stub which puts out a warning message telling you to come out of your UART terminal session and start xtensa-lx106-elf-gdb, before itself starting the GDB remote stub and then issuing a break so that the host and target can rendezvous in a debug session. This way you get the best of both interactive and debug use.

georeb commented 6 years ago

This is GREAT news @TerryE !! :) Thankyou. Is the plan to release a DEV version that will eventually be merged with the MASTER branch?

TerryE commented 6 years ago

The Alpha version will stay in my fork until at least one other committer has checked it out. Then it will be pulled into dev. It will go into master on the following release cycle, but with the LUA_FLASH_STORE define in user_config.h commented out so that builds won't have LFS enabled by default. However, individual developers will be able to enable it for their builds. We might subsequently switch it on by default, but that will be up to a consensus of the committers, not just me.

georeb commented 6 years ago

Excellent!

Will the version in your fork be an adapted version of NodeMCU MASTER branch? Sorry for the, perhaps, obvious questions.

TerryE commented 6 years ago

The way that the release cycle works is that we commit to dev, then batches of commits to dev once stable are then committed to Master. The only path to updating master is to move dev patches into it. So I am not sure what you mean by your repeated Q. There should be a master version with LFS support in the next 2-3 months, but the delay is only because of the dev to master promotion cycle.

About half the community use dev builds to take advantage of the latest bug fixes etc. The delay ensures that we have a reasonable chance to give good usage coverage to any changes before moving them into master.

georeb commented 6 years ago

Will the version in your fork be an adapted version of NodeMCU MASTER branch?

What I meant was, will your version be a standard copy of the current MASTER with the addition of LFS? I hope this is a little clearer?

nwf commented 6 years ago

@georeb Unlikely; it's more likely to be a fork of dev, rather than master, since that's the target for merge.

georeb commented 6 years ago

Okay, understood. Thanks

TerryE commented 6 years ago

I have just updated my Lua Flash Store (LFS) whitepaper so it now reflects the current LFS implementation. Anyone interested in this, please reread carefully. The LFS patch is so large that I have also had to split it into 5 commits, each of which is larger than a typical PR here.

TerryE commented 6 years ago

For those who are wondering about my delays here, I find it quite time consuming to cover all of the base test cases and their variants: float vs integer build; host (luac) vs target (lua) firmware; without LFS; with LFS enabled but not used; with LFS used. In my testing, I have come across a subtle architectural issue which relates to my implementation of GC marking, and this really needed reworking before I release this.

We made quite a few compromises in getting the 0.9x versions of Lua out within the timescales that zeroday achieved. By now we have the luxury of a robust working 2.1 version. I don't want to compromise this by rushing out an LFS version too soon.

TerryE commented 6 years ago

See #2292 for further discussion.