TerryE opened this issue 5 years ago (status: Open)
@TerryE Terry, I think this solution makes a lot of sense; more sense than allowing host-based standard compilation. I am not sure I understand all the details, especially the last paragraph.
I think it might also be useful to have `loadflash(pattern)` accept as many patterns or filenames as needed, to also be able to flash subsets.

If I understand it correctly, the host-based `luac.cross` would then generate LC files containing a multi-function stream. It would then already contain a dummy proto for the strings. Or is it its own file format that could still be compressed?
It might also make sense to be able to flash incrementally, with newer versions hiding the old ones, so that during development the whole LFS would not need to be reflashed on every update. That should be possible by chaining the ROstrt and the ROTable function index. It wouldn't hurt if unused strings from earlier iterations were still available, and this would also circumvent problems with large string tables. I see this would have a negative impact on runtime, but for the development phase that seems fair.

When the LFS partition runs out of space, a complete reflash would have to be performed by the user (or rather by her code).
Gregor, thanks for the feedback. Much appreciated.
> I think it might also be useful to have `loadflash(pattern)` accept as many patterns or filenames as needed, to also be able to flash subsets.

We can always extend this if needed, but I want to keep it straightforward for the first implementation. We need to be able to load multiple LC files into a given LFS make, and being able to specify a pattern like `"config1/[^%.]*.lc"` should give us enough flexibility. This string has to be passed between reboots, so I want to keep it short and simple.
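As an aside, here is how such a pattern behaves as a plain Lua string pattern (my sketch for illustration only; the anchors are added by me and the file names are invented):

```lua
local pat = "^config1/[^%.]*.lc$"   -- anchors added for illustration
for _, n in ipairs{ "config1/funcA.lc", "config1/funcA.lua", "config2/x.lc" } do
  print(n, n:match(pat) ~= nil)     --> true, false, false
end
```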
> It might also make sense to be able to flash incrementally, with newer versions hiding the old ones, so that during development the whole LFS would not need to be reflashed on every update. That should be possible by chaining the ROstrt and the ROTable function index. It wouldn't hurt if unused strings from earlier iterations were still available, and this would also circumvent problems with large string tables. I see this would have a negative impact on runtime, but for the development phase that seems fair.

Doing a full reload is only going to take seconds anyway, so there is no point in introducing all of this complexity and taking the runtime hit of this sort of chaining. So no, sorry. I estimate that the largest practical ROstrt will be about 32Kb, and this should easily be enough for this target audience. Advanced developers will always be able to do a `nodemcu-partition.py -lf someHostFile.c` and not have to live within this constraint. Also note that there is nothing to stop you doing `file.dump('config1/funcA.lc', LFS.funcA)`. We could also have a Lua module which does a basic make process, so this plus the FTP server and telnet server would be a good basis for most developers.
> If I understand it correctly, the host-based `luac.cross` would then generate LC files containing a multi-function stream. It would then already contain a dummy proto for the strings. Or is it its own file format that could still be compressed?

No, `luac.cross` and `file.dump()` will generate the same code format. The only reason to do host-based compilation is if you use a host-based build process. To be honest, that's what I do; I find using tools like ESPlorer really clunky. I do all my editing on my laptop, then use `nodemcu-partition.py` to reload the LFS. This is all scripted in a little batch file. If you need faster than the few seconds this takes, then there is nothing to stop you having a make file which only FTPs down the changed LC files.

There isn't a separate string table per se. On the first pass the loader starts with the old LFS disabled, an empty RAM strt, and string GC disabled. It then scans all of the LC files in the pattern, and by the end the RAM strt contains all of the strings used in the loaded files. This is then copied to the LFS, so that on the second pass all of the required strings are already in the ROstrt and all of RAM is available for compilation.

The size limit is because the strt must fit in RAM and still leave enough heap for the dummy compile process.
> I am not sure I understand all the details, especially the last paragraph.

We could use the convention of making the extension `.lx` for dummy source files, so including `dummy_strings.lx` in the load pattern would result in all of its strings being included in the ROstrt without a module `LFS.dummy_strings()` being created in LFS. But note that there is a lot less need for this file anyway, as most of these strings are simply the ones created by the Lua VM and by opening all of the active modules; since that happens on Pass 1 regardless, all of those strings get added to the ROstrt anyway. A sketch of what such a file might contain follows.
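Purely by way of illustration (the file name convention is from the paragraph above; the string values are invented):

```lua
-- dummy_strings.lx: a chunk whose only job is to mention strings so
-- that Pass 1 interns them into the ROstrt.
local _ = {
  "wifi", "sta_config", "Content-Type: text/html", "192.168.4.1",
}
```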
Gregor, a codicil to your point about compressing image and LC files: the new LC format does do some compression of the LC file format. Take the stats for the `api.lua` test suite file, which comprises 1,172 source lines:
Size | Description |
---|---|
32,339 | Original source |
16,467 | Lua 5.1 compressed LFS image containing this file |
54,252 | Uncompressed version of the same |
67,501 | Standard Lua 5.3 LC file |
35,544 | NodeMCU Lua 5.3 LC file |
So broadly: the old (compressed) LFS image file was roughly 4× smaller than the standard Lua LC format, and the new LC format is midway at roughly 2× smaller. However:
@TerryE Terry, you are right that incremental LFS flashing would most likely be overkill. As to the compression of the files: I never measured how long a download takes, but I think at the expected sizes it really does not matter that much.

What I am not sure of is the process to get a host-built LFS onto the NodeMCU. Is it like now: build with NodeMCU's `luac.cross`, download the .img file to SPIFFS, and then `node.flashreload()` it in one pass? Or is that the case where I have a multi-function stream .lc file?

Thanks for your continuous effort on this project.
Gregor,

If you look at the uzlib source code, what really makes this powerful is the combination of dictionary-based substitution and Huffman coding of the sequences, as described in RFC 1951, but the killer is the need for the sliding dictionary window to be in RAM. The RFC requires 32Kb, but I've already dropped this to 16Kb so that the reflash algo can still run in the available ESP8266 heap.

Just out of interest I temporarily set the dictionary length to 16K, 8K, 4K and 2K and ran this against the two variants of the dump/undump algo, and got these results:
Compression window | None | 16384 | 8192 | 4096 | 2048 |
---|---|---|---|---|---|
Standard encoding | 39077 | 18316 | 18314 | 18508 | 19012 |
Modified encoding | 35544 | 17479 | 17479 | 17620 | 18009 |
Std enc (%) | 100.0% | 46.9% | 46.9% | 47.4% | 48.7% |
Mod enc (%) | 91.0% | 44.7% | 44.7% | 45.1% | 46.1% |
There is a one-line change (here) which I wrapped in a 3-line bash script to produce these different file sizes. The standard vs modified encoding refers to how fields such as integers are dumped in `ldump.c`:

- The Lua standard algo uses a type byte followed by 4 bytes of the integer.
- My modified algo uses a variable-length scheme where integers up to ±15 are encoded in 1 byte, up to ±2047 in two, etc. Ditto for sizes, but ±127 are encoded in 1 byte, etc.

As you can see, this modified algo is worth doing if the stream is uncompressed, as it reduces the typical LC file by around 10% in size. But your Q about compression got me wondering, and hence this experiment. Referring back to my principle "only change the Lua core when there is a compelling reason", what this test shows is that:

- There is remarkably little extra benefit from increasing the window over 2Kb. LZW compression even with this window reduces the LC file sizes by roughly 2× and so is definitely worth doing.
- My modified algo only yields an extra few % on top of this, so it is not worth changing the source to achieve this. Time to revert these ☹️ Incidentally, the reason for this compound saving being ~2% and not 10% is that compressing an already compressed data stream often gives counterintuitive results.

So your Q has ended up halving LC file sizes :+1: Thanks.
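To make the variable-length scheme concrete, here is one byte layout in Lua 5.3 that yields those ranges. This is an invented sketch, not the actual `ldump.c` encoding: the first byte holds a continuation flag, a sign flag and 4 magnitude bits; each further byte holds a continuation flag plus 7 more magnitude bits.

```lua
local function encode_int(i)
  local m = i < 0 and -i or i                            -- magnitude
  local bytes = { (m & 0x0F) | (i < 0 and 0x10 or 0) }   -- 4 bits + sign
  m = m >> 4
  while m > 0 do
    bytes[#bytes] = bytes[#bytes] | 0x80                 -- continuation flag
    bytes[#bytes + 1] = m & 0x7F                         -- 7 more bits per byte
    m = m >> 7
  end
  return string.char(table.unpack(bytes))
end

print(#encode_int(-2), #encode_int(2047), #encode_int(100000))  --> 1  2  3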
> What I am not sure of is the process to get a host-built LFS onto the NodeMCU. Is it like now: build with NodeMCU's `luac.cross`, download the .img file to SPIFFS, and then `node.flashreload()` it in one pass? Or is that the case where I have a multi-function stream .lc file?

Either or both. The developer is free to choose, as the technology will enable both, and they have their respective advantages and disadvantages. Those developers compiling and building on the ESP will probably just run an FTP server, keep their Lua source files on SPIFFS, and do their build process on the local FS. As I said previously, it would be easy to develop a standard Lua module which wraps this as a simple make process, e.g. by keeping a list of Lua files + MD5s as a JSON string in a make config file. The script would run over the file list and recompile any changed files; if any have changed then the LFS would be reflashed. An exercise for a volunteer; a sketch of the idea follows.
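A minimal sketch, assuming a JSON config such as `{"files":{"app.lua":"<md5>","util.lua":"<md5>"}}` on SPIFFS. `lfs_make()` and `node.loadflash()` are invented names (the latter per the `loadflash(pattern)` proposal); `file`, `sjson`, `crypto.fhash()`, `encoder.toHex()` and `node.compile()` are existing NodeMCU APIs:

```lua
local function lfs_make(cfgname)
  local fd, raw = assert(file.open(cfgname, "r")), ""
  repeat                                  -- read the whole config file
    local chunk = fd:read(512)
    raw = raw .. (chunk or "")
  until not chunk
  fd:close()
  local cfg, dirty = sjson.decode(raw), false
  for src, oldmd5 in pairs(cfg.files) do
    local md5 = encoder.toHex(crypto.fhash("md5", src))
    if md5 ~= oldmd5 then
      node.compile(src)                   -- writes the matching .lc file
      cfg.files[src], dirty = md5, true
    end
  end
  if dirty then
    fd = assert(file.open(cfgname, "w"))
    fd:write(sjson.encode(cfg))
    fd:close()
    node.loadflash("[^%.]*%.lc")          -- hypothetical: reflash LFS from LC files
  end
end
```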
Ditto if the developer is doing host-based development: some developers might just prefer the simplicity of rolling everything up into a single LC file; others might prefer to use a per-file make. I myself would probably KISS and go the single-LC-file route, but let us see.
As well as the `file.dump()` variant of `string.dump()`, it is probably worth adding this 2K-window deflate / inflate to `file` as well: `file.getpacked()` and `file.putpacked()`, and perhaps `string.getpacked()`. A 2Kb-window packed stream is readable as a gzip-compressed stream, and this would be a good addition for HTML applications. But this is a job for another day, I think.
Terry,

Impressive that the buffer size has so little influence on the compression ratio. Rates might differ for HTML and CSS, but I guess the direction stays the same. So a +1 for compressing on-chip. It would be useful to have a streaming compressor, so that piping dynamic content would be possible. All static content should probably be compressed on the host already.
> The standard vs modified encoding refers to how fields such as integers are dumped in `ldump.c`:
>
> - The Lua standard algo uses a type byte followed by 4 bytes of the integer.
> - My modified algo uses a variable-length scheme where integers up to ±15 are encoded in 1 byte, up to ±2047 in two, etc. Ditto for sizes, but ±127 are encoded in 1 byte, etc.
>
> As you can see, this modified algo is worth doing if the stream is uncompressed, as it reduces the typical LC file by around 10% in size. But your Q about compression got me wondering, and hence this experiment. Referring back to my principle "only change the Lua core when there is a compelling reason", what this test shows is that:
>
> - There is remarkably little extra benefit from increasing the window over 2Kb. LZW compression even with this window reduces the LC file sizes by roughly 2× and so is definitely worth doing.
> - My modified algo only yields an extra few % on top of this, so it is not worth changing the source to achieve this. Time to revert these ☹️ Incidentally, the reason for this compound saving being ~2% and not 10% is that compressing an already compressed data stream often gives counterintuitive results.
True, but it's not all about LC file size and compressing them. The 10% for smart integer encoding would still help reduce the space needed in the LFS partition, or am I missing something? So I wouldn't drop it lightly.
> Ditto if the developer is doing host-based development: some developers might just prefer the simplicity of rolling everything up into a single LC file; others might prefer to use a per-file make. I myself would probably KISS and go the single-LC-file route, but let us see.

Atm my KISS would be: I have a running system, why change it? :-) Nah, actually it is really bad, so I will have to change it.
> The 10% for smart integer encoding would still help reduce the space needed in the LFS partition, or am I missing something?

Yup: missing something. The data structures for compiled Lua code in memory have a well-defined format and structure that can be walked. These sizes are determined by the worst-case capacity of each type and the CPU architecture, so there is little we can do to change this without hitting runtime speeds. I have done so where there is a compelling case: for example, line number info is only used for error reporting, so the LCD patch took the hit of an O(N) lookup during error reporting to drop this from the previous 4×N bytes, and this cut overall code size by around 30-40%. Likewise moving code into LFS frees up RAM, and ditto dropping the TValue size from 16 to 8 bytes.
The compiled LC on-file format is really just a serialised lossless form of this in-memory hierarchy; `ldump.c` and `lundump.c` in many ways just parallel `sjson.encode()` and `sjson.decode()`. Mapping the Lua integer `-2` to `0x16` rather than `0x13 0xfe 0xff 0xff 0xff` saves space in the encoded form, but makes no difference to the LFS sizing.
I need to do a bit more thinking about compression options. As well as input and output buffers, the LZW algo requires a dictionary window (N bytes), a chain index (2N bytes) and a hash vector (~0.5N bytes), so if we use 256-byte input and output buffers with a 2Kb dictionary, then we need about 9Kb of free heap to do this on-the-fly compression. Easy on the ESP32; doable on the ESP8266, but only in the right circumstances.
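As a back-of-envelope check of that budget (my arithmetic; the gap up to the ~9Kb quoted above would be allocator and working overhead, which is my assumption):

```lua
local N = 2048                      -- dictionary window
local need = N + 2 * N + N // 2     -- window + chain index + hash vector
need = need + 256 + 256             -- input and output buffers
print(need)                         --> 7680 bytes before any overhead
```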
So for now I will continue to focus on getting LFS working without compression.
> So for now I will continue to focus on getting LFS working without compression.

That sounds reasonable, as having LFS is a much greater improvement than compression, which might well come later.

Just one more question: the on-chip flash process will then always be two-pass, limiting the amount of strings to the said 32 or so KB? If I need more, I have to flash directly via esptool. Or will there still be the old mechanism of our luac.cross and one img file which can be `node.flashreload()`ed?
> Just one more question: the on-chip flash process will then always be two-pass, limiting the amount of strings to the said 32 or so KB?

We could always find more elaborate algorithms to try to work around RAM limits, but the effort involved will rise with each step in complexity. My instinct is to defer any such steps away from KISS until there is a compelling reason. We need to confirm this with hard stats, but IMO most compiled apps simply don't use enough string space for this to become a realistic application constraint.

Yes, we could consider regressing the Lua51 `flash.c` and `flashimg.c` into Lua53, but I am loath to get distracted by this complexity for a hypothetical what-if.
Standard Lua has three encoding formats for Lua code: source file (lua), compiled file (lc), and in-memory (RAM). Given that we have no decompiling options, here are the valid combinations:
Conversion | Mechanism |
---|---|
lua → RAM | compile = parser + code generator. |
RAM → lc | dump |
lc → RAM | undump |
lua → lc | compile + dump |
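For concreteness, these conversions map onto the standard Lua 5.3 API like this (illustrative; nothing NodeMCU-specific here):

```lua
local f  = assert(load("return 1 + 1", "=chunk"))  -- lua -> RAM (compile)
local lc = string.dump(f)                          -- RAM -> lc  (dump)
local g  = assert(load(lc))                        -- lc  -> RAM (undump)
print(f(), g())                                    --> 2  2
```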
LFS is an additional memory variant, hence we need to add various undump and copy conversions to support it. In Lua51, I kept these extras to a minimum by only supporting an extended `luac.cross` for creating an LFS image file, and having an image loader in the firmware.

I feel that there is a consensus that there is significant demand amongst many Win10-based IoT developers to do ESP-based compilation and LFS building. This is entirely doable for Lua53, but my concern is how to avoid the explosion of conversion mechanisms that need to be coded and maintained: how to provide these use-case variants whilst keeping to the orthogonal principles of the Lua implementation which keep the core VM compact and fast.
> I feel that there is a consensus that there is significant demand amongst many Win10-based IoT developers to do ESP-based compilation and LFS building.

My 2 cents for what it's worth: as a long-time user of NodeMCU Lua on the Win## platform (and someone who has produced several marketable products using it), after trying several build paths I have to say that using the Docker+LFS+esptool build path is the way to go for Win## development (if you insist on doing it under Windows). Like most newbies to NodeMCU Lua, I started my journey using @marcelstoer's Cloud Builder service and found it to be extremely user-friendly. It provided the appropriate amount of build-mechanics masking that allowed me to focus on application-layer development and not on how I get my application built and loaded into the MCU. When I outgrew the Cloud Builder service, it was @marcelstoer's NodeMCU Docker (and eventually @TerryE's LFS) all the way. I never looked back.

I have to honestly say that the most confusing, frustrating, and time-consuming part of my NodeMCU Lua journey has been trying to navigate the multitude of conversion mechanisms and use-case variants. Simply put, I feel it puts a "too confusing / too complicated" label on NodeMCU Lua that drives newbies away from using it.

IMO, you should KISS it and stop trying to please everyone (a mythological state that is not achievable).
It is not worth...

> trying to keep to the orthogonal principles of the Lua implementation which keep the core VM compact and fast

...at the cost of adding even more complexity by way of additional conversion mechanisms and use-case variants. I, for one, would welcome a reduction of them. Maybe sanitize it all down in a future master drop to:

- Cloud Builder on Host
- Docker on Host
- Linux on Host

Where in ALL cases it's "on Host": drop all the "load from SPIFFS" stuff, and LFS goes from being an optional build path to the singular, native build path. Of course, that would also simplify the base NodeMCU Lua "User Manual" by moving all the reams of "how LFS works / is different than..." documentation out to addendums and whitepapers... and THAT would be extremely attractive to newbies. It would go a long way to increasing the user/contributor population.
@jmd13391 Joe, thank you for your input.

I feel that you have already made the transition to advanced developer. What we need is a set of fairly pain-free and easy-to-follow steps for those who are new to IoT to get to your level of competence. I can do the development, but I am not the right person to lead the communication to those new to IoT today, largely because I've got decades of experience and it's also been over a decade since I last used a WinX machine in earnest, so I don't have a shared starting point.

This is a community effort, and we need enough engagement from the committers and other advanced community members to gain a good understanding of what the right consensus should be. For example, you suggest culling the detail on "How LFS works" into an addendum, which is my inclination as well, but these reams got lifted out of the original whitepaper and embellished at repeated user request. Someone from the community should lead this, IMO, but unfortunately we don't have any volunteers willing to put in the effort, and without that leader what we get is a bit of a drunkard's walk.

IMO, if you guys advise that Docker on Windows is the way to go, then what we could really do with is maybe 3-4 ten-minute YouTube tutorials to take newbies through the startup steps; something like Setting Up Docker on Windows, but up to date.
At first I thought it was a bad idea to switch away from the current mechanism of building an LFS image. Now I see the only problem being the limit of around 32K of string space in the new method.

Let's look at the pros and cons a bit (please correct/add further points if I missed something).
Feature | LFS | *.lc |
---|---|---|
Build on host | luac.cross | luac |
Build program binary available for download | No | Yes |
Binary for native Windows available | Yes | Yes |
Build on ESP | No | Yes (smaller files only) |
Online build service | Yes | No (not yet, not really needed) |
Ability to keep base modules separate from application | No (always have to build complete LFS.img) | Yes (can stay unchanged as *.lc on device) |
ROstrt limit | LFS size | ~32KByte |
Files needing to be downloaded | 1 | 1 or several |
Learn new process | No | Yes |
So there are some points against the new *.lc method, but all in all I would prefer it over LFS.

I don't think that it makes sense to support both, but I am not sure about the dependencies on Lua 5.1 and 5.3. If it is easier, I could imagine leaving the old method for Lua 5.1 and implementing the new method for Lua 5.3. But only if that is actually easier to do.

As an early adopter of LFS I was not confronted with the many options for doing the various steps, but I also think that it is way too complicated to get started. It should be documented somewhere, but I feel that a step-by-step tutorial for a "Hello World" would help many to get started.
> Now I see the only problem being the limit of around 32K of string space in the new method.

To be honest, I agree with Joe: from my PoV as an advanced developer, editing code on SPIFFS and building on the ESP is a pain in the a**e. It is slow, and you always risk bumping against RAM constraints during the build. When I am developing I just keep all of my source files in host directory trees under git, and I have a small shell script which uses `luac.cross` and `nodemcu-partition.py` to reimage LFS. This compile-and-download cycle takes seconds.

I personally view on-ESP building on a par with using WinXX as a dev environment: some people might want to do this; for the life of me, I can't understand why on earth anyone would want to, though I accept their right to think this way.
> Build program binary available for download

What do you mean by this?

> Ability to keep base modules separate from application (LFS=no)

I don't understand why you reach this conclusion. Developers can organise their host file hierarchy any way they want. I certainly use symlinks and separate folders to maintain the sort of separation that you imply. OK, everything gets combined into a single LFS image, but this only takes seconds to build and to download, so this is hardly a material constraint.

BTW, I have come to the conclusion that a worrying % of new developers seem incapable of reading through multi-page documentation these days, though they might be willing to watch 20-30 mins of YouTube tutorial if split into chunks of no more than 10 mins.
I've just got off Skype and a 2-hr catch-up call with Johny reviewing the Lua 5.3 progress. The main bullets w.r.t. this issue were:
> > Build program binary available for download
>
> What do you mean by this?

Currently you cannot download a copy of luac.cross for Windows (maybe I missed other platforms though).

> > Ability to keep base modules separate from application (LFS=no)
>
> I don't understand why you reach this conclusion. Developers can organise their host file hierarchy any way they want. I certainly use symlinks and separate folders to maintain the sort of separation that you imply. OK, everything gets combined into a single LFS image, but this only takes seconds to build and to download, so this is hardly a material constraint.

Yes, it only takes seconds, but if I can leave a base set of functionality on the chip which is proven to work, maybe including configuration, it would be easier than having to create an individual LFS image for each device. But it works almost as well with the existing process, and there are easy workarounds (like a separate Lua or JSON file for configuration). So whichever way you want to go is great, as long as it supports Lua execution from flash.
> BTW, I have come to the conclusion that a worrying % of new developers seem incapable of reading through multi-page documentation these days, though they might be willing to watch 20-30 mins of YouTube tutorial if split into chunks of no more than 10 mins.

Grin, and agreed.
Just a quick update on what is happening here. Having bounced around options with Gregor and Johny, the scope of Lua 5.3 version 0 of LFS is now getting stable. Most of the extra functionality has been added to the `ldump.c` and `lundump.c` modules. LFS image and LC file share a common structure, albeit with some variations. In the LFS image:

- Multiple `Proto` hierarchies are combined in a single file, rather than multiple files with one `Proto` hierarchy each.
- All `TString`s used in the `Proto` hierarchies are collected into a single array. Loading this is serial to LFS and so isn't RAM constrained.
- `TString` constants in the `Proto`s are an index into this array rather than a string value.

The token hierarchy in this file is well determined, with no forward references, hence it can be loaded serially into LFS in a single pass. The format also works equally well within `luac.cross`, and so I am adding a `-u` option to preload an LFS image that can be accessed in the `-e` environment. The main advantage of this is that I can develop and test LFS functionality on a dev host under `gdb`, which makes development a lot easier.
The first commit will not have compression, but this is pretty straightforward to add.
I will subsequently add an ESP-based string table dumper, somewhat parallel to `node.compile()`, which can produce a string table LS file for a given wildcard pattern of LC files. Note that since this isn't overwriting the LFS during this process, it can use any existing strings in LFS to avoid RAM storage, and can thus build larger string tables than previously discussed. But this is a third or later commit.
I am a lot more comfortable with how it is shaping up.
Terry, that sounds great, especially the part about building the LS file incrementally, which allows for (almost) arbitrarily large string tables if done iteratively.

So luac.cross will generate a version 3 LFS file then? But that's well worth the benefit.
> especially the part about building the LS file incrementally

Not quite true. What this putative `node.compilestrings(LuaPattern, "index.ls")` does is create an LS file containing all of the strings needed to load all of the files in `LuaPattern` into LFS. It scans all of the files in `file.list(LuaPattern)` and builds up a table of `{["string"] = index}` and then its inverse `{[index] = "string"}`, and both of these tables need to fit into the heap.
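A sketch of the two tables just described (`node.compilestrings()` itself is putative; this only shows the interning structure):

```lua
local lookup, list = {}, {}          -- {["string"]=index} and its inverse
local function intern(s)
  local i = lookup[s]
  if not i then
    i = #list + 1                    -- next free slot in the inverse array
    lookup[s], list[i] = i, s
  end
  return i
end

intern "print"; intern "node"; intern "print"
print(#list)                         --> 2 unique strings so far
```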
However, if say 90% of the strings are already in LFS, then the node and array parts, which are 20 and 8 bytes per entry, will be the biggest heap chunk, which means that we could assemble around 1K unique strings within RAM constraints. This is still enough for a usable LFS, but a lot less than a luac.cross-compiled image.

Stripping the debug variable-name info will help a lot with scaling, in that without it all variable names are included in the list of strings, but at the cost of obfuscating errors a little.
Getting LFS working in `luac.cross` was fun. Sorry to take so long, but at least this gives me the opportunity to hammer the code on the PC. I've also moved the `_init` `LFS` wrapper into a ROTable, so that `LFS.func1()`, etc., works out of the box. I haven't hardwired the require loader into the loader path, so you'll still need to `package.loaders[3] = LFS._loader` in your init code for requires to resolve from LFS.

Incidentally, the new LFS image format is intrinsically a lot more compact than the old format. The old image file for `lua_examples/lfs/*.lua` was 12,484 bytes, and adding gzip compression brings this down to 5,744 bytes. The equivalent 5.3 image is 5,152 bytes, and (2Kb-dictionary) compression only drops this by another 20%, so (i) the new format uncompressed is comparable to or slightly better than the old one compressed, and (ii) extra compression only gives marginal improvements.
> you'll still need to `package.loaders[3] = LFS._loader`

On reflection it's just easier to do

```lua
package.loaders[3] = function(m) return LFS[m] or "Module not in LFS" end
```
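Illustrative note: a `package.loaders` entry must return either a load function or an error string, which is what the one-liner above does, so with it installed require resolves modules from the LFS image:

```lua
local telnet = require "telnet"   -- executes LFS.telnet and caches the result
```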
Time for another update. You can now load LFS images into `luac.cross` and access this code from a `-e` script: for example, `LFS[LFS._list[1]]()` will execute the first module in the image. Hence

```bash
for f in $(ls *.lua|grep -v all); do echo -ne "\n$f "; luac.cross -f -o /tmp/${f/.lua/.img} $f; done
for f in /tmp/*.img; do luac.cross -F $f -e /tmp/test.lua; done
```

converts the test suite into LFS images and then executes each in turn from LFS.
Except that I've got about 30% of them segfaulting when running from LFS, or throwing lua_asserts, so a few more bugs to track down. But the fact that 70% survive a hammering from some heavy test scripts does mean that I am getting close.

PS. First one found: the integer dump algo checks for `abs(i) <= max_int`, which barfs on `0x80000000`. Had that one before!

PPS. Now tracked down the various test-fail foibles -- all basically known porting incompatibilities. Just got this one common-mode failure on loading some large images.

PPPS. All segfaults down to a single root cause: I used the wrong TString length macro, `getshrlen()` instead of `tsslen(s)`, on one line, which meant that I was undersizing a buffer.
Gosh, this one is like wading through treacle. I've got Lua 5.3 booting and it seems to be running stably, but getting the test scripts working is hard going. At the moment I don't have the `-a` option working, and `nodemcu-partition.py` doesn't yet support the new LFS image format. So I bootstrap the SPIFFS by using nodemcu-tool to download an image containing the `ftpserver`, spin it up with `LFS.ftpserver().open(...)`, then drag and drop any files I need onto the ESP. So here is an example of one of the test suite executions (I've trimmed out the garbage):
```text
--- Miniterm on /dev/ttyUSB0 115200,8,N,1 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
>
> =node.heap()
44576
> node.flashreload'nextvar.img'
>
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x40100000, len 31860, room 16
...
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x40100000, len 31860, room 16
...
LFS image loaded
NodeMCU 3.0.0.0 ...
build 2019-10-24 15:34 powered by Lua 5.3.5 on SDK 3.0.1-dev(fce080e)
cannot open init.lua:
> LFS.nextvar()
testing tables, next, and for
>>> testC not active: skipping tests for table sizes <<<
+
+
E:M 20496
E:M 20496
not enough memory
> =node.heap()
43392
```
You can see that the tests are reallocing 20K chunks, and the ESP allocator is simple: if you use realloc to grow a resource, this is implemented as "alloc new; copy; free old", and the chances of finding free 20Kb chunks in 44Kb of RAM are slim, so I need to scale back the sizing.

These test suites are also pretty big; some need a 256Kb LFS to load.

But at the moment Lua 5.3 is faster than the old Lua; we have 44Kb heap at boot, and the memory goes further thanks to the 8-byte TValue.
PS. Just got that test working. The issue was the loop

```lua
for i = 0, 10000 do
  if math.fmod(i, 10) ~= 0 then
    a['x'..i] = i
  end
end
```

which attempts to create a 10K-long array and 10K string keys, which would need about 300Kb of RAM!! Change the 10000 to 100 and the entire 639-line test works.
One issue with doing a lot of testing is that I am reloading the LFS image a lot, so I have just added a few more tweaks to optimise this. I'll push them on my next commit:

```c
/* Skip erasing flash sectors that are already blank (all 0xFF) */
if (*f != ~0 || memcmp(f, f + 1, FLASH_PAGE_SIZE - sizeof(*f))) {
  lu_int32 s = platform_flash_get_sector_of_address(F->addrPhys + i);
  platform_flash_erase_sector(s);
}
```

I've also changed `luac.c` so that instead of restarting the CPU after loading the new LFS, I just restart the Lua environment. So `node.flashreload()` now takes about 1 sec.

Forgive a naive question, but I think this is the right place to ask and I didn't see anything of the sort mentioned above. Could we have an on-chip `flashmerge()` that took an LFS image from SPIFFS and merged it into the existing LFS, clobbering entries already there? (Similarly, a `merge` option to `flashload`, perhaps.)
This would let us ship `.bin` files with initial LFSes installed, and then have users add their own contents to LFS rather than having to work out what we put in LFS and preserve it when making their own LFS images with `luac.cross`. (The most straightforward alternative, I suppose, is to ship the modules we put into the `.bin` LFS also into SPIFFS as `lfs-foo.lc` or such, so that `flashload` can grab them.)
> Forgive a naive question ... LFS ... merge ...

Not the right issue, but merging LFS images is quite an architectural change. It would be quite difficult to achieve with the Lua 5.1 LFS image format. One of the reasons for my changing the internal LFS format for Lua 5.3 was to facilitate this type of operation. Even so, it's quite a lot of work, so let's keep this to its own issue.

PS: getting my issues mixed. Let's keep this here for now. Sorry.

I've now got the structure and most of the implementation hurdles addressed by design, so I propose to implement this one soon.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Now that I am quite a bit healthier, I want to revisit this whole area.

Glad to hear from you! Please do let me know if and how I can be of assistance.
Missing feature
Currently developers must compile their Lua on a host PC in order to use our LFS functionality. This in turn requires them to be able to build the `luac.cross` executable, and this has been a major barrier to using LFS for many Windows-based IoT developers. Such developers would find it a lot easier to use the NodeMCU environment if LFS could be loaded directly from SPIFFS without needing host compilation. To do this we need a means of loading compiled Lua directly into LFS, and we need to mitigate some of the scaling issues with current compilation owing to ESP RAM constraints.
Context
One of my two broad principles for the Lua53 port was that I should only change the Lua core when there is a compelling reason. The opportunity arising from the conclusion of https://github.com/nodemcu/nodemcu-firmware/issues/2895#issuecomment-53223362 provides such a reason: we can unify the standard RAM load / save functions with the equivalent LFS functionality into two common-code dump and load functions that can be used to build LFS on-ESP.
In essence a Lua source file can contain multiple Lua functions, any of which can also contain sub-functions, and this can all be loaded into a single unified `Proto` hierarchy. The current dump functionality walks this hierarchy top-down, but this in turn creates many forward references during the reload of serialised compiled code. Resolving such pending references is simple in directly addressable RAM structures, but it is a major complication when writing such structures to flash memory, where it is best to write content sequentially.

However, it is a pretty straightforward change to traverse the `Proto` hierarchy bottom-up in order to dump it to a string or file. All of the hierarchical Proto relationships then become backward references, which means that we can load the Protos into flash memory as a serial chunk-copy process (a toy illustration follows below).

Memory compactness was not a primary goal for the Lua designers, but it is an important goal for IoT environments.
The LCD patch was an important NodeMCU enhancement: this allowed developers to retain line numbering information for debug purposes with a small memory overhead.
Of course LFS enables developers to move their compiled code into flash memory, freeing up most of the heap for true read/write variable data.
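To make the bottom-up dump described above concrete, here is a toy post-order traversal; the names are invented for illustration and are not NodeMCU internals. Children are emitted before their parent, so every reference in the output stream points backwards:

```lua
local out = {}
local function dump(proto)
  local refs = {}
  for _, child in ipairs(proto.children or {}) do
    refs[#refs + 1] = dump(child)     -- child is already in the stream
  end
  out[#out + 1] = { name = proto.name, refs = refs }
  return #out                         -- this Proto's stream position
end

dump{ name = "main", children = {
  { name = "funcA" },
  { name = "funcB", children = { { name = "inner" } } },
}}
for i, p in ipairs(out) do print(i, p.name, table.concat(p.refs, ",")) end
--> 1 funcA | 2 inner | 3 funcB refs "2" | 4 main refs "1,3"
```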
The current (standard) Lua save process, as implemented by `string.dump()`, is particularly memory intensive because it uses a `luaL_Buffer` resource to return the entire dumped chunk as a Lua string; at peak this needs three copies of the function in RAM, and so it can only be used to dump source modules that compile to less than 10Kb or so. The `node.compile()` API avoids this limitation by sending the serialised code directly to file, but it lacks some strip options.

My proposal
- We add an optional integer `[strip]` parameter to `node.compile()`: 0 = all line and variable info is retained; 1 = variable info is removed but the line info retained; 2 = all line and variable info is removed.
- We add an extra `file` function `file.dump(filename, func, [strip])` which dumps directly to the FS, thus avoiding the need to build the intermediate Lua string form in a `luaL_Buffer`.
- The `func` reference can either be a single function, as with standard Lua, or a Lua array containing multiple functions. In this second case, if the elements are positional then the function name is based on the chunk name; this can be overridden by using keyed entries, in which case the key is used as the function name.
- `lua_load()` and the related load API calls can now process a multi-function stream. Loading a single function will return a function variable as per standard Lua functionality; loading a multi-function stream will return a keyed array.
- `loadflash(pattern)` will load a set of files with names conforming to `pattern` into LFS. The files must be Lua compiled files. The load requires three restarts of the processor, with the first restart immediately after the call.

A usage sketch of this proposed API follows.
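None of these calls exist yet; the names and parameters below simply follow the proposal above:

```lua
node.compile("app.lua", 2)                    -- strip all debug info
file.dump("funcs.lc", {                       -- multi-function dump:
  funcA = loadfile("funcA.lua"),              -- keyed entries set the
  funcB = loadfile("funcB.lua"),              -- function names
}, 1)                                         -- keep line info only
loadflash("[^%.]*%.lc")                       -- three-restart load into LFS
```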
Discussion points and future enhancement

Moving to a unified loader increases the memory footprint of the load process and makes it impractical to use gzip compression, owing to RAM constraints on the ESP8266. The actual dump format could be further compacted to reduce LC file sizes, but I will defer this option to a future commit.
Advanced developers will still have the option of a host-based development cycle, using luac.cross to compile one or more LC files for downloading to the ESP and then loading LFS from them.
The size of the LFS is primarily limited by the loader being able to fit the string table into RAM during pass 1; however, this will rarely be a practical constraint with 128Kb LFS regions, and again advanced developers will be able to avoid this constraint by directly flashing the LFS partition from the host using `esptool.py` or equivalent. (This will need a Python utility to convert from LC format to the internal LFS format.)

We should consider the option to flag loaded functions as execute-only in the LFS load sequence (akin to `dofile()`). The reason for this is that GC of the string table is disabled during Pass 1, so any strings referenced in such a dofile will be added to the ROstrt without needing to store any dummy Proto.