Open TerryE opened 4 years ago
A `Cache_Read_Disable()`/`Cache_Read_Enable()` pair should do the necessary register twiddling to reload the cache window into the SPI flash. IIRC it's a single control register in DPORT0, but I don't seem to have a handy definition document sitting around anywhere I can find it right now. Maybe I lost that in the disk crash the other year? Anyway, just make absolutely sure that the call to `Cache_Read_Enable()` is either present in the instruction cache (i.e. not in what would be the next cacheline) or that you put the call pair explicitly into IRAM :)
Wow, that is quite a writeup of all your preparation of this and the discussions about it.
I stumbled across this example:
Example usecases
```lua
-- Create a (child) LFS based on all LC files in SPIFFS
do
  local f,a = files.list('%.lc$'),{}
  for k in pairs(f) do a[#a+1] = k end
  node.LFS.reload(a)
end
```
Does the `node.LFS.reload(a)` really work incrementally, and how do I then reset it?
Or is it connected to content of the file multifunction vs. single function?
A `Cache_Read_Disable()`/`Cache_Read_Enable()` pair should do the necessary register twiddling to reload the cache window into the SPI flash.
@jmattsson, thanks. Yup, the SDK SPI routines use `Cache_Read_Enable_2()` and `Cache_Read_Disable_2()`, and what these do is temporarily turn off the cache without flushing it. This clearly gives a lot faster performance, but the kicker is that you can lose cache coherence if you are doing an SPI write to a mapped address range. I also understand that the disable/enable pair needs to be executed from IRAM0 and that the enable must have the args (0, 0, 1) for our configuration, that is a 32Kb ICACHE mapping 1Mb starting at 0x000000. DiUS Ltd will need to tweak this with their OTA fork.
Thanks for the heads up. For this particular purpose you might want to deliberately use the ROM'd Cache_Read_Disable()/...Enable()
rather than the "improved" SDK versions.
Oh, and I found the reference to the control register: https://github.com/esp8266/esp8266-wiki/wiki/Memory-Map It's got both the flash cache and the iram cache info there, if you prefer to hit it directly.
Does the
node.LFS.reload(a)
really work incrementally and how do I then reset it? Or is it connected to content of the file multifunction vs. single function?
@HHHartmann. Future tense: it will work this way, post the PR. That's because there will only be one dump format. A luac.cross
image file is just an LC file with all of the modules in it.
This implementation is continually trying to achieve as much scaling as possible whilst working within the ~44Kb heap limits available on a clean restart.
The dump process is pretty lightweight in that it is walking Proto hierarchies in RAM and LFS and serially dumping them to file. The kicker is that I need to collect all of the strings used in the dump and then append them to the dump file.
E:M error. How much of a realistic limitation is having a max of 512 string constants per dump? I doubt that many developers would hit this, and the workaround is to split the dump into multiple files.
A slightly more complex alternative (a bit geeky if you weren't into this sort of thing) would be to allocate a custom fixed, say 2K × 4-byte, hash vector with a quadratic hash and a packed 12:2:18 structure (ndx, source, iTString). You are storing an `{int ndx; TString *}` pair, but this could pack down into a single word given that the max index is, say, 4K and the TString is a word-aligned offset from DRAM0, LFS0 or LFS1; the `{int, TString}` version is just easier to code. This would execute as fast and give a 2K string constant limit, but would need a few dozen lines of extra code to implement.
BTW guys (and girls, if @sonaux is tracking this -- why is IoT such a guy thing? :cry:), I have spent quite a few months brooding about implementation strategies to improve the functionality vs simplicity of implementation trade-off. This approach is about as good as we can get IMO.
One small implementation issue that I need to address is that in standard Lua the dump and undump processes are strictly serial, and hence Lua uses a `lua_Writer` abstraction to manage output (with an exact parallel `lua_Reader`):
typedef int (*lua_Writer) (lua_State *L, const void* p, size_t sz, void* ud);
The type of the writer function used by `lua_dump`. Every time it produces another piece of chunk, `lua_dump` calls the writer, passing along the buffer to be written (`p`), its size (`sz`), and the data parameter supplied to `lua_dump`.
The issue is that the dump produces `<fixed header><one or more protos><dump of strings>` but the undump wants to process `<fixed header><dump of strings><one or more protos>`, which is trivial to implement so long as we can fseek along the input or output stream. To this end I am extending the interface: the `sz` parameter is changed to type `int`, with the negative values -1 and -2 having a special meaning, and `p` points to an `int` buffer:

- `sz == -1`: set `*p` to the current byte position in the stream
- `sz == -2`: set the current byte position in the stream to `*p`
Hence the write process is \
The read process is \
This does mean that the dump and load can only use true files and cannot process streams like stdin
and stdout
, but I don't view this as a functional limitation.
Anyone see any issues with this?
When it comes to ESP development cycles, we have to discuss the "elephant in the room":
The limited amount of RAM heap (perhaps 44Kb on an ESP8266) limits the size of application that can be developed using a purely ESP-based development cycle.
Moving compilation and building of LFS off-ESP and onto the host environment significantly increases the scale of applications that can be developed, but this brings with it all of the issues that especially seem to trouble Windows-based developers of building and executing the cross-compiler toolset.
I want to focus on this first point and how we understand and mitigate the various scaling issues that constrain this life cycle.
The Lua string table (including the ROstrt) uses a chained rather than an open addressing scheme, with a 2^N sized vector of `TString *` for the hash. Entry collisions can occur, resulting in multiple TString chains (using the `next` field), in which case the string resolution algorithm needs to search down the chain for a match. The average performance is roughly as in the following table.

Most string references are through resolved `TValue` fields. These contain a resolved pointer to the actual TString, so do not need resolution. Constant TStrings are only resolved once during loading, so runtime resolution is primarily needed when creating a new string at runtime. This is a relatively infrequent activity. The ROstrt could perhaps contain 4K elements (16Kb RAM).

% Occupancy | # Comparisons for hit | # Comparisons for miss |
---|---|---|
50% | 1.25 | 1.53 |
75% | 1.37 | 1.75 |
100% | 1.50 | 2.01 |
150% | 1.75 | 2.47 |
200% | 2.00 | 2.98 |
For LFS-based applications the RAM strt is used largely for runtime-built strings. These tend to be highly dynamic and heavily GCed, and so might typically be only 10% of the size of the ROstrt.
The (LFS) load process has to copy a set of Protos each with associated instruction vector, constants and metadata plus a set of strings into LFS. Any loaded string constants need to be resolved against TStrings at already known addresses in LFS. Each load file contains a set of Protos and a set of strings; the strings must be resolved against the existing ones in the ROstrt
(and the sysROstrt
for an appLFS) and resolution misses copied to LFS and added to the working hash in RAM. The ICACHE flash must then be flushed before processing the Protos, since subsequent resolution within the Proto structures will need to reference this TString content by addressing through the normal mapped address space. This whole load sequence is largely a file to LFS copy, with buffers and a few key tables like the working ROstrt
hash being needed in RAM.
The (RAM) load process shares the same code, excepting that the low level store and access primitives work with normal RAM based resources and the RAM strt
.
The dump process is somewhat more complicated than that used in standard Lua. This is because strings are uncollated in the standard Lua format but must be collated on a per file basis to make undump-to-LFS workable. Also in the case of RAM-to-file dumping, both the compiled code structures and any collation management structures must be maintained in RAM during the entire process.
For luac.cross, RAM availability isn't an issue, so we can use standard Lua arrays and other structures and leave their management to the standard table handling. On the ESP, at 20 bytes per table entry, it is unlikely that we would be able to undump any function set with more than 512 string constants in it. A custom hash of N × `TString *` + N × `short` would do better: a 1.6K constant hash would require 12Kb RAM and we might be able to process up to 4K constant dumps, but only up to say 80% occupancy and no resizing. The hash table doesn't need to be a 2^N size either. All straightforward to implement apart from the issue of how to choose the hash table size. There is no harm in oversizing it so long as there is enough RAM available after doing any compiles / loads needed. I can think of four possible strategies for sizing this:
Whilst these approaches will need some extra coding, the fact that the file format will be the same for both normal LC files and LFS image components also allows some compensating simplification.
As discussed in the previous post, we need to add an fseek operation to avoid a 2-pass process. That said, when I have a look at the implementation details, a two-pass process might just end up being simpler to code.
At the moment I also support two host LFS formats: one is a shadow ESP format which allows the luac.cross
compiler to build an absolute LFS image for host-based provisioning, and the second is a flash emulation albeit with host-native size_t
and pointers to allow full host-based testing of LFS functionality. I will probably need to keep these.
Whilst the broad usage is as in the previous post, RAM availability and fragmentation is going to be an issue, and as a community I feel that we will need to develop build processes. I feel that for everything apart from small setups, on-ESP LFS reimaging will involve a number of steps carried out immediately after restart, accessing some Lua-based "build script" and using an RTC memory counter for stepping through the build. Maybe we regard the first iteration as Alpha and subject to change, until we get a handle on how usable it is and what the practical scaling limits are.
I'd consider a two-pass approach entirely reasonable (and possibly preferred, depending on internals). Either of the hash size options seems acceptable at this stage. We can always revisit later if needed.
FWIW, I think "take as many passes as you need" is a perfectly reasonable approach, especially given the very limited heap space. I could even imagine this being something like flashreload("lfs-%s*.lc")
turning into:
1. Verify that the `lc` files are well-formed (pass 1), and erase the designated LFS partition.
2. Scan the `lc` files building up what will be the ROstrt in RAM (pass 2) while ignoring all the code, then commit that ROstrt to the LFS partition. Free the in-RAM ROstrt buffer (and maybe reboot the module again, if it makes your life easier).
3. Scan the `lc` files again (pass 3), this time computing the Proto hierarchy using the now-in-flash ROstrt, committing to LFS as possible to free up RAM.

@nwf Thanks for this useful feedback. Let me play a little ping-pong on your points.
- We'll drop the `flashreload` variant and stick to the `node.LFS.reload()` form.
- `reload()` should only take an array or string argument. I had originally thought in terms of an indexed array only, but I can see that adding any key strings is a pragmatic extension. Hence this would work: `node.LFS.reload(files.list("lfs-%w*.lc"))`
- Pass 1 scans the load files to size the `ROstrt`. Any files with an invalid CRC or other invalid fields would be detected during this scan. In principle, we could add an optional `ROstrt_size` parameter to allow the developer to specify this. However, this is the only pass which does not change the LFS region, so using it as a pre-update check is still valuable.
- Pass 2 writes the `TString` records direct to the LFS region, at the same time updating the RAM copy of the `ROstrt` index. Note that if we have a 4K entry index with 75% occupancy and an average size of 16 + 12 bytes say, then the RAM index is some 16Kb long, but the TStrings written to LFS would total 84Kb, or ⅓ of the current maximum LFS region size and double the available heap size. These are written on a per-file basis with the ICACHE flushed after each file.
- If we can `fseek` to position to the strings, then rewind to the start to process the Protos, then we can do 2A + 2B on a single file open.
- The current `reload()` implementation writes the filename as an RCR record and restarts to ensure a clean environment with minimal heap fragmentation. I now feel that this is unneeded complication, as we don't need to rely on RCR records to pass this context: it is simpler just to restart the Lua VM and RTS, and this allows a malloc'ed structure to be used to pass the file list to the loader. We would still need the final reboot, so the reload would only involve a single restart.

Maybe a dumb question, but if we're doing multiple passes anyway, do we still need to have the string table before the protos? As in, do we need to deviate from the default dump format? The fewer deviations we have from standard Lua, the easier to upgrade in the future (as you undoubtedly know from first-hand experience by now). If I've misunderstood one of the finer points here, do feel free to just point me to a whitepaper or something you've already written :D
Maybe a dumb question, but if we're doing multiple passes anyway, do we still need to have the string table before the protos?
In a word Yes. That is in terms of processing order. The in-file order can be resolved by the odd fseek.
The current standard Lua undump process is designed to go from a (serial) data stream to randomly addressable memory. The idea of having to use a serial API to program flash memory was never considered as a non-functional requirement (NFR) during file format design. We want to serialise the undump to LFS as much as possible and this creates two design objectives:
The limited RAM issue also means that we can only afford to cache limited resources in RAM, so repeating my previous example, we can afford to keep the ROstrt vector of `TString` pointers in RAM, but not the TStrings themselves. These must have been preloaded into LFS and the cache flushed so that they can be directly addressed for resolution during load.
There is also an issue of density. Our current format is about 50% the length of the equivalent standard Lua compiled file formats. This has non-trivial benefits in saving network transfer times and file system utilisation.
Adding our NFRs and relaying out the file format actually makes the dump and undump processes simpler, and significantly less RAM intensive. The corollary here when we are RAM-constrained is that on-ESP life-cycle applications can be larger.
Maybe a dumb question [...]
In a word Yes.
Gotcha :D Carry on! 👍
@jmattsson, this one might amuse you.
This might be counter-intuitive, but making the serialised (LC) dump format compatible with both writing to LFS and to RAM actually removes a shed load of now-redundant code -- for example there is no longer a `-f` option in `luac.cross`, as the normal LC format produced by `-o` is the LFS image format. There is only one `lua_dump()` function and one `lua_load()` function, etc. Fun, fun, fun!
Nice! Always so satisfying to be able to remove code!
Unlike the standard Lua version of undump, this NodeMCU version supports an LFS mode, and so the undump function supports storing Proto hierarchies into one of two targets:
RAM heap space. This mode is a single pass with the Proto structures and TStrings created directly in RAM. All GCObjects are collectable and comply with Lua GC assumptions, so the GC will collect dangling resources in the case of a thrown error.
Flash programmable ROM memory. This is written serially using the flash write API. This mode supports LFS. It is a two-pass load, with the first pass being a read-only format validation and CRC check. The second pass is hooked in during startup, and errors are unlikely given pass 1. Any error will abort the pass, leaving a corrupted LFS which is detected and erased on next boot. Any reload will then need to be manually retried.
The undump code for both modes is largely shared, excepting the top-level orchestration and the bottom-level write to RAM/Flash via a supplied callback, which differs for the two modes. Mode 2 requires that the writing of separate resource elements cannot be interleaved, so Proto record processing has been reordered to group resource writes, cache the Proto itself in RAM, and walk the Proto's dependents bottom-up. Other than this, Mode 1 is largely as in standard Lua, so doesn't really need further discussion.
On the other hand, Mode 2 supports multiple compiled code files each with multiple Top Level Functions (TLFs) and is able to write to the LFS region in Flash. The LFS load process has two passes:
Pass 1 is executed within the standard execution environment in the callframe below the node.LFS.reload()
invocation.
Pass 2. The LuaN_init
startup code detects the pass 2 header and enters the pass 2 loader after starting the Lua RTS. (Note that this mode bypasses the app/modules
startup hooks so no module initialisation is carried out.)
- A temporary `ROstrt` is allocated in RAM. This is just the N × `TString *` index; the TStrings themselves will be written directly to Flash.
- For each load file, a `TString *` lookup index is allocated and the file's strings are resolved against the (temporary) `ROstrt` (as these might duplicate strings already loaded in previous files, or later in sysLFS); the lookup index is updated and any new TStrings written to LFS. This time, as the Proto hierarchies are parsed in the load file, they are written directly to LFS. On completion of each file load, the instruction cache is flushed.
- The `ROstrt` is written to LFS.
- A `ROTable` index of `{name = Proto}` is written to LFS.

Hence the LFS reload takes 2 restarts. The second is actually optional, since we could just restart the Lua environment without a CPU restart. However, the heap will have been fragmented during pass 2, so the restart is prudent.
A power-fail during pass 2 will be detected and result in a fallback startup with a blank LFS. In this case the reload will need to be manually repeated. Given that SPIFFS suffers from worse issues (it doesn't even detect power-fail), doing anything more is over-engineering. IMO.
I've got the multi-TLF dump and undump working fine. Most of the tuning is around making sure that we don't end up with RAM constraints unnecessarily limiting the size of LFS that can be manipulated and loaded on-ESP. As an example, you broadly have three strategies for sizing the ROstrt:
… at the `node.LFS.reload()` call, otherwise it will throw an E:M error.
… an E:M error whilst loading larger LFS images.

BTW the issue isn't so much RAM during pass 2, which occurs post-restart in a clean startup config, but in pass 1, which is called directly from `node.LFS.reload()` and which validates the load and can return control to the calling Lua app in the case of an error.
In the end, given my discussion in my post a couple of weeks ago above about sizing the ROstrt, which observes that the probe performance scales in a very well-behaved manner as a function of TString count to ROstrt size, option (1) is the simplest, least RAM-hungry and most robust. At the moment I have a lot of inline whitepaper-style documentation about these decisions in the code (unlike the rest of the Lua code base, which has zero inline documentation and expects the core developer to retro-engineer this from the code itself -- "it does what it is"). I am thinking about pulling all of this commentary out of the source and moving it into a whitepaper, which we can always make available for anyone interested. This is what I will do unless anyone shouts. :smile:
Honestly, I'd prefer more comments in code. I subscribe to the Donald Knuth "it's a book for humans that happens to have some code in it" approach (and literate programming was robbed of its fair comparison, but that's a separate rant; https://buttondown.email/hillelwayne/archive/donald-knuth-was-framed/ does a good job). If you also want to make a whitepaper, that's fine. :)
I'm definitely partial to having solid commentary in the source and have been known to leave big blocks of texts explaining the intent and reasoning. The idea being that once you've read the prose you'll be in a good position to follow the code.
Sure, pure code can be elegant and easily comprehended as well, and doesn't risk the comments going stale, but for any non-trivial source I'm in favour of some in-file documentation. So, if you've already got it there, I wouldn't toss it out without good reason :) That said, I certainly also enjoy reading your in-depth whitepapers!
This change touches quite a lot of components so coding and testing everything is taking a bit of time. Still it looks like you will be able to build maximal LFS images on ESP, which is a lot better than I anticipated. The biggest constraint is the maximum compile size of a single source module, though I will extend my compilation service to support this.
One thing that I am leaving out of this PR is the dual-LFS support. I don't see this as high risk, but I want to do some performance benchmarking on one ROstrt vs. two. In the second case, the appLFS ROstrt will embed the sysLFS lookup as well. If I confirm the benchmarking issues then I will cover this in a separate post.
Just a quick update to let those interested know that I am not 'on strike', but steadily plugging away at this. The code is basically all there, but testing and debugging the usecases is proving rather complicated. The dump and undump code has ended up being a reimplementation, and much is shared between three save modes: in RAM; in normal LFS; in Absolute LFS. This last one is a real quirk thanks to Johny et al: in the LFSA mode, the pointers are 32-bit and refer to on-ESP addresses, even though the undump code will (typically) be running on a 64-bit PC OS, so size_t is 8 bytes, not 4. Oh yes, to work within the ESP memory constraints and to avoid caching too much in RAM for on-ESP loads, the dump operations are 3-pass and LFS-variant undumps are 2-pass, so in some cases whole areas of code need to be dummied out.
The last wrinkle is that all of this also needs to be Lua Test Suite compliant, so `TString` records get wrapped in a `UTString` type which is packed out to 8×size_t, and this mustn't break the code. I am now pretty much all of the way there -- at least in the `luac.cross -e` execution environment -- and can load, restart and execute LFS images (yes, the same ones as load on the ESP). I've got some other test cases to cover, then I'll move onto ESP testing.
So slow progress, but all still OK.
```
$ cat /tmp/d.lua
debug.debug()
$ /luac.cross -e /tmp/d.lua
LFS image corrupted.
Erasing LFS from flash addr 0x090000 to 0x0cffff
lua_debug> lfsreload{"/tmp/lfs.lc", "/tmp/ftp.lc"}
Erasing LFS from flash addr 0x090000 to 0x0cffff
LFS image loaded
lua_debug> print(lfsindex'telnet',lfsindex'ftpserver')
function: 0x558d56ac2e80 function: 0x558d56acd390
lua_debug> telnet=lfsindex'telnet'()
lua_debug> for k,v in pairs(telnet) do print (k,v) end
open function: 0x558d56accd90
close function: 0x558d56acd060
lua_debug>
$
```
The host build doesn't have file
and net
so these modules won't run successfully on a luac.cross -e
but at least these can be loaded into LFS and then referenced in code. Still have a number of other main paths to shake down: LFS Absolute mode, on-ESP saving and loading, updates to luac.cross
parameters, etc., but the core of the dump / undump code seems to be working well. So still more work to do and the odd gremlins to shake out, but I am now confident that I have addressed the bulk of the technical challenges with getting on-ESP modes working.
Note that the host environment doesn't have `node`, and hence no `node.LFS`, so my quick host-only fix is to add these functions to the baselib.
Question for @jmattsson, @nwf, @HHHartmann, @marcelstoer, etc. Should I do the small fixes for #3193 and merge this into dev first, or just add this big tranche of functionality into this PR. TBH, if someone else was doing this, then my instinct would be to do this as two PRs: complete #3193 and then raise this as a second PR, so maybe I am answering my own Q.
If it's not too much extra work, I think my preference would be to do it as two PRs. A bit easier to review too.
TBH, I started working on this one whilst I was waiting for #3193 and sort of got sucked into it. Time to stash this and draw a line under the pending PR, so let me do this over the w/e.
Great work, thank you!
For the approach with multiple reboots, and connected hardware that will be confused by device reboot, how long will it usually take from first reboot until I can again run code to restore the connected hardware to a safe state? i.e. is it worth considering "restore sanity" phases after/between the reboots?
As for the 1st pass limited RAM, I think this is a problem users can easily solve, by rebooting their application into a minimal-but-safe mode and using that one for the LFS update.
On comments in code vs. whitepapers, maybe we can have the whitepapers include the relevant parts from the source. I'd be willing to help with tooling for the rendering.
how long will it usually take from first reboot until I can again run code to restore the connected hardware to a safe state?
Of the order of 1 sec.
is it worth considering "restore sanity" phases after/between the reboots?
The LFS load is two pass. The 1st pass is a validation and sizing pass, with nothing written to the LFS. Errors are returned as a return string. We can have a high degree of confidence that the second live pass will work.
I am happy to work with you if you want to actively contribute to project. My email address is in the git logs.
Of the order of 1 sec.
My gut feeling says that this long a downtime might be long enough to require extra safety measures on some kinds of devices connected, but I think we can postpone these aspects until someone describes an actual case where a second of wrong I/O state can cause damage. Even then, it's not a degradation compared to the old feature set, just a case of "it could be even better".
IMHO, if that kind of I/O safety matters, you should be using an I/O expander or other additional peripherals you completely control (possibly just to gate onboard peripherals; e.g., route the UARTs through AND gates). The ESP firmware has its own mind about GPIO things and it is not, AFAIK, promised to be stable across releases.
Yeah. It's a balance between what states the external device can survive for how long, and how much safety equipment you want to afford plugging in between, because that extra equipment will take up extra space, potentially also electricity and probably extra effort in code.
Let's put this into perspective: if you are using something like an Arduino, then an update to applications code will take seconds. As Nathaniel says if you want truly bumpless control then you are looking at redundancy and safety features in your H/W design. You won't get this out of the box using an IoT module costing a couple of $
I have just pushed the first tranche in #3272. The ldump/lundump C files are pretty much a complete rewrite and the lnodemcu C file has major rework. Why? The previous LFS code and formats were different to the standard dump formats, so it was feasible to attempt a minimum change as far as the dump and undump code was concerned. In general I would describe the core Lua design / coding strategy as: minimal, simple and orthogonal. There is almost no attempt at peephole optimisation within the source; rather, the coding style relies on the C compiler optimiser to do this. The fact that the Lua runtime's size and performance exceeds that of PHP7 underlines the effectiveness of this strategy in my mind.
However, in the case of NodeMCU, we have some extra functional and non-functional drivers:
This totality really means that the dump / undump implementation has to be new rather than incremental, though it does mirror the best concepts of the original code. Whilst I have embraced the minimal, simple and orthogonal principles, in one respect I do differ from the Ierusalimschy camp: I don't believe in a zero-comment style. I have included heavy inline commenting to explain the impacts of these constraints on the implementation.
By way of an example that I picked up in testing: I have to pass a bunch of string parameters from Pass 1 to Pass 2 of the undump, and these will eventually become TStrings in the LFS, so for hand-over the TString headers are all `0xFF` followed by the CStrings; that way the CStrings can be used and the TString headers updated once the ROstrt size is known. Some of these strings can be copied from the old LFS string table, but this gets erased before the strings can be written, so any LFS CStrings need to be duped into RAM before the LFS can be arranged. I also initially planned to force a rapid reboot by calling `system_restart()` followed by throwing an error, but this would die horribly if the Lua code that called the `node.LFS.restart` did so with a `pcall` and was running out of the old LFS. Lots of test runs.
As you will notice, this version doesn't include the dual-LFS nor the host absolute-LFS variants, but these requirements have been reflected in the architecture and they are modest incremental additions.
Sounds good to me. :-)
Once we have two LFS regions, will it be possible to write one of them incrementally from a download handler without caching the image in SPIFFS first?
Good Q, but no.
@nwf @HHHartmann @jmattsson and anyone else that wants to comment:
Just to flag a mild compatibility break with standard Lua (and one that never worked properly on NodeMCU anyway). Standard `luac` allows "`-`" as an input or output filename, which defaults to `stdin` / `stdout` respectively on POSIX builds, hence you can pipe in and out of `luac`. Because our load / unload code is common to both host and ESP and is multi-pass to be able to work on the ESP within RAM constraints, allowing `stdin` / `stdout` as a valid input or output file is a total PITA since you can't rewind pipes.

So for the avoidance of doubt I am going to say that we only support file-based source and compile output for the cross compiler and treat the filename "-" as invalid.
Anyone who wants to add this back in can add this extra tier of complexity and is free to develop the patch to spool to / from a temporary file after this PR is merged.
Also note that there are other incompatibilities: standard luac
always outputs a single compiled top level function. It does this by creating a dummy wrapper function if you specify multiple input sources; this compiles but isn't usable in practice as there is no execution path to access or to bind the individual Protos as closures. We support an output format which allows multiple top level functions (TLFs), so we just output a multi-TLF file.
Also note that there is nothing to stop you "compiling" one or more existing LC files. In our case, these are just aggregated into a compiled LC file (which can be used as an LFS image).
In terms of the options:
- `-l`, `-o`, `-p`, `-v`, `--`: these options are as per standard `luac`, except that the default output file is `out.lc`.
- `-f`, `-F`: the LFS formats are now unified with normal LC file formats, so there are no "special" LFS flash formats.
- `-s`: this sets the default stripping option: 0 = no stripping; 1 = strip local and upvalue metadata but retain line numbers; 2 = strip all debug metadata. Omitting `-s` is equivalent to option 0; `-s` without a level digit is equivalent to option 2. We recommend `-s 0` for LFS modules.
- `-a addr`: instead of an LC file, `luac.cross` outputs an LFS image based at the specified address. The `-o` option must also specify the image name. Note that the option `-m` is no longer supported. This is because an LFS image can be created from multiple LC files. If you want to validate the size of a specific configuration, then create a (temporary) absolute LFS image using the `-a` option; the size of this file is the size of the corresponding LFS image.
Also note that as well as Lua source, LC files are accepted as input, so:

- `luac.cross -p -l -l myprog.lc` will give a full code listing of `myprog.lc`.
- `luac.cross -o johny.img -a 0x4029C000 mylib.lc mysub1.lua` will generate an LFS image that can be flashed to an LFS region at ESP address 0x4029C000, flash address 0x9C000. It is up to the developer to ensure that this address matches the corresponding LFS region in the partition table.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@stale I want this.
@Stale not helpful.
At least the two regions part would be nice
**New feature**

Additional support for two LFS regions, and the ability to both save and load (update) LFS regions on ESP.

**Justification**

Most committers and many developers would seem to want this.
**Highlights**

This enhancement will be for Lua 5.3 only, as it builds upon the groundwork that I've already laid in the Lua53 implementation.

- `lua_dump()` can now take an array argument as well as a function argument at ToS. If an array is specified then the dump stream will contain all functions in the array.
- `node.dumpfile()` essentially does the same as `string.dump()`, but writes the dump stream direct to file without needing to assemble and store the compiled content in RAM. (This scales much better.) Both of these functions also support the "array of functions" argument type.
- `lua_load()` (and derived functions) now support the multi-function format, in which case they return a keyed array instead of a single function. The array is of the form `{["name"] = nameFunc, ...}`.
- `loadfile()` and `node.LFS.reload()` can take an array of filenames as well as a single file, in which case the set of files will be loaded into RAM or LFS. In the case of `loadfile()` this will return the `{["name"] = nameFunc, ...}` array if multiple files or multi-function files are loaded. In the case of the LFS reload, this array is not returned but is used as the LFS region's index ROTable after restart. `node.LFS.reload()` can only process compiled files and will error on any source files.
- ~~The appLFS will inherit any strings already in sysLFS and hence these TStrings will not be duplicated in appLFS. The `ROstrt` for the appLFS also includes all (short) TStrings in sysLFS and hence the sysLFS `ROstrt` isn't used. The G(L) global state points to this new appLFS `ROstrt`. From a string lookup perspective there is no difference between a single and a double LFS configuration.~~ (Not true. I do need two `ROstrt` indexes because overflow chaining depends on the table size and I can't redo these chains in the sysLFS when loading the appLFS.)
- The appLFS index ROTable has an `__index` entry pointing to the sysLFS index ROTable. Hence resolution across the two ROTables is "free" using standard Lua RTS array access.
- The `node.LFS.list` property has already been updated to a function which takes an optional argument: 'parent' lists only those functions in the parent LFS, and likewise for 'child'; 'system' and 'application' are synonyms for these options. Omitting the argument lists all functions in both LFSs if configured as a parent / child pair.

**Example usecases**
**How the NodeMCU binary format differs from standard Lua 5.3**

In general terms, the Lua RTS dump function deterministically traverses a Proto hierarchy, converting all fields to a stream of binary tokens, and this stream is the compiled file format. The load executes an "undump" which does the inverse traverse, recreating the Proto hierarchies. This much is the same. But as to why the differences:

- File size and RAM usage are a lot more important to us, so, for example, the integer `1` takes a single byte.
- I have also reordered the dump walk so that compiled code on reloading can be written sequentially to the LFS region using `spi_flash_write()` operations.
- When loading one or more Proto hierarchies in a file into LFS we need to add any TString constants that are not already in LFS. I do this by maintaining in RAM a copy of what will become the `ROstrt` for the LFS. This is a lookup that allows fast resolution against TString clashes, but in the case of a clash I still need to compare the new TString against the copy in LFS to differentiate between a true match and a hash duplication. This uses ICACHE-resolved access, and so I need to flush the cache to ensure cache coherence. I would rather do this once per file than once per Proto.
- Hence the dump process collects the array of TStrings used in the dump and appends this as a string vector at the end of the file. Any inline TString references use an index into this vector.
- The NodeMCU file format includes a fixed header which includes a file CRC and the offset of this TString vector. The undumper fseeks to the TString vector and processes this first, before fseeking back to the start of the file to process the Protos. This avoids the need for two passes during dump.
- The CRC-32 is at a fixed offset from the start of the file and can be used as an image ID, and `node.LFS.verify(image)` will check that the dump format is the current version and return this checksum. (Special request from @HHHartmann.) It can optionally checksum the image; the checksum is probably worth doing before reloading LFS.
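The header / string-vector mechanics above can be mocked up on the host. To be clear, this is an illustrative toy format, not the actual NodeMCU dump layout: the `struct` field sizes, the offsets, and the `b"NMLF"` magic are all my own invention. It shows the two-seek pattern: the writer appends the string vector at the end and records its offset plus a CRC-32 in a fixed header; the reader verifies the CRC, seeks to the vector first, then rewinds to process the body.

```python
import io
import struct
import zlib

HDR = struct.Struct("<4sII")   # magic, crc32 of payload, string-vector offset

def dump(strings, body):
    payload = io.BytesIO()
    payload.write(body)                        # "Proto" stream (opaque here)
    strvec_off = HDR.size + payload.tell()     # vector starts after the body
    for s in strings:                          # append the TString vector
        payload.write(struct.pack("<H", len(s)))
        payload.write(s)
    data = payload.getvalue()
    return HDR.pack(b"NMLF", zlib.crc32(data), strvec_off) + data

def undump(blob):
    magic, crc, strvec_off = HDR.unpack_from(blob)
    assert magic == b"NMLF"
    assert zlib.crc32(blob[HDR.size:]) == crc  # verify before loading
    f = io.BytesIO(blob)
    f.seek(strvec_off)                         # read the string vector first...
    strings = []
    while True:
        raw = f.read(2)
        if not raw:
            break
        (n,) = struct.unpack("<H", raw)
        strings.append(f.read(n))
    f.seek(HDR.size)                           # ...then rewind for the body
    body = f.read(strvec_off - HDR.size)
    return strings, body

blob = dump([b"print", b"node"], b"\x01\x02\x03")
strings, body = undump(blob)
assert strings == [b"print", b"node"] and body == b"\x01\x02\x03"
```

The single write pass works because the vector is emitted *after* the Protos; it is the reader, not the writer, that pays the two seeks.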
**Technical Issues**

**Cache coherence.** I currently do a botch to flush the ICACHE, which is to read a sequential 32 Kb address window in flash. @jmattsson: Q: do you know a better way?
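For what it's worth, why the sequential-window botch works can be simulated on the host. This sketch assumes a direct-mapped 32 KB cache with 32-byte lines purely for illustration (the real ESP8266 ICACHE organisation may differ, and the addresses are made up): reading any cache-sized window evicts every line, so stale mappings are guaranteed to be re-fetched from flash afterwards.

```python
LINE = 32
NLINES = (32 * 1024) // LINE   # 32 KB cache, direct-mapped (assumed)

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NLINES
    def read(self, addr):
        """Return True on hit; on miss, fill the line."""
        line = addr // LINE
        idx = line % NLINES         # which cache line this address maps to
        tag = line // NLINES        # which window of flash it came from
        hit = self.tags[idx] == tag
        self.tags[idx] = tag
        return hit

cache = DirectMappedCache()

# Warm the cache with "stale" flash contents.
stale = range(0, 32 * 1024, LINE)
for a in stale:
    cache.read(a)
assert all(cache.read(a) for a in stale)      # everything is resident

# Read a different, sequential 32 KB window: every line gets evicted.
for a in range(0x100000, 0x100000 + 32 * 1024, LINE):
    cache.read(a)

# The stale contents are gone, so the next reads re-fetch from flash.
assert not any(cache.read(a) for a in stale)
```

If the real cache is set-associative rather than direct-mapped, a window of cache size per way would still touch every set, which is presumably why the 32 Kb read is a safe (if slow) flush.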