Closed HHHartmann closed 4 years ago
More testing revealed that uploading of nul (0x00) bytes does not work properly. All other char codes seem to work.
Gregor, thanks for this analysis. I'll take a deep dive into this tomorrow and get back to you. I think that this is an SDK 3.0 issue or at least the unification I've done so that the libraries can access the Lua 5.1 and 5.3 VMs with a common API. This big bitch with the pre SDK 3.0 runtime was that it was just far too easy to overrun the input UART, so with SDK 3.0, I've introduced the stdin and stdout pipes for buffering. This seems to have broken something.
I think that this is related to #2964.
OK, this is is subtle and nasty. @HHHartmann @jmattsson, I would appreciate your thoughts here.
pRxBuff
for all UART input, Lua or application. The main PITA here is that the Interpreter is pretty slow at compiling source so if the interpreter is called synchronously, then this processing can't keep to with baudrate-limited pasted Lua code, so code fragments much longer than 256 bytes would overrun the buffer; drop text and therefore result in compile errors.uart:on('data',fn)
is handled much as before -- that is the data is passed directly from the input task to the application CB.What we get as a result of this approach is edge artifacts, as the input stream is toggled between the Lua and application modes without sync pauses between the modes -- as is the case with ESPlorer. Interleaving piped input and direct input doesn't work well.
So using this example ESPlorer first uploads a 380byte / 2 line function _up(n ,bs, ls)
which will split a file into n
chunks of bs
bytes, except the last potentially short chunk of ls
bytes and append it to the output file chunk-by-chunk. If a 600 byte file was being uploaded then it might then send:
_up(3,250,100)\n<chunk1><chunk2><chunk3>
and the handover fails. Need to think this one through.
This is a cut down version of nodemcu-uploader. Basically, this uses the same approach.
import serial
from time import sleep
RCV = r"""
function recv_block(d)
if string.byte(d, 1) == 1 then
size = string.byte(d, 2)
uart.write(0,'\006')
if size > 0 then
file.write(string.sub(d, 3, 3+size-1))
else
file.close()
uart.on('data')
end
else
uart.write(0, '\021' .. d)
uart.on('data')
end
end
function recv_name(d) d = d:gsub('%z.*', '') file.remove(d) file.open(d, 'w') uart.on('data', 130, recv_block, 0) uart.write(0, '\006') end
function recv() uart.on('data', '\000', recv_name, 0) uart.write(0, 'C') end
"""
LIST = r"""
for i,j in pairs(file.list()) do print(i) end
"""
CLEAR = r"""
for i,j in pairs(file.list()) do file.remove(i) end
"""
TIMEOUT = 0.3
with serial.Serial("/dev/ttyUSB0", 115200) as s:
s.write(CLEAR.encode()) # remove files
sleep(TIMEOUT)
s.write(RCV.encode()) # upload code
sleep(TIMEOUT)
s.write(b'recv()\n') # enter uploading mode
sleep(TIMEOUT)
s.write(b'file.lua' + b'\000') # filename
sleep(TIMEOUT)
s.write(
b'\001' + # ack
b'\004' + # size
b'test' + # data
b'\000' * 123 # padding
)
sleep(TIMEOUT)
s.write(
b'\001' + # ack
b'\000' + # size
b'\000' * 128 # padding
)
sleep(TIMEOUT)
s.read_all() # flush buffer
s.write(LIST.encode()) # list files
sleep(TIMEOUT)
print(s.read_all())
Looking at the _up() code snipped I posted above it first sends a >
which probably is checked for by ESPlorer. So I assume it is
esp: >
ESPlorer: 1st 250 byte chunk
esp: ~CRC-START~45326~CRC-END~
ESPlorer: 2st 250 byte chunk
...
...
NOTE: the given crc number is only a sample.
My tests also showed that it is rather related to sending a null character. I tested with 256 byte long files containing all characters and then started to bisect the file. In the end it filed when there was a null character in the file and succeeded otherwise.
It also might be possible that we have different versions of ESPlorer, as mine uses
_up(n,l,ll)
and yours seems to use
_up(n ,bs, ls)
Gregor, you need to apply the patch that I discussed on the other issue https://github.com/nodemcu/nodemcu-firmware/issues/2964#issuecomment-554679666 to fix the \0 issue, and ESPlorer emits as you say _up(n,l,ll)
; it's just that p2 is the block size and p3 the last block size.
Picking up @seregaxvm's example, there are critical points where you need to sleep at least enough to let the UART so that the SDK tasking can be scheduled, but there is little point in going overboard.
Even so, if we can achieve terminal buffering but still implement on('data', ...) in a way that doesn't require an up issue of the tools, would be good. I'll get back to you tomorrow.
I'd like to understand more about this null byte issue first and foremost.
Other than that, it's probably not a terrible idea to have a high water level for the input pipe, which if reached posts a high priority run-the-vm message (on top of the earlier low-priority messages).
Johny, see the above link for the fix, re null byte.
Dicking around with priorities dynamically only has very limited benefit in the non-preemptive SDK scheduler. The current config works really well for interactive use and cut & paste. IMO, we should just pass all input through the stdin
pipe.
The current mix of piped interactive feed and synchronous data feed doesn't work with legacy uploaders such as the one built in to Esplorer.
I'll work up something tomorrow and post it.
Ah, sorry Terry, my mailer hadn't shown me your response. Good pickup on that uninitialised variable issue.
I was under the impression that with your major 5.1/5.3 work all input was going through the stdin
pipe. If it isn't, then I definitely agree that it should.
@HHHartmann @seregaxvm, as an aside the downloaded stubs for both the Python nodemcu-uploader
and the Java ESPlorer
implementation could really do with reworking; e.g. in this second case the code does a file open, append, close per block; this strategy badly fragments the SPIFFS pages compared to just leaving the file open and writing to it serially, in which case the FS buffers the data stream and writes out complete pages. Updating the Python is straightforward, so long as Peter or one of his committers agrees to commit the PR, but doing the Java build is a PITA.
Always feeding the UART input through the stdin pipe is reasonably straight forward, but we do have to deal with the two input modes: text for Lua command input (this must correctly process \
This is weird. After fresh build from dev (and application of patch https://github.com/nodemcu/nodemcu-firmware/issues/2964) file uploading broke yet again. Now hardware watchdog reset fires off during upload.
When on commit f3b49849d2ab38b8e, after application of patch #2964:
diff --git a/app/modules/uart.c b/app/modules/uart.c
index 2b1f9164..116a323f 100644
--- a/app/modules/uart.c
+++ b/app/modules/uart.c
@@ -31,6 +31,7 @@ static int l_uart_on( lua_State* L )
if (lua_type( L, stack ) == LUA_TNUMBER)
{
data_len = luaL_checkinteger( L, stack );
+ end_char = (int16_t) ~0;
luaL_argcheck(L, data_len >= 0 && data_len < LUA_MAXINPUT, stack, "wrong a
rg range");
stack++;
}
@@ -57,7 +58,10 @@ static int l_uart_on( lua_State* L )
luaL_unref(L, LUA_REGISTRYINDEX, uart_receive_rf);
uart_receive_rf = LUA_NOREF;
}
- input_setup_receive(uart_on_data_cb, data_len, end_char, run_input);
+ if (uart_receive_rf == LUA_NOREF)
+ input_setup_receive(NULL, 0, 0, 1);
+ else
+ input_setup_receive(uart_on_data_cb, data_len, end_char, run_input);
return 0;
}
Using simple script to test uart
module:
import serial
from time import sleep
A = r"""
function foo(x)
uart.write(0,"t")
end
uart.on("data", "c", foo, 0)
"""
s = serial.Serial("/dev/ttyUSB0", 115200)
s.write(A.encode())
sleep(0.1)
s.read_all()
s.write(b"12345abc")
sleep(0.1)
print(s.read_all())
s.close()
Callback foo
is expected to run when character c
is encountered in the input stream. So, character t
should be returned only once. However, this script returns b'tttttttt'
, which means that foo
was called for every character in stream.
btw, why do we cast values to int16_t
before assigning char end_char
in https://github.com/nodemcu/nodemcu-firmware/blob/dev/app/modules/uart.c#L26-L41?
This is also strange.
lua_pushvalue(L, stack); // <- top of stack
luaL_unref(L, LUA_REGISTRYINDEX, uart_receive_rf);
uart_receive_rf = luaL_ref(L, LUA_REGISTRYINDEX); // reference function from top of stack
We are making reference to value 3
via luaL_ref
for some reason. We actually need to pop 4th value from stack before making reference to function, not push another.
btw, why do we cast values to
int16_t
before assigningchar end_char
in https://github.com/nodemcu/nodemcu-firmware/blob/dev/app/modules/uart.c#L26-L41?
Historic reasons. Valid terminators are 0..255, but (int16_t) ~1 is used on fixed length reads so that no false match occurs.
The boss is visiting family tomorrow for a few days, so I will have time to sort this. The temporal logic of this needs carefully thought through.
Except that I did my back in by not warming up enough before yoga -- hazards of old age :rofl: -- so have been rather distracted by a lot of pain.
Sergey, the use of the Registry for persistent Lua collectable references is a pretty standard coding pattern. You have to reference Lua collectables somewhere if you want then to persist across task invocations, and that they need a reference chain back to the environment _G
or the Lua registry otherwise the Lua CG will collect them. This is old code BTW and I am not sure why the ToS was dupped as the Func TValue on the ToS isn't used again, a lua_settop()
could have been used instead. It doesn't do any harm as the ldo.c
wrappers balance unrolling the stack as part of the function call epilogue.
However, your inference is correct: correctly balancing the Lua stack across all code paths -- or the failure to do so -- is one of the major PITAs in Lua C API development and also an aspect where the elf gdb utility is extremely useful.
One slight complication here is that the implementation has to be common across both Lua 5.1 and Lua 5.3 as the rest of the ecosystem is shared.
The functional requirement here is subtle. @jmattsson Johny and I discussed previously the idea of unifying stdin pipe for both interaction and application (uart.on('data', ... )
) input, but on reflection, there are gotchas here. For example one use case that you might not want to break is where the Lua interactive thread is being fed from a telnet session or the like but the application itself is using a serial API over the UART to 'talk' to some intelligent peripheral. This is a valid use and one that we wouldn't want to break. Whatever we do should work for all of the use cases that we want to support, but it we also need to avoid backwards compatibility breaks for the common uploader tools that our community use.
As someone who does use the UART for talking to other devices rather than a human operator, there's still the run_input
argument to uart.on()
, though I'll admit it's often been easier to leave it as default and expose a set of Lua functions for the other-end-of-the-uart device to make use of. Well, when I have control over the other device, that is :)
If run_input
is zero, then we naturally can't throw the data into the stdin
pipe, but otherwise there's no conceptual argument against it, I think? Whether we should have an actual "data" pipe that's used for the run_input==0
case is a more interesting question I'd say (and the instinctual response from me is "yes, probably").
node.input()
uart.on('data,...)
CB)As it stands these requirements are mutually incompatible. For example (3) means that we need to pass the input stream through a buffered pipe, so we might end up with (input L1A)(incomplete compile L1A)(input L1B)(input L2)(compile L1A+L1B, execute)(input L3), etc. However executing chunk L1A+B might switch to data mode so L2 must be processed in binary mode, so for example any \
One option here might be to pass all input characters into the pipe, but there are use cases where the application would need to decouple network input from the uart pipe (e.g. using telnet to develop a Lua application which uses the UART for a serial API to a peripheral device.
I think that the way to square this circle is acknowledge that we have two broad use cases or modes for interfacing to the interpreter:
The tl;dr is that my SDK 3.0 introduction of input buffering broke this second mode for 2 out of the 3 common UART-based uploader tools. The obvious work around is to add a uart mode which makes the buffered input optional so that the default boot mode is compatible with the assumptions built into ESPlorer, etc.
Day 10 of being stuck flat in bed, but at least the pain has largely gone, so I can at least think straight. I have been brooding about how to approach this properly. We have 7 component files or "agents" in some sense involved in delivering this input / UART direction:
driver/uart.c
includes the UART ISR and any direct access to the UART H/W.driver/input.c
isn't really a driver per se, but contains the high priority task that the uart ISR posts to handle line editing and marshalling the input stream into records.modules/uart.c
implements the Lua library API to UART functions.modules/node.c
implements the Lua library API to core SDK/ESP functions.modules/pipe.c
implements the Lua library API to use and manipulate pipe objects.lua/lua.c
and lua53/lua.c
implement the overall Lua startup and interpreter framework for the respective Lua 5.1 and 5.3 run-times.This underlines that most of the functionality is common to both Lua versions, and to me at least, it underlines that we should seek to abstract from the Lua implementation all functional aspects that are common to both. Both lua.c
implementations are very different to the "out of the box" variants, but that is because they must work within an event-driven tasking environment, but again the non-functional requirements here are the same for both Lua 5.1 and Lua 5.3 so we can abstract these out of lua.c
.
I will strip this pipe and ESP-specific functionality out of the 2×lua.c
and I therefore need to move it somewhere. The driver/input.c
file seems the best home for this. Maybe this file doesn't belong in drivers
since it isn't really a driver but the only alternative directory for such platform-specific stuff is the platform
folder. Maybe it would a better fit there, though we can always do any directory tidying as a separate exercise (we have similar issues with task.c
).
Picking up my previous comment, I think that pipe input should be disabled by default so that we can reimplement backwards-compatibility for ESPlorer, etc. We can add a uart.buffer(boolean)
API call to allow developer to choose which they prefer.
@jmattsson, How well does ESPlorer handle uploads on the ESP32?
We can add a uart.buffer(boolean) API call to allow developer to choose which they prefer
Maybe control buffering via uart.on
method
argument? data
for unbuffered and code
for buffered.
@jmattsson, How well does ESPlorer handle uploads on the ESP32?
I haven't actually used that, but last time I saw someone try to use it, it crashed badly in the file
module. I think that was due to the stack corruption we discussed a few months ago, but I haven't been back to check on it. I'm split between two projects at work, and multiple sub-projects within each, and none of which are close to an ESP at the moment :(
I wish you continued good recovery, @TerryE .
@seregaxvm,
Maybe control buffering via
uart.on
method
argument?data
for unbuffered andcode
for buffered
If you trace through the ESPlorer java code etc., it assembles the Lua interactive _up()
function call followed by 10mSec delay then issues first upload binary block. It then issues each successive binary block on receipt of a valid CRC return. This is all very time critical since what the lack of handshake between the <Lua _up()>
record and the first data block means that the lua record must be processed so that the on.data()
can be executed before input.c
processes the next record, so adding an extra argument to on.data()
doesn't help.
As you mentioned above I can easily create a stripped down version of the python tool and even a variant which reengineers the ESPlorer algo, as I can rapid prototype with Python. And this is what I will have to do.
Gregor, you need to apply the patch that I discussed on the other issue #2964 (comment) to fix the \0 issue
@TerryE If I read the code correctly it fixes the \0issue but only to exchange it with a \255 issue. @input,c around Line 187
if( ins.line_pos >= ins.len ||
(ins.data_len > 0 && ins.line_pos >= ins.data_len) ||
ch == ins.end_char ) {
ins.uart_cb(ins.data, ins.line_pos);
ins.line_pos = 0;
}
the check ch == ins.end_char
is unconditional and so calls the cb for ch = 255
It should be changed to (ins.data_len == 0 && ch == ins.end_char )
Mea culpa. Since end_char was only used in one routine l_uart_on()
in the modules/uart.c
I moved into this scope and somehow changed it from unit_16
to char
. I need to do a review of these changes :unamused:
PS. Not quite correct, checking my notes. If the len is non-zero then the char is ignored. Still needs a cross module review.
PPS. The syntax uart.on(method, [number/end_char], [function], [run_input])
is flawed IMO, because this (and the module code) allows uart.on('data', func)
as valid syntax but the handling is undefined.
OK, looking at the input.c
logic, we have the following combinations:
len | delim | runflag | Action |
---|---|---|---|
0 | ign | 0 | Call CB with each input char |
n>0 | ign | 0 | Call CB with string of n chars |
-1 | chr | 0 | Call CB with string delimited by \ |
ign | ign | 1 | Call CB with string delimited by valid EOL char and then call Lua interpreter |
Note that in this last case if you specify the run flag set then the delimiters are validated but then ignored. In the third case above, in the expression
ins.line_pos >= ins.len ||
(ins.data_len > 0 && ins.line_pos >= ins.data_len) ||
ch == ins.end_char
this is equivalent to buffer overflow || fixed len reached || end char matched
. These last two should be exclusive so as you say the end char test should be guarded by a data_len test:
if( ins.line_pos >= ins.len ||
(ins.data_len >= 0 && ins.line_pos >= ins.data_len) ||
(ins.data_len < 0 && ch == ins.end_char )) {
Setting end_char
to ~0
if the len is specified is irrelevant. Sorry
Currently in app/modules/uart.c in the case of a character given the following code is executed. need_len = 0;
So to meet the test above this would need to be set to -1.
Why not just leave it at 0 and change the test to
if( ins.line_pos >= ins.len ||
(ins.data_len > 0 && ins.line_pos >= ins.data_len) ||
(ins.data_len <= 0 && ch == ins.end_char )) {
Ah thinking a bit longer leads me to using -1 for data_len which also would cover the uart.on('data', func)
case that you pointed out.
Alternatively we could also raise an error if none or both are specified.
Could we get this in dev as a separate PR?
Oh no, I looked at the master version. In dev it is need_len = -1
.
So I will prepare a PR tomorrow with the updated condition as you stated above.
Gregor, I am working through a load of tidy up in this area as we discuss this, and I've already made these changes in my working copy. It will really complicate things if two of us are doing PRs converring this.
Terry I see the problem, however this is sort of a showstopper on dev currently as the binary serial communication is broken. Maybe you could get a minimal subset on dev before landing your big PR? Or is it only a matter of days?
Gregor, if you have a quick fix, then go with it.
OK. after applying the discussed fix I can upload the LFS image using ESPlorer but then #2973 strikes and I still can't do anything.
@HHHartmann did you check image hashsum?
I've done the tweaks as discussed to input.c
and some other tidy up. ESPlorer
upload works, as does nodemcu-tool
. Unfortunately nodemcu-uploader
doesn't; not sure why, but I don't know my way around the code well enough to debug this. For testing I did a stripped-down uploader, test-upload.lua which works fine, as does LFS loading:
~/nodemcu-firmware$ luac.cross -f -o /tmp/ftpserver.img \
lua_examples/lfs/[d_]*.lua lua_modules/ftp/ftpserver.lua
~/nodemcu-firmware$ test-upload /tmp/ftpserver.img
/dev/ttyUSB0 connected to fd:3
............................................................................................................
12075 bytes written to ftpserver.img
~/nodemcu-firmware$ nodemcu-uploader terminal
--- Miniterm on /dev/ttyUSB0 115200,8,N,1 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
> node.startupcommand('node.flashindex("_init")()')
> node.flashreload'ftpserver.img'
...
NodeMCU 3.0.0.0
branch: dev-lua53-2
commit: 18282e2a2a7689f7e7c461eb19813d92ace92d42
release: 2.0.0-master_20170202 +456
release DTS: 201911152024
SSL: false
build type: float
LFS: 0x0
modules: adc,bit,file,gdbstub,gpio,net,node,tmr,uart,wifi
build 2019-12-07 18:27 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)
> -- Note how the _init module is now executed automatically
> =LFS
table: 0x3fff09a8
> =unpack(LFS._list)
dummy_strings _init ftpserver
> =LFS.ftpserver
function: 0x3ffefe28
BTW, this is on my working branch. I am not sure why LFS is stalling for you Gregor. Sorry.
The make LUA=53
version also works, but the _init
module isn't needed.
@TerryE I had applied the changes to dev, so that't why I hit that other bug.
Gregor, we need to merge in my 5.3 PR.
Terry I think that would be a good idea. It fixes several annoying bugs and will bring us forward 51/53 ways.
@HHHartmann, I used current dev for bin and LFS image build using Docker, still experiencing the mentioned LFS image upload problem using ESPlorer. ESPlorer info shows java.lang.NullPointerException
. The uploading process hangs after a few 100 bytes.
Any advice on how to proceed ?
@NicolSpies see Terry's comment about. His fix is not on dev
yet but in his 5.3 branch.
@marcelstoer ,thanks, again. In the meantime I have started using nodeMCU-tool via bash for windows.
Fixed in #3075
@TerryE , just confirmation feedback, the LFS upload using Esplorer is functional again and appears to be better than before. The #3075 fixing commit seems to speed-up the upload of all files. The LFS flashing is also faster. :1st_place_medal:
@NicolSpies Thanks for the feedback Nicol. The PR definitely some introduced some gremlin that is impacting some automoted upload algos. I am tracking this down. We could really do with an automated test coverage suite, not only for core Lua functions but also for core NodeMCU components such as node
, file
, wifi
and uart
Expected behavior
Uploading LFS image using ESPlorer should work.
Actual behavior
The device sometimes reboots and ESPlorer displays
Test code
This seems to be a problem for binary files only, as a 6K binary file cannot be uploaded but a 60K textfile can.
The code used by ESPlorer to receive the file looks like this:
It then sends
n
blocks with the length ofl
and one last block with lengthll
.NodeMCU version
NodeMCU 3.0.0.0 branch: dev commit: f3044e1aff432ca48f6511554b317e0c80207d1c release: 2.0.0-master_20170202 +451 release DTS: SSL: false build type: float LFS: 0x0 modules: adc,bit,dht,file,gpio,i2c,mqtt,net,node,ow,spi,tmr,uart,wifi build 2019-11-13 07:09 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)
Hardware
Generic Dev board ESP8266SXD