luc-github / ESP3D

FW for ESP8266/ESP8285/ESP32 used with 3D printer
GNU General Public License v3.0
1.76k stars 466 forks source link

ESP-01S(1M) Panic on client connection #358

Closed spock64 closed 5 years ago

spock64 commented 5 years ago

Trying to get a hardwired ESP-01S to work properly ...

To Reproduce Steps to reproduce the behavior:

  1. Clone, checkout and build per instructions (cpu=160MHz, SPIFFS=1M/144Kb
  2. Start board & wait for AP to start
  3. Connect Mac
  4. Note panic on Arduino debug terminal - details below

Expected behavior Expect the start up web UI to load.

ESP3D Firmware:

Target Firmware:

Board used (please complete the following information):

Additional context Here's the Panic

Panic /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WiFi/src/include/DataSource.h:93 const uint8_t* BufferedStreamDataSo: Assertion 'cb == stream_rem' failed.

stack>>> ctx: cont sp: 3ffffbe0 end: 3fffffc0 offset: 01b0

[Decoded] 0x4020d22a: BufferedStreamDataSource ::get_buffer(unsigned int) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WiFi/src/include/DataSource.h line 95 0x4020d44d: ClientContext::_write_some() at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WiFi/src/include/ClientContext.h line 486 0x4022e1ae: operator new(unsigned int) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/new_op.cc line 52 0x4020d50e: ClientContext::_write_from_source(DataSource) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WiFi/src/include/ClientContext.h line 445 0x4020d9e4: WiFiClient::write(Stream&) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WiFi/src/include/ClientContext.h line 369 0x402093ed: ESP8266WebServer::streamFile (fs::File&, String const&) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/ESP8266WebServer.h line 136 0x4020950b: handle_web_interface_root() at /var/folders/5s/yxp9nbtn4zv0vygn0bj1fnhr0000gn/T/arduino_build_627856/sketch/syncwebserver.cpp line 126 0x4021e4cc: _umm_free(void) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304 0x4021ae28: String::equals(String const&) const at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/cores/esp8266/WString.cpp line 498 0x40214a00: FunctionRequestHandler::handle(ESP8266WebServer&, HTTPMethod, String) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/detail/RequestHandlersImpl.h line 37 0x40223aca: std::_Function_handler ::_M_invoke(std::_Any_data const&) at /Users/j/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-3-20ed2b9/xtensa-lx106-elf/include/c++/4.8.2/functional line 2073 0x40213cb4: FunctionRequestHandler::canHandle(HTTPMethod, String) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/detail/RequestHandlersImpl.h line 20 0x40100105: std::function ::operator()() const at /Users/j/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/2.5.0-3-20ed2b9/xtensa-lx106-elf/include/c++/4.8.2/functional line 2465 0x40214a36: FunctionRequestHandler::handle(ESP8266WebServer&, HTTPMethod, String) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/detail/RequestHandlersImpl.h line 44 0x4021a8d0: String::String(String const&) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/cores/esp8266/WString.cpp line 41 0x40214ac1: ESP8266WebServer::_handleRequest() at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/ESP8266WebServer.cpp line 599 0x40214d14: ESP8266WebServer::handleClient() at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/libraries/ESP8266WebServer/src/ESP8266WebServer.cpp line 308 0x4021c253: delay(unsigned long) at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/cores/esp8266/core_esp8266_wiring.cpp line 54 0x4020455c: CONFIG::wait(unsigned int) at /var/folders/5s/yxp9nbtn4zv0vygn0bj1fnhr0000gn/T/arduino_build_627856/sketch/config.cpp line 78 0x40206bbb: Esp3D::process() at /var/folders/5s/yxp9nbtn4zv0vygn0bj1fnhr0000gn/T/arduino_build_627856/sketch/esp3d.cpp line 261 0x40206c48: loop() at /Users/j/Documents/Projects/esp3d/190815/ESP3D/esp3d/esp3d.ino line 36 0x4021bb0c: loop_wrapper() at /Users/j/Library/Arduino15/packages/esp8266/hardware/esp8266/2.5.2/cores/esp8266/core_esp8266_main.cpp line 125

<<<stack<<<

luc-github commented 5 years ago

Do you have puya chip ? Can you test with puya patch enabled ?

spock64 commented 5 years ago

The chips are xtx pn25f08b ... I can't see anything wrong with these?

Im using 2.5.2 of the board support, and Arduino version 1.8.9 - nothing odd there either? This version has the dynamic patch for PUYA chips - which of course my chips are not.

BTW, the firmware works fine in a Wemos D1 mini... Do you have any ideas?

luc-github commented 5 years ago

No. Esp8266 core 2.5.4 has no dynamic puya patch, need to enable it with a define. Only git has patch enabled by default. Because ESP3D works on others esp board only puya issue come to me

luc-github commented 5 years ago

Also format all flash when flashing FW to be sure SPIFFS is clean

spock64 commented 5 years ago

So, I made a little progress ... I used Esptool to erase then re-flashed with CORE debug on - and the SPIFFS size at 1M(144kb SPIFFS). So, I connect to the AP, and drag the index.html.gz I then see "SPIFFS_write rc=-10001" in the debug but the captive portal page says that the write worked properly ... although the file is 124,511 byes in size, not all of it has been written (around 115k).

image

I rebuilt with SPIFFS set to 1M (160kb) and now things are working.

So I guess there is a bug somewhere when the UI thinks that the file will fit, but for some reason the file will not fit into the SPIFFS space available ... Perhaps there are some overheads that are not being taken into account?

luc-github commented 5 years ago

Latest webUI size is 121 KB (124,597 bytes), I forgot it was now so big... using 144K the available size is 125.25K as displayed on your screenshot which is very close to the limit so for some reason the complete file cannot be copied and spiffs is corrupted I think. So using 160K was the good solution

spock64 commented 5 years ago

IRRC, the file upload UI checks the size of the file to be uploaded? If so, we could adjust the check to avoid this problem - it is likely to be more straightforward than trying to get the file write errors propagated back to the browser.

Can we leave this bug open for a day or two? I'll look at a fix, and propose a PR if I can get it done quickly enough.

luc-github commented 5 years ago

there is no check done before upload, I was thinking of it in the past but need to handle overight and situation like you have would not be covered as free space should be 125.25K and file to be uploaded 121 K so it should fit in theory

sure, if you can find a root cause and a fix it would be great

spock64 commented 5 years ago

Root cause identified. PR should be incoming over the weekend.

luc-github commented 5 years ago

What is the root cause ?

spock64 commented 5 years ago

So "root cause" of upload of the UI file silently failing due to lack of space is that syncwebserver.cpp / asyncwebserver.cpp have no checks on the writes to the SPIFFS file. Adding checks means that when the file system is full (or some other problem), we can deal with the error by aborting the upload and closing/removing the SPIFFS file. We can then signal in the user interface that the upload failed.

I'm going to look at adding a check in the UI so that the file won't be uploaded if there's not enough space - but this is probably quite tricky. What can be done is checking in the device as the file size is passed in on the upload http request - that way we can fail the upload early.

I'll drop a PR in for this, and then work out what is going on with SPIFFS itself.

Is that OK?

luc-github commented 5 years ago

What if it is an overwrite of existing file? What if upload is a set of files and may be some are also overwrite ?

spock64 commented 5 years ago

Well, if it's an overwrite, the current code simply deletes the old file before starting to write the new file - which makes sense to me. If the file write fails, the user will need to sort out what is wrong - and this matters most on the initial load of the index.htlm.gz If it's a set of files, we can only really fail when the first write fails - this is only a little embedded device after all!

Quick question - I can see that there is an "index.html.gz" file included in the /data folder that presumably is loaded when the initial firmware is loaded. How do I build this file? I can see that you are using gulp etc, but am not familiar with it. My plan would be to amend this initial interface to avoid the failure that I saw, deal with the error, and include that in the PR.

luc-github commented 5 years ago

An overwrite does not have necessary the same size. You wrote you want to check if space is enough before upload, I just wanted to highlight some user cases that are not so simple . The ESP3D-WebUI has its own repository. The embedded page is in ESP3D repository. Both allows SPIFFS multi upload.

spock64 commented 5 years ago

Agreed - the overwrite does not have the same size - but the code right now removes the old file before creating the new one ... and, if the write fails (silently), the file is truncated and so will not decompress properly resulting in the problem that I identified. The use case that I would like to fix is the "When I load the initial index.html.gz the system does not work properly because the upload silently fails" ... I'm not looking at rewriting the whole file upload feature! Meanwhile can you explain how the index.html.gz located at esp3d/data/ is built? This is the first part of the UI to fix I think!

luc-github commented 5 years ago

it is explained here : https://github.com/luc-github/ESP3D-WEBUI/wiki/Compilation---Development

well in your case space was enough if I refer to your screenshot: https://user-images.githubusercontent.com/22578643/63114282-69087780-bf8c-11e9-8f7f-783030fbd969.png, so checking space won't catch the error.

if SPIFFS writing failed during upload SPIFFS, should raise an issue, but seems not the case or it is not well catched, anyway there is some workaround that I have introduced in 3.0 but UI code is ready already on 2.1, it checks the file original size vs the one on SPIFFSafter upload is done, https://github.com/luc-github/ESP3D/blob/3.0/esp3d/src/modules/http/handles/upload-files.cpp#L97-L103, if they are not the same upload raise an error.

spock64 commented 5 years ago

Luc, I just sent a pull request to fix this issue for the 2.1 branch. The changes in the PR check the result of the write to SPIFFS, cleaning up on failure and raising an error. Also includes a change to the embedded html (in embedded) to alert that the write has failed.

I think similar changes will be needed for the 3.0 branch. I have not looked at 3.0, but can review if you like!

BTW, I did no rebuild "no file.h" for the PR - this should be part of building the system I think?

luc-github commented 5 years ago

Ok I will look at it - thank you

reading at code the PR raise error if error in transfer or if size does not match for each packet, so if any packet is lost the final size won't be correct even if each packet size are ok - I will add this check present in also. I you can check 3.0 but I will do also as I want both sync for bug fix

no, currently the embedded page need to be built manually using build.bat script

So you fixed the "silent" error - not the error it self -I am correct ? Or did you also found why even you have enough space, the file saving failed ?

spock64 commented 5 years ago

Yes, the PR checks that the write of each packet is handled successfully - if not, the part-written file is deleted and error is raised. The javascript in the embedded UI running in the browser sees the failure via the newly added "xmlhttp.onerror" handler and uses a modal alert to inform the user that the write has failed. The user can then try again etc. When the "full fat" html interface is running, instead of the embedded version, the error is already handled in the existing javascript.

BTW, I'm. using a Mac, so did not use build.bat so had to use a hacked script / bin2c implementation for test purposes.

It makes sense to look at the 3.0 sync and asynch html handlers to ensure that writes are properly completed. In the async case, there might be a problem as asynchronous writes should be used in this case. I'll take a look and raise a new bug on the 3.0 branch if I see problems. Is that OK?

BTW, it is worth looking at the SD-writing code too to ensure that write failures are handled properly.

So, in summary, in the PR I fixed the "silent" error where an incomplete index.html.gz file was written. This is really the root cause of the bug I raised.

Now, the reason why the file does not fit into the SPIFFS-reported space available is all to do with file system overheads. As these overheads are not predictable, we cannot be sure that "125kb available" will accommodate a file of that size or even nearly that size. I'll take a look at SPIFFS when I get some time (not this week) and see if there is any reasonable implementation that would work. Hope this makes sense?

luc-github commented 5 years ago

async is no more maintained and not recommended currently - so I am not sure it make sense to add this fix

SD writing is handled by printer - error are managed differently

in 3.0 I am working on issue management using wsocket to raise error on UI, because if upload is stopped the connection is stopped so there is no meaninful reason - just error 0 beacause client closed connection, using websocket should workaround this pushing the error even web connection is closed.

about not predictable - I would say yes for already in use FW but not for freshly formated FW

anyway I have a thread also open to decrease webui size as it reach the limit https://github.com/luc-github/ESP3D-WEBUI/issues/47

luc-github commented 5 years ago

I have rewriting the upload error management on my private repo and will push the code tomorrow or this week end, this will avoid the meaningless Error: 0 message when connection is stopped

spock64 commented 5 years ago

OK - cool - I'll try it out when I see the commit.

luc-github commented 5 years ago

here it is https://github.com/luc-github/ESP3D/commit/f48ecfde47866f6332bced5921c7fb79e022f989 need also latest webui https://github.com/luc-github/ESP3D-WEBUI/commit/489f3cf24b52e3ec2aaa82fa9e1cdb7d6ca77f4a to catch the wsocket message Still need to implement the global size check present in 3.0, will also update 3.0 accordingly

luc-github commented 5 years ago

@spock64 I have implemented a more reactive version in 3.0, only the embedded is currently implemented in 3.0 - but you really can see the difference when uploading the SPIFFS and size is over / cannot write due to fragmentation space

luc-github commented 5 years ago

now size check is implemented in 2.1 with also instant message and upload abort

github-actions[bot] commented 4 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.