meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
3.3k stars 800 forks source link

Update espresive sdk to latest (to fix? nasty bug described below) #606

Closed mc-hamster closed 3 years ago

mc-hamster commented 3 years ago

The upload interface is being used more often. Maybe it should no longer be experimental.

It crashes in the web server( esp32 https server ) in dealing with the incoming stream from the browser.

mc-hamster commented 3 years ago

We may be able to glean inspiration for a solution from:

https://github.com/espressif/arduino-esp32/pull/3752/files/7af1d1cdb84914a248e190c4f58d543ab42b899e

mc-hamster commented 3 years ago

Crashing has been fixed.

The solution I proposed earlier was a problem, just not the problem to the root case of this issue.

This fix has uncovered a limitation in the SPIFFS file system, namely, it's not very good at being fragmented -- it slows to a crawl. Shouldn't be a problem if the filesystem is "fresh". I'll update the upload interface at some point to a json end point and have it return file system stats.

Noki commented 3 years ago

@mc-hamster I updated to 1.1.32 which overwrote the spiffs partition. I tried to reupload the 0.1.3 release of the web interface using /static and it always crashes when uploading the app.js.gz file. The small files are just fine. Looks like this bug is not fully fixed yet.

mc-hamster commented 3 years ago

@mc-hamster I updated to 1.1.32 which overwrote the spiffs partition. I tried to reupload the 0.1.3 release of the web interface using /static and it always crashes when uploading the app.js.gz file. The small files are just fine. Looks like this bug is not fully fixed yet.

Could you attach the log from the console When it crashes? What browser do you use? Os?

Noki commented 3 years ago

Browser: Chrome Version 87.0.4280.88

Here is the log:

[HTTPS:I] New connection. Socket FID=59
[HTTPS:I] Request: POST /upload (FID=59)
[HTTPS:I] New connection. Socket FID=60
E (1162342) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (1162342) task_wdt:  - loopTask (CPU 1)
E (1162342) task_wdt: Tasks currently running:
E (1162342) task_wdt: CPU 0: IDLE0
E (1162342) task_wdt: CPU 1: loopTask
E (1162342) task_wdt: Aborting.
abort() was called at PC 0x40113668 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x4008ff5f:0x3ffbeb00 0x400902e5:0x3ffbeb20 0x40113668:0x3ffbeb40 0x4008246d:0x3ffbeb60 0x400925cf:0x3ffbc700

Rebooting...
??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????16:04:41 Emitting reboot packet for serial shell
DEBUG:root:Received: {'rebooted': True}
Noki commented 3 years ago

Chrome on Windows crashes, Firefox on Windows works Chromium on Linux works, Firefox on Linux works

mc-hamster commented 3 years ago

Chrome on windows. Fascinating. I didn't try that.

Thanks for the report.

mc-hamster commented 3 years ago

Interesting thing is that didn't crash in the same place. What you've reported is unrelated but we can track it on the same bug.

I won't be able to look at this for a few days but if I'm unable to reproduce it, I may send you a custom build with more debugging information.

Noki commented 3 years ago

Sounds great. I just managed to upload it with Chrome on Windows by opening the Developer Tools. Really, really strange behaviour. Dev-Tools open => works, Dev-Tools closed => crashes. This will be an interesting one to track down. :-D

mc-hamster commented 3 years ago

I've got it running on Chrome on Windows. Can't reproduce this. What version of chrome do you have?

mc-hamster commented 3 years ago

@Noki Please try with this attached firmware. It's built for the tbeam on the US frequencies, so I wouldn't use it to send/receive messages in your country. Let me know if you need it for other boards.

If you can get it to crash on Chrome, please attach your log files here. There's not much going on before the other log entries, so like you said "This will be an interesting one to track down".

tobias-upload-debug-firmware.bin.zip

mc-hamster commented 3 years ago

After 10 minutes of uploading files continuously , I was able to randomly get a crash. It's happens at a different point in time from the one @Noki pointed out but the address on the backtrace is very similar. It suggests that there's an interrupt routine in RadioLibInterface.cpp that's causing the crash. Moving this into its own thread may abate the ISR since it won't be interrupting the main loop.

Backtrace: 0x400812b4:0x3ffbf0f0 0x400812dd:0x3ffbf120 0x40081359:0x3ffbf140 0x4008246d:0x3ffbf160 0x40062211:0x3ffd1200 0x40096879:0x3ffd1220 0x400968b2:0x3ffd1250 0x40096c09:0x3ffd1280 0x40086c15:0x3ffd12a0 0x4017c169:0x3ffd12c0 0x40184b29:0x3ffd12e0 0x401862fc:0x3ffd1310 0x401888e5:0x3ffd1340 0x401892de:0x3ffd1360 0x40186f7d:0x3ffd13a0 0x40184e06:0x3ffd1400 0x4018538d:0x3ffd1420 0x4018469d:0x3ffd1450 0x4018f986:0x3ffd1470 0x4000bd83:0x3ffd1490 0x4000117d:0x3ffd14b0 0x400592fe:0x3ffd14d0 0x4005937a:0x3ffd14f0 0x40058ad3:0x3ffd1510 0x401c3851:0x3ffd1540 0x401c389d:0x3ffd1580 0x401c29cd:0x3ffd15a0 0x401d863d:0x3ffd15c0 0x400e20a1:0x3ffd15e0 0x401d83c7:0x3ffd18b0 0x400df93e:0x3ffd18d0 0x400df96e:0x3ffd18f0 0x400f0d8d:0x3ffd1910 0x400df93e:0x3ffd1950 0x400f0b06:0x3ffd1970 0x400f0d8d:0x3ffd19d0 0x400df93e:0x3ffd1a10 0x400f1415:0x3ffd1a30 0x400ee472:0x3ffd1c00 0x400ded69:0x3ffd1c30 0x400da0c0:0x3ffd1c50 0x400fbe11:0x3ffd1c70

0 0x400812b4:0x3ffbf0f0 in RadioLibInterface::isrLevel0Common(RadioLibInterface::PendingISR) at src/mesh/RadioLibInterface.cpp:283

1 0x400812dd:0x3ffbf120 in RadioLibInterface::isrRxLevel0() at src/mesh/RadioLibInterface.cpp:283

2 0x40081359:0x3ffbf140 in __onPinInterrupt at /Users/jmcasler/.platformio/packages/framework-arduinoespressif32/cores/esp32/esp32-hal-gpio.c:274

3 0x4008246d:0x3ffbf160 in _xt_lowint1 at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154

4 0x40062211:0x3ffd1200 in ?? ??:0

5 0x40096879:0x3ffd1220 in esp_rom_spiflash_read_status at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/spi_flash_rom_patch.c:413

6 0x400968b2:0x3ffd1250 in esp_rom_spiflash_wait_idle at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/spi_flash_rom_patch.c:413

7 0x40096c09:0x3ffd1280 in esp_rom_spiflash_erase_sector_internal at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/spi_flash_rom_patch.c:121

  (inlined by) esp_rom_spiflash_erase_sector at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/spi_flash_rom_patch.c:451

8 0x40086c15:0x3ffd12a0 in spi_flash_erase_range at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/flash_ops.c:237

9 0x4017c169:0x3ffd12c0 in esp_partition_erase_range at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spi_flash/partition.c:294

10 0x40184b29:0x3ffd12e0 in spiffs_api_erase at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs_api.c:58

11 0x401862fc:0x3ffd1310 in spiffs_erase_block at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_nucleus.c:1164

12 0x401888e5:0x3ffd1340 in spiffs_gc_erase_block at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_gc.c:15

13 0x401892de:0x3ffd1360 in spiffs_gc_check at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_gc.c:168

14 0x40186f7d:0x3ffd13a0 in spiffs_object_append at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_nucleus.c:1219

15 0x40184e06:0x3ffd1400 in spiffs_hydro_write at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_hydrogen.c:1121

16 0x4018538d:0x3ffd1420 in SPIFFS_write at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/spiffs/src/spiffs_hydrogen.c:1121

17 0x4018469d:0x3ffd1450 in vfs_spiffs_write at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/spiffs/esp_spiffs.c:472

18 0x4018f986:0x3ffd1470 in esp_vfs_write at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/vfs/vfs.c:734

19 0x4000bd83:0x3ffd1490 in ?? ??:0

20 0x4000117d:0x3ffd14b0 in ?? ??:0

21 0x400592fe:0x3ffd14d0 in ?? ??:0

22 0x4005937a:0x3ffd14f0 in ?? ??:0

23 0x40058ad3:0x3ffd1510 in ?? ??:0

24 0x401c3851:0x3ffd1540 in _fwrite_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/fwrite.c:170

25 0x401c389d:0x3ffd1580 in fwrite at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/fwrite.c:211

26 0x401c29cd:0x3ffd15a0 in VFSFileImpl::write(unsigned char const*, unsigned int) at /Users/jmcasler/.platformio/packages/framework-arduinoespressif32/libraries/FS/src/vfs_api.h:31

27 0x401d863d:0x3ffd15c0 in fs::File::write(unsigned char const*, unsigned int) at /Users/jmcasler/.platformio/packages/framework-arduinoespressif32/libraries/FS/src/FS.cpp:258

28 0x400e20a1:0x3ffd15e0 in handleFormUpload(httpsserver::HTTPRequest, httpsserver::HTTPResponse) at src/meshwifi/meshhttp.cpp:813

29 0x401d83c7:0x3ffd18b0 in void std::_Bind<void ((httpsserver::HTTPRequest, httpsserver::HTTPResponse))(httpsserver::HTTPRequest, httpsserver::HTTPResponse*)>::__call<void, , 0u, 1u>(std::tuple<>&&, std::_Index_tuple<0u, 1u>) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1074

  (inlined by) void std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*)>::operator()<, void>() at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1133
  (inlined by) std::_Function_handler<void (), std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*)> >::_M_invoke(std::_Any_data const&) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1871

30 0x400df93e:0x3ffd18d0 in std::function<void ()>::operator()() const at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:925

31 0x400df96e:0x3ffd18f0 in middlewareSpeedUp160(httpsserver::HTTPRequest, httpsserver::HTTPResponse, std::function<void ()>) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:925

32 0x400f0d8d:0x3ffd1910 in void std::_Bind<void ((httpsserver::HTTPRequest, httpsserver::HTTPResponse, std::function<void ()>))(httpsserver::HTTPRequest, httpsserver::HTTPResponse*, std::function<void ()>)>::__call<void, , 0u, 1u, 2u>(std::tuple<>&&, std::_Index_tuple<0u, 1u, 2u>) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1074

  (inlined by) void std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>)>::operator()<, void>() at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1133
  (inlined by) std::_Function_handler<void (), std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>)> >::_M_invoke(std::_Any_data const&) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1871

33 0x400df93e:0x3ffd1950 in std::function<void ()>::operator()() const at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:925

34 0x400f0b06:0x3ffd1970 in httpsserver::validationMiddleware(httpsserver::HTTPRequest, httpsserver::HTTPResponse, std::function<void ()>) at .pio/libdeps/tbeam/esp32_https_server/src/HTTPConnection.cpp:641

35 0x400f0d8d:0x3ffd19d0 in void std::_Bind<void ((httpsserver::HTTPRequest, httpsserver::HTTPResponse, std::function<void ()>))(httpsserver::HTTPRequest, httpsserver::HTTPResponse*, std::function<void ()>)>::__call<void, , 0u, 1u, 2u>(std::tuple<>&&, std::_Index_tuple<0u, 1u, 2u>) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1074

  (inlined by) void std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>)>::operator()<, void>() at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1133
  (inlined by) std::_Function_handler<void (), std::_Bind<void (*(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>))(httpsserver::HTTPRequest*, httpsserver::HTTPResponse*, std::function<void ()>)> >::_M_invoke(std::_Any_data const&) at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/functional:1871

36 0x400df93e:0x3ffd1a10 in std::function<void ()>::operator()() const at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:925

37 0x400f1415:0x3ffd1a30 in httpsserver::HTTPConnection::loop() at .pio/libdeps/tbeam/esp32_https_server/src/HTTPConnection.cpp:510 (discriminator 4)

38 0x400ee472:0x3ffd1c00 in httpsserver::HTTPServer::loop() at .pio/libdeps/tbeam/esp32_https_server/src/HTTPServer.cpp:122

39 0x400ded69:0x3ffd1c30 in handleWebResponse() at /Users/jmcasler/.platformio/packages/toolchain-xtensa32/xtensa-esp32-elf/include/c++/5.2.0/bits/shared_ptr_base.h:925

40 0x400da0c0:0x3ffd1c50 in loop() at src/main.cpp:589

41 0x400fbe11:0x3ffd1c70 in loopTask(void*) at /Users/jmcasler/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:19

Noki commented 3 years ago

@mc-hamster Tried your firmware and got it to crash on the first attempt:

[HTTPS:I] Request: POST /upload (FID=58)
E (120711) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (120711) task_wdt:  - loopTask (CPU 1)
E (120711) task_wdt: Tasks currently running:
E (120711) task_wdt: CPU 0: IDLE0
E (120711) task_wdt: CPU 1: loopTask
E (120711) task_wdt: Aborting.
abort() was called at PC 0x40113750 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x4008ff5f:0x3ffbeb00 0x400902e5:0x3ffbeb20 0x40113750:0x3ffbeb40 0x4008246d:0x3ffbeb60 0x4010fd23:0x3ffbc6e0 0x400925b1:0x3ffbc700

Rebooting...
mc-hamster commented 3 years ago

@mc-hamster Tried your firmware and got it to crash on the first attempt:

[HTTPS:I] Request: POST /upload (FID=58)
E (120711) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (120711) task_wdt:  - loopTask (CPU 1)
E (120711) task_wdt: Tasks currently running:
E (120711) task_wdt: CPU 0: IDLE0
E (120711) task_wdt: CPU 1: loopTask
E (120711) task_wdt: Aborting.
abort() was called at PC 0x40113750 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x4008ff5f:0x3ffbeb00 0x400902e5:0x3ffbeb20 0x40113750:0x3ffbeb40 0x4008246d:0x3ffbeb60 0x4010fd23:0x3ffbc6e0 0x400925b1:0x3ffbc700

Rebooting...

This means it’s crashing inside the web server. What exact version of chrome are you using? Any extensions?

Noki commented 3 years ago

Chrome Version 87.0.4280.88 (Official Build) (64-Bit) without any plugins. I just created a new chrome profile to make sure it has nothing to do with anything installed in my normal Chrome install.

One thing I noticed: If I open the chrome developer tools and check "disable cache" the upload works. Disabling the cache makes Chrome use a new connection. Maybe it has something to do with the keep-alive handling of the http connection.

In addition the cache control request header changes from Cache-Control: max-age=0 to Cache-Control: no-cache but I don't now if the webserver does anything with that header.

mc-hamster commented 3 years ago

We're using the same version of chrome.

@Noki Please try this firmware. It has keep-alive disabled on /static and /upload

tobias-upload-debug-firmware2.bin.zip

Noki commented 3 years ago

@mc-hamster I could not get that version to work but I've seen that you also commited it as a change and compiled it by myself. It still crashes:

[HTTPS:I] Request: POST /upload (FID=61)
E (161606) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (161606) task_wdt:  - loopTask (CPU 1)
E (161606) task_wdt: Tasks currently running:
E (161606) task_wdt: CPU 0: IDLE0
E (161606) task_wdt: CPU 1: loopTask
E (161606) task_wdt: Aborting.
abort() was called at PC 0x401137d4 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x4008ff5f:0x3ffbeb00 0x400902e5:0x3ffbeb20 0x401137d4:0x3ffbeb40 0x4008246d:0x3ffbeb60 0x4000bfed:0x3ffbc660 0x40091f42:0x3ffbc670 0x40113b0e:0x3ffbc690 0x40113b1b:0x3ffbc6c0 0x4010fda1:0x3ffbc6e0 0x400925b1:0x3ffbc700

Rebooting...

I'm out of ideas...

mc-hamster commented 3 years ago

@Noki oh, you can make your own builds!

https://github.com/meshtastic/Meshtastic-device/blob/master/docs/software/build-instructions.md

I just added instructions on how to perform real-time backtraces at the bottom of the document. Run that while in platform.io and it’ll tell us exactly where it’s failing.

Noki commented 3 years ago
stack:
0x4008ff5f: invoke_abort at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
0x400902e5: abort at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
0x401137d4: task_wdt_isr at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:252
0x4008246d: _xt_lowint1 at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
0x40091f42: vTaskExitCritical at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
0x40113b0e: esp_task_wdt_reset at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:322
0x40113b1b: idle_hook_cb at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:80
0x4010fda1: esp_vApplicationIdleHook at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/esp32/freertos_hooks.c:108
0x400925b1: prvIdleTask at /home/kevinh/development/meshtastic/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
mc-hamster commented 3 years ago

Thanks for that stack trace. At this point I’m at a loss of what may be causing it and can’t reproduce it on my side.

That says the watchdog is being triggered but it’s all in the sdk. Hmmmm

Noki commented 3 years ago

I've seen that Expressif Master is ahead of Meshtastic with the following fix: Disable IRAM optimization for WiFi

causes IRAM overflow when both BT and WiFi are enabled

Can you merge the latest changes from Expressif Master? I will then try again and if it still crashes we can also file a bug with Expressif.

geeksville commented 3 years ago

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/want-to-help-alpha-test-this-build-android-app-1-1-35-fixes-for-ota-update/1862/135

geeksville commented 3 years ago

we (probably me alas) should probably just update to the latest expressiv that still works with arduino (so we can pull in this fix and others). I'll try to do it soon. I'm going to retitle this bug.

mc-hamster commented 3 years ago

closing this. the bug described no longer happens. we've added enough work arounds.