meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0

remote nodes with admin channels reboot sometimes #811

Closed geeksville closed 3 years ago

geeksville commented 3 years ago

Coming back to this again after quite a bit of testing, I have now managed to identify a somewhat more reproducible way of causing resets on T-beams v1.0 over the air.

A single node running 1.2.30 with one encrypted channel can sit without resets permanently (I tested 23+h without a single reset)

With four nodes on the mesh, all flashed with 1.2.30, I see a reset roughly every 24 min (based on about 25 resets in 10h). These occur even if no messages are actively sent. On the two nodes that have a screen I can see that they often reset almost simultaneously. I’ve tested this with the nodes connected via Bluetooth and not, and with and without admin channels enabled. The most reproducible way has been to have a remote node with an admin channel enabled. A reset is triggered in the remote node when meshtastic --debug is sent to a USB-connected node (one might have to wait a couple of minutes if the remote device is sleeping). I am unable to reproducibly induce this behaviour if the remote node does not have an admin channel enabled.

I therefore thought the resets occur only if the admin channel is enabled on the remote node. But after reflashing and activating only the primary channel on all nodes, two nodes still reset after a few minutes (even without meshtastic --debug), just as I was writing this. Both were running without a USB connection; externally powered nodes don’t seem to reset.

It seems that whatever is sent periodically in the mesh without user interaction might also be sent when running meshtastic --debug, and that this can cause the resets.

Can anybody else reproduce this? As mentioned previously, I cannot seem to reproduce this if the nodes are externally powered.

...

I can see that, too. Whether it’s a LoRa32 or a T-Beam, powered by battery or via USB, they would eventually reboot. They all have the admin channel configured.

Plus, some nodes (of course it’s always the remote ones!) would eventually freeze, requiring a manual reboot. They would go on one, two or three days, then simply freeze.

from: https://meshtastic.discourse.group/t/new-device-release-1-2-30-ready-for-alpha-testing/3272/20

geeksville commented 3 years ago

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/new-release-of-python-api-1-2-35-and-geeksvilles-current-queue/3398/4

IZ1IVA commented 3 years ago

@geeksville running a T-Beam without battery, powered from USB, admin channel configured. Here's what happens before a spontaneous reboot:

09:46:34 2015 [PowerFSM] GPS prepare sleep!
09:47:04 2045 [PowerFSM] GPS prepare sleep!
09:47:04 2045 [Power] Battery: usbPower=1, isCharging=0, batMv=0, batPct=0
09:47:04 2045 [PowerFSM] GPS prepare sleep!
09:47:34 2075 [PowerFSM] GPS prepare sleep!
09:47:34 2075 [Power] Battery: usbPower=1, isCharging=0, batMv=0, batPct=0
09:47:34 2075 [RadioIf] (bw=125, sf=12, cr=4/8) packet symLen=32 ms, payloadSize=42, time 3645 ms
09:47:34 2075 [RadioIf] Lora RX (id=0x6bbfdeed Fr0x34 To0xd8, WantAck0, HopLim5 Ch0xb1 encrypted rxSNR=10.5)
09:47:34 2075 [RadioIf] AirTime - Packet received : 3645ms
09:47:34 2076 [Router] Adding packet record (id=0x6bbfdeed Fr0x34 To0xd8, WantAck0, HopLim5 Ch0xb1 encrypted rxSNR=10.5)
09:47:34 2076 [Router] Using channel 0 (hash 0xb1)
09:47:34 2076 [Router] Expanding short PSK #1
09:47:34 2076 [Router] Installing AES128 key!
09:47:34 2076 [Router] Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x400014e8 PS : 0x00060b30 A0 : 0x8011f720 A1 : 0x3ffd1560
A2 : 0x0000001a A3 : 0x00000018 A4 : 0x000000ff A5 : 0x0000ff00
A6 : 0x00ff0000 A7 : 0xff000000 A8 : 0x00000000 A9 : 0x00000008
A10 : 0x3ffd51b8 A11 : 0x3ffd175c A12 : 0x3ffd51c0 A13 : 0x3ffb28c4
A14 : 0x0000002a A15 : 0x3ffd19f0 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000018 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd

ELF file SHA256: 0000000000000000

Backtrace: 0x400014e8:0x3ffd1560 0x4011f71d:0x3ffd1570 0x401284ee:0x3ffd1880 0x4012852a:0x3ffd1910 0x400d45cd:0x3ffd1950 0x400d4757:0x3ffd1990 0x400e547e:0x3ffd19e0 0x400def71:0x3ffd1a10 0x400df11a:0x3ffd1a30 0x400df1fa:0x3ffd1a50 0x400df219:0x3ffd1a70 0x400db1b9:0x3ffd1aa0 0x400d4e62:0x3ffd1ac0 0x400f1e21:0x3ffd1ae0 0x400da5d4:0x3ffd1b10 0x401022bd:0x3ffd1b30

Rebooting...
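
For anyone triaging a dump like the one above, here is a minimal Python sketch that pulls the register values and the backtrace PC:SP pairs out of the panic text. It is not part of the Meshtastic tooling; the function names and regular expressions are illustrative assumptions, and the sample text is an excerpt of the dump quoted above.

```python
import re

# Excerpt of the panic output quoted above (not the full dump).
PANIC_TEXT = """\
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x400014e8 PS : 0x00060b30 A0 : 0x8011f720 A1 : 0x3ffd1560
A2 : 0x0000001a A3 : 0x00000018 A4 : 0x000000ff A5 : 0x0000ff00
EXCVADDR: 0x00000018 LBEG : 0x400014fd LEND : 0x4000150d LCOUNT : 0xfffffffd
Backtrace: 0x400014e8:0x3ffd1560 0x4011f71d:0x3ffd1570 0x401284ee:0x3ffd1880
"""

# Hypothetical patterns: an upper-case register name, a colon, a 32-bit hex value.
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]{8})\b")
# Backtrace entries are printed as PC:SP pairs.
FRAME_RE = re.compile(r"0x([0-9a-fA-F]{8}):0x([0-9a-fA-F]{8})")


def parse_registers(text):
    """Return {register_name: value} for every register line in the dump."""
    registers = {}
    for line in text.splitlines():
        if line.startswith("Backtrace"):
            continue  # PC:SP pairs are handled by parse_backtrace()
        for name, value in REGISTER_RE.findall(line):
            registers[name] = int(value, 16)
    return registers


def parse_backtrace(text):
    """Return the backtrace as a list of (pc, sp) integer pairs."""
    for line in text.splitlines():
        if line.startswith("Backtrace:"):
            return [(int(pc, 16), int(sp, 16)) for pc, sp in FRAME_RE.findall(line)]
    return []


if __name__ == "__main__":
    regs = parse_registers(PANIC_TEXT)
    frames = parse_backtrace(PANIC_TEXT)
    print(f"faulting address (EXCVADDR): {regs['EXCVADDR']:#x}")
    print(f"innermost frame PC: {frames[0][0]:#x}")
```

In this dump EXCVADDR equals A3 (both 0x18), a very small address, which is consistent with a load through a null or near-null pointer. The sketch only extracts the numbers; mapping the PCs to source lines still needs the matching ELF file.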

geeksville commented 3 years ago

@iz1iva Oh, that is super useful. What is the exact build version for that trace, so I can run the stack dump tool?

(Sent from a phone - please ignore typos)

IZ1IVA commented 3 years ago

firmwareVersion is 1.2.30.80e4bc6

Cheers!

michelepagot commented 3 years ago

@geeksville what is "stack dump tool"? Is it one of ...?

geeksville commented 3 years ago

@michelepagot It is bin/exception_decoder.py in this git repo (someone donated it some time ago, and I bet it came from one of those places). Usage: bin/exception_decoder.py -e elffilepath exceptionmessagefile

It warms my heart that you asked and that you might be doing more to extend/fix the device code in the future. ;-)
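
The usage line above takes an ELF path and a file holding the exception message. As a rough sketch of that workflow, the helper below saves pasted panic text to a file and builds the matching argument list. The helper name and the exception.txt file name are hypothetical, the ELF path is the PlatformIO T-Beam build output used elsewhere in this thread, and only the -e option quoted in the usage line is assumed.

```python
from pathlib import Path
import tempfile

DECODER = "bin/exception_decoder.py"  # script path as given in the comment above


def prepare_decode(panic_text, elf_path, work_dir):
    """Save the panic text to a file and return the decoder argv list.

    Follows the quoted usage: exception_decoder.py -e elffilepath exceptionmessagefile
    """
    exception_file = Path(work_dir) / "exception.txt"  # hypothetical file name
    exception_file.write_text(panic_text)
    return [DECODER, "-e", str(elf_path), str(exception_file)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd = prepare_decode(
            "Backtrace: 0x400014e8:0x3ffd1560 0x4011f71d:0x3ffd1570\n",
            ".pio/build/tbeam/firmware.elf",
            tmp,
        )
        print(" ".join(cmd))
        # To actually decode, run this from the repository root, e.g. with
        # subprocess.run(cmd, check=True); the ELF must match the crashed build.
```

The ELF has to come from the exact build that crashed (here 1.2.30.80e4bc6), otherwise the addresses resolve to the wrong source lines.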

geeksville commented 3 years ago

(also I just noticed we aren't keeping elfs in the github artifacts - no problem for now because I can rebuild locally, but I'll update the github actions to keep elfs in a separate artifact)

geeksville commented 3 years ago

investigating, but here's the stack trace

~/development/meshtastic/meshtastic-esp32$ bin/exception_decoder.py -e .pio/build/tbeam/firmware.elf ex
stack:
0x4011f71d: _svfprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:1529
0x401284ee: _vsnprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:72
0x4012852a: vsnprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:41
0x400d45cd: RedirectablePrint::vprintf(char const*, __va_list_tag) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:37
0x400d4757: RedirectablePrint::logDebug(char const*, ...) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:96
0x400e547e: pb_decode_from_bytes(unsigned char const*, unsigned int, pb_msgdesc_s const*, void*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/mesh-pb-constants.cpp:33
0x400def71: perhapsDecode(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df11a: Router::handleReceived(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df1fa: Router::perhapsHandleReceived(_MeshPacket*) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400df219: Router::runOnce() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/Router.cpp:176
0x400db1b9: ReliableRouter::runOnce() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/mesh/DSRRouter.cpp:240
0x400d4e62: concurrency::OSThread::run() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/concurrency/OSThread.cpp:45
0x400f1e21: ThreadController::runOrDelay() at /home/kevinh/development/meshtastic/meshtastic-esp32/.pio/libdeps/tbeam/Thread/ThreadController.cpp:153
0x400da5d4: loop() at /home/kevinh/development/meshtastic/meshtastic-esp32/src/main.cpp:653
0x401022bd: loopTask(void*) at /home/kevinh/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:19
~/development/meshtastic/meshtastic-esp32$ 
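
A quick way to read a decoded trace like this is to skip the libc frames and find the innermost frame that lives in the project's own source tree. The Python sketch below does that for the first five frames above; the line format is inferred from this output only, and the "/meshtastic-esp32/src/" marker and function names are assumptions for illustration.

```python
import re

# First five decoded frames, copied from the output above.
DECODED = """\
0x4011f71d: _svfprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vfprintf.c:1529
0x401284ee: _vsnprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:72
0x4012852a: vsnprintf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c:41
0x400d45cd: RedirectablePrint::vprintf(char const*, __va_list_tag) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:37
0x400d4757: RedirectablePrint::logDebug(char const*, ...) at /home/kevinh/development/meshtastic/meshtastic-esp32/src/RedirectablePrint.cpp:96
"""

# Assumed line shape: "<address>: <function> at <path>:<line>"
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+): (.+) at (.+):(\d+)$")


def parse_decoded(text):
    """Parse decoder output into a list of frame dicts, innermost first."""
    frames = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            address, function, path, lineno = match.groups()
            frames.append({
                "address": int(address, 16),
                "function": function,
                "path": path,
                "line": int(lineno),
            })
    return frames


def first_application_frame(frames, marker="/meshtastic-esp32/src/"):
    """Return the innermost frame whose path contains the project marker."""
    for frame in frames:
        if marker in frame["path"]:
            return frame
    return None


if __name__ == "__main__":
    frame = first_application_frame(parse_decoded(DECODED))
    print(f"{frame['function']} at {frame['path']}:{frame['line']}")
```

Applied to the trace above, the innermost project frame is RedirectablePrint::vprintf, reached via logDebug from pb_decode_from_bytes. That points at a debug log call as the place to start looking, though the trace alone does not show which argument was bad.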

michelepagot commented 3 years ago

it is also already documented in https://meshtastic.org/docs/software/other/build-instructions