meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0

[Bug]: Pico W "integer too large" then crash #4000

Status: Closed (closed by clwgh 4 weeks ago)

clwgh commented 1 month ago

Category

Other

Hardware

Raspberry Pi Pico (W)

Firmware Version

2.3.11.77cf5c6 (was available nightly on 28th May)

Description

FAO @thebentern. On Discord you wrote "I'm enlisting your help to test this 2.3.11 alpha preview before we go live with it". I'm mentioning this as a possible bug; it may not be one, but here's the info in case it's useful. Feel free to close if not.

This was on firmware 2.3.11.77cf5c6, the nightly build on 28th May when logging started. I was testing whether the memory leak fix had stopped the Pico W from crashing, and it looks like that bug is fixed. However, this version still crashed after being up for 52h 41m.

The log is ~762,000 lines long. Four seconds before the node crashed there is an error, and it is the only instance of that error in the entire log:

ERROR | 05:14:47 189666 [Router] Can't decode protobuf reason='integer too large', pb_msgdesc 0x100a26b8
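A search like this over a large log can be scripted. The sketch below assumes the log is plain text with one entry per line, in the layout of the single sample above (level, clock time, a number that appears to be uptime in seconds, thread name, message). The regular expression, `find_entries`, and `format_uptime` are illustrative names, not part of any Meshtastic tool.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, inferred from one sample line only:
# LEVEL | HH:MM:SS UPTIME_SECONDS [Thread] message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s*\|\s*(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uptime>\d+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    line_no: int
    level: str
    time: str
    uptime_s: int
    thread: str
    message: str


def find_entries(lines: Iterable[str], needle: str) -> Iterator[LogEntry]:
    """Yield parsed entries whose raw line contains `needle`."""
    for line_no, line in enumerate(lines, start=1):
        if needle not in line:
            continue  # cheap substring test before running the regex
        match = LINE_RE.match(line.strip())
        if match:
            yield LogEntry(
                line_no,
                match["level"],
                match["time"],
                int(match["uptime"]),
                match["thread"],
                match["message"],
            )


def format_uptime(seconds: int) -> str:
    """Render a number of seconds as 'Hh MMm'."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h {remainder // 60:02d}m"


if __name__ == "__main__":
    sample = [
        "ERROR | 05:14:47 189666 [Router] Can't decode protobuf "
        "reason='integer too large', pb_msgdesc 0x100a26b8",
    ]
    for entry in find_entries(sample, "integer too large"):
        print(entry.line_no, entry.level, format_uptime(entry.uptime_s))
```

Read as seconds, 189666 works out to 52h 41m, which matches the uptime reported above. For a real log file, passing the open file object as `lines` streams it line by line instead of loading all ~762,000 lines at once.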

I've installed a fresh copy of 2.3.11.cd8a7e4, which was current as of around 8 hours ago, and am logging again. I will report if the error recurs. I'm not sure whether the cause was:

a) something malformed being received and not handled
b) a bug similar to the previous memory leak bug, which resulted in bad data being read and then a crash
c) the previous memory leak bug still happening, albeit much more slowly with that display fix in place

Relevant log output

No response

thebentern commented 1 month ago

We may need to add some more detailed logging to hunt this one down. There are a lot of ints this could potentially be.

clwgh commented 1 month ago

Could this be added to the firmware as a settable option? Maybe something like the below, which would be false by default. When set to true for debugging, much more information would be logged, such as free memory and anything else useful for this kind of fault-finding.

device.debug_logging true

Can I make use of this recent addition?

RP2040: Add getFreeHeap() and getHeapSize() support by @GUVWAF in https://github.com/meshtastic/firmware/pull/3890

Happy to test anything you want, custom firmware, etc.

Thanks for your help and work on the firmware.

clwgh commented 4 weeks ago

Update

The node has been up for 95 hours on nightly firmware 2.3.11.cd8a7e4 and has not experienced another instance of this error, so I'm going to close this. I don't know what the cause was.

It may have been unique to the earlier nightly firmware 2.3.11.77cf5c6, or it may have been a coincidence that could happen with any firmware. Suggestions are welcome: what is this error saying? The string "integer too large" does not appear in any of the other 230 hours of logs I have saved.
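One plausible reading, offered as an assumption rather than a confirmed diagnosis: protobuf integers are sent as variable-length varints, and a decoder can reject a value that does not fit the width of the field it is decoding into. The Python sketch below illustrates only that general failure mode. `decode_varint` and `decode_uint_field` are hypothetical helpers written for this example, not code from the firmware or its protobuf library.

```python
from typing import Tuple


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one base-128 varint; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if not byte & 0x80:              # high bit clear: last byte
            return value, pos
        shift += 7
        if shift >= 64:                  # more than 10 bytes: not a valid varint
            raise ValueError("varint overflow")


def decode_uint_field(data: bytes, bits: int) -> int:
    """Decode a varint into an unsigned field that is `bits` wide."""
    value, _ = decode_varint(data)
    if value >= 1 << bits:
        # The wire value is well formed but too big for the target field.
        raise ValueError("integer too large")
    return value


if __name__ == "__main__":
    print(decode_uint_field(bytes([0xAC, 0x02]), 32))  # small value, fits
    too_big = bytes([0x80, 0x80, 0x80, 0x80, 0x10])    # encodes 2**32
    try:
        decode_uint_field(too_big, 32)
    except ValueError as err:
        print("rejected:", err)
```

Under this reading, the error would be raised either by a corrupted or unexpected packet, or by memory corruption on the node before decoding. The log line alone does not distinguish between those cases.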

I am now going to install the alpha firmware 2.3.11.2740a56 and try again. If I see this error again, I will reopen this issue or open a new report that references it.

Regarding my earlier comment, @thebentern: do you think it is viable to add a debug logging option that writes free memory, and anything else relevant to memory leaks, into the logs? I can open a separate feature request if you'd like, but I'm sounding it out here first to save your time.

Many thanks, Chris