Open valki2 opened 4 years ago
Maybe WiFiClient or WiFiClientSecure is null or nullptr when the client.loop() is called. It happened to me on HTTPClient library when setting reuse to false.
Good point... Ill rewrite that part and test it!
Update: Double checked that client.loop() cannot be called when WifiClient is down. Huge increase in stability!
Then I got funny and wanted to upgrade PubSubClient from 2.7 to 2.8.
Result is that I get again random restarts of the ESP32. This time its a "StoreProhibited" Problem pointing to EXCVADDR: 0x003ffb80.
Think I have to downgrade again...
Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x4008a926 PS : 0x00060233 A0 : 0x8008c7ba A1 : 0x3ffb1cd0
A2 : 0x003ffb80 A3 : 0x0000abab A4 : 0xb33fffff A5 : 0x00000001
A6 : 0x00060020 A7 : 0x0000cdcd A8 : 0x0000abab A9 : 0x3ffb1dc0
A10 : 0x00000000 A11 : 0x00000020 A12 : 0x40085cbc A13 : 0xed8d0f6e
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x0000000a EXCCAUSE: 0x0000001d
EXCVADDR: 0x003ffb80 LBEG : 0x400012c5 LEND : 0x400012d5 LCOUNT : 0xfffffffb
Backtrace: 0x4008a926:0x3ffb1cd0 0x4008c7b7:0x3ffb1d00 0x4008ce2b:0x3ffb1d20 0x400857b0:0x3ffb1d40 0x400858bd:0x3ffb1d60 0x40085982:0x3ffb1d80 0x40085cc5:0x3ffb1da0 0x4000bedd:0x3ffb1dc0 0x401033bb:0x3ffb1de0 0x401034f0:0x3ffb1e10 0x4010395b:0x3ffb1e30 0x40103b02:0x3ffb1e50 0x400dc8fd:0x3ffb1e70 0x40187a33:0x3ffb1ed0 0x400ea16e:0x3ffb1ef0 0x400e567a:0x3ffb1f30 0x400e576b:0x3ffb1f80 0x40104525:0x3ffb1fb0 0x40089919:0x3ffb1fd0
PC: 0x4008a926: vTaskEnterCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/include/freertos/portmacro.h line 285
EXCVADDR: 0x003ffb80
Decoding stack results
0x4008a926: vTaskEnterCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/include/freertos/portmacro.h line 285
0x4008c7b7: multi_heap_internal_lock at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c line 375
0x4008ce2b: multi_heap_malloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c line 190
0x400857b0: heap_caps_malloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 111
0x400858bd: heap_caps_realloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 274
0x40085982: heap_caps_realloc_default at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 163
0x40085cc5: _realloc_r at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c line 47
0x401033bb: String::changeBuffer(unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 183
0x401034f0: String::reserve(unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 149
0x4010395b: String::concat(char const*, unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 328
0x40103b02: operator+(StringSumHelper const&, String const&) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 415
0x400dc8fd: receivedCallback(char*, unsigned char*, unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32/WString.h line 301
0x40187a33: std::_Function_base::_Base_manager ::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) at c:\users\xxx\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-80-g6c4433a-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 1754
0x400ea16e: PubSubClient::loop() at C:\Users\xxx\Documents\Arduino\libraries\PubSubClient\src\PubSubClient.cpp line 412
0x400e567a: getMQTT() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 2253
0x400e576b: loop() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 589
0x40104525: loopTask(void*) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\main.cpp line 14
0x40089919: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
Interesting point, I also experience sudden restarts of the ESP32. I'm not yet sure if it is really caused by this library though.
@valki2
Update: Double checked that client.loop() cannot be called when WifiClient is down. Huge increase in stability!
Would you mind sharing this code? Is it just checking there is a connection still or something else?
I have been debugging and tracing this more. I have added some Print statements in the PubSubClient::loop() function like below:
...
if (type == MQTTPUBLISH) {
Serial.print("MQTTP ");
if (callback) {
uint16_t tl = (buffer[llen+1]<<8)+buffer[llen+2]; /* topic length in bytes */
Serial.printf("TL %d\n", tl);
This results in this output on the serial monitor:
(wo 19-08 11:59:58.082) MQTTP TL 12335
(wo 19-08 11:59:58.109) Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Exception was unhandled.<CR>
So the topic length tl becomes really high. This is unexpected, the size should be 20 bytes only. My conclusion so far is that either wrong packets are read or the buffer is corrupt. I've added more debug statements and waiting for the issue to re-appear. I'm using v2.7 of the library, and it looked to me there are no changes on this part in 2.8 so didn't test that (yet).
Does anybody have a good idea what to do solve this point?
After some more logging it seems to me like a corrupt header is received, for example: expected header:
0x30 0x18 0x00 0x14
Corrupt header:
0x33 0x31 0x38 0xFF 0x00
This results in a calculated length of 51 bytes. The topic length TL though is calculated to be 0x38FF (14591 bytes) and crashes the ESP A work around could be to ignore the message if the topic length is larger then the message length. Not sure if this is the way to go. @knolleary what is your idea on this?
I also am experiencing stability issues on the ESP32 platform with version 2.8 of this library. I have 3 identical custom boards with the ESP32-WROOM-32D module and a HiLetgo ESP-WROOM-32 all running the same firmware. All boards will crash and reboot randomly after a a few hours. Most of the time it is an exception was unhandled error. Occasionally one will make it past a day. If I comment out the MQTT connect and loop functions the boards are stable for over 7 days uptime. The client.loop code is inside a check for WiFi.isConnected.
@rikki78 : Sorry I missed your answer. Are you still interested in the code?
@valki2 yes, please. It might me or someone else :-)
Then please sign up at https://git.valki.com and I will unlock your user quickly. Its not that public as the code may cause eye cancer... :)
Permission set!
@valki2 is the code available on your repo? Is the fix as @rikki78 described above?
The code on the repo is different than the fix I described above. IIRC the repo checks if the WiFi is connected before connecting the mqtt. The fix described above checks on the topic length and if it is longer then the message length, the message is ignored. This prevents crashing the ESP but I'm still not sure if this solves the issue or is a workaround for an underlying issue
I observed @rikki78 's issue as well: Extra 0xff bytes lead to a topic length that is way too high, then the memmove operation corrupts our heap. I managed to stabilize the code by resetting the MQTT connection in such cases. See issue #832 (discussion on Mar 20) and PR #843 .
@valki. Hello, I'm currently having the same problem as you, can you tell me how to fix it?
Hello I have the same problem what is the solution now?
Error code: 23:42:53.649 -> Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Exception!� 23:43:00.533 -> �ܭ��!��)�����!�a�Cǥ�Starting 23:43:00.666 -> Initializing OS 23:43:00.697 -> Resetting LMIC 23:43:00.697 -> Setting up session 23:43:00.730 -> Setup complete 23:43:00.730 -> Entering loop 23:43:00.762 -> Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Excepti
@valki. Hello, I'm currently having the same problem as you, can you tell me how to fix it?
I never found a solution and ended up switching to the 256dpi/MQTT library.
Hi folks, dear knolleary, maybe someone can point me into the right direction. I got a dump and I think it comes from this library.
hardware: esp32 esp32 core: idf-release/v4.0 (d2761a9e) esptool 2.6 pubsubclient 2.7.0 (as 2.8 was more instable)
The Dump:
According to https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html#guru-meditation-errors "InstrFetchProhibited" may have to do with a call to a function pointer that is pointing to an adress in the RAM or ROM where no valid code is (anymore).
The exception decoder shows:
The relevant call of line 2116 is where client.loop() is called.
It would be nice if anyone got an idea how to further hunt that problem?
Any hints highly appreciated,
cheers!