knolleary / pubsubclient

A client library for the Arduino Ethernet Shield that provides support for MQTT.
http://pubsubclient.knolleary.net/
MIT License
3.82k stars 1.47k forks source link

Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited) on client.loop(); #751

Open valki2 opened 4 years ago

valki2 commented 4 years ago

Hi folks, dear knolleary, maybe someone can point me into the right direction. I got a dump and I think it comes from this library.

hardware: esp32 esp32 core: idf-release/v4.0 (d2761a9e) esptool 2.6 pubsubclient 2.7.0 (as 2.8 was more instable)

The Dump:

Guru Meditation Error: Core  1 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x00400832  PS      : 0x00060031  A0      : 0x8008b11f  A1      : 0x3ffbe760  
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000001  A5      : 0x00005d40  
A6      : 0x4affffe7  A7      : 0x00000000  A8      : 0x80082df0  A9      : 0x3ffc50c4  
A10     : 0x00000003  A11     : 0x00060023  A12     : 0x00060021  A13     : 0x0000003a  
A14     : 0x00000252  A15     : 0x00000000  SAR     : 0x00000018  EXCCAUSE: 0x00000014  
EXCVADDR: 0x00400830  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  
Core 1 was running in ISR context:
EPC1    : 0x00400832  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x400877cb

Backtrace: 0x00400832:0x3ffbe760 0x4008b11c:0x3ffbe780 0x40089a07:0x3ffbe7a0 0x4008c151:0x3ffbe7c0 0x40085516:0x3ffbe7d0 0x4000c3eb:0x3ffb1ed0 0x400e9b93:0x3ffb1ef0 0x400e520a:0x3ffb1f40 0x400e5277:0x3ffb1f80 0x40103f6d:0x3ffb1fb0 0x40089919:0x3ffb1fd0

According to https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html#guru-meditation-errors "InstrFetchProhibited" may have to do with a call to a function pointer that is pointing to an adress in the RAM or ROM where no valid code is (anymore).

The exception decoder shows:

PC: 0x00400832
EXCVADDR: 0x00400830
Decoding stack results
0x4008b11c: xTaskIncrementTick at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c line 2473
0x40089a07: xPortSysTickHandler at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 289
0x400e9b93: PubSubClient::loop() at C:\Users\xxx\Documents\Arduino\libraries\PubSubClient\src\PubSubClient.cpp line 330
0x400e520a: getMQTT() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 2116
0x400e5277: loop() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 591
0x40103f6d: loopTask(void*) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\main.cpp line 19
0x40089919: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

The relevant call of line 2116 is where client.loop() is called.

void getMQTT() {
  client.setServer(mqtt_server.value.c_str(), mqtt_port.value.toInt());
  /* this receivedCallback function will be invoked when client received subscribed topics */
  client.setCallback(receivedCallback);
  /* this function will listen for incoming subscribed topic-process-invoke receivedCallback */
  if (!client.connected()) {
    reconnect();     
  }
  client.loop(); <-------- this is 2116 
}

It would be nice if anyone got an idea how to further hunt that problem?

Any hints highly appreciated,

cheers!

console-netsphere commented 4 years ago

Maybe WiFiClient or WiFiClientSecure is null or nullptr when the client.loop() is called. It happened to me on HTTPClient library when setting reuse to false.

valki2 commented 4 years ago

Good point... Ill rewrite that part and test it!

valki2 commented 4 years ago

Update: Double checked that client.loop() cannot be called when WifiClient is down. Huge increase in stability!

Then I got funny and wanted to upgrade PubSubClient from 2.7 to 2.8.

Result is that I get again random restarts of the ESP32. This time its a "StoreProhibited" Problem pointing to EXCVADDR: 0x003ffb80.

Think I have to downgrade again...

Guru Meditation Error: Core  1 panic'ed (StoreProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x4008a926  PS      : 0x00060233  A0      : 0x8008c7ba  A1      : 0x3ffb1cd0  
A2      : 0x003ffb80  A3      : 0x0000abab  A4      : 0xb33fffff  A5      : 0x00000001  
A6      : 0x00060020  A7      : 0x0000cdcd  A8      : 0x0000abab  A9      : 0x3ffb1dc0  
A10     : 0x00000000  A11     : 0x00000020  A12     : 0x40085cbc  A13     : 0xed8d0f6e  
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x0000000a  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x003ffb80  LBEG    : 0x400012c5  LEND    : 0x400012d5  LCOUNT  : 0xfffffffb  

Backtrace: 0x4008a926:0x3ffb1cd0 0x4008c7b7:0x3ffb1d00 0x4008ce2b:0x3ffb1d20 0x400857b0:0x3ffb1d40 0x400858bd:0x3ffb1d60 0x40085982:0x3ffb1d80 0x40085cc5:0x3ffb1da0 0x4000bedd:0x3ffb1dc0 0x401033bb:0x3ffb1de0 0x401034f0:0x3ffb1e10 0x4010395b:0x3ffb1e30 0x40103b02:0x3ffb1e50 0x400dc8fd:0x3ffb1e70 0x40187a33:0x3ffb1ed0 0x400ea16e:0x3ffb1ef0 0x400e567a:0x3ffb1f30 0x400e576b:0x3ffb1f80 0x40104525:0x3ffb1fb0 0x40089919:0x3ffb1fd0

PC: 0x4008a926: vTaskEnterCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/include/freertos/portmacro.h line 285
EXCVADDR: 0x003ffb80

Decoding stack results
0x4008a926: vTaskEnterCritical at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/include/freertos/portmacro.h line 285
0x4008c7b7: multi_heap_internal_lock at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c line 375
0x4008ce2b: multi_heap_malloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c line 190
0x400857b0: heap_caps_malloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 111
0x400858bd: heap_caps_realloc at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 274
0x40085982: heap_caps_realloc_default at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c line 163
0x40085cc5: _realloc_r at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c line 47
0x401033bb: String::changeBuffer(unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 183
0x401034f0: String::reserve(unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 149
0x4010395b: String::concat(char const*, unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 328
0x40103b02: operator+(StringSumHelper const&, String const&) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\WString.cpp line 415
0x400dc8fd: receivedCallback(char*, unsigned char*, unsigned int) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32/WString.h line 301
0x40187a33: std::_Function_base::_Base_manager ::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) at c:\users\xxx\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-80-g6c4433a-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 1754
0x400ea16e: PubSubClient::loop() at C:\Users\xxx\Documents\Arduino\libraries\PubSubClient\src\PubSubClient.cpp line 412
0x400e567a: getMQTT() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 2253
0x400e576b: loop() at C:\Users\xxx\Documents\Arduino\openAger/openAger.ino line 589
0x40104525: loopTask(void*) at C:\Users\xxx\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\cores\esp32\main.cpp line 14
0x40089919: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
rikki78 commented 4 years ago

Interesting point, I also experience sudden restarts of the ESP32. I'm not yet sure if it is really caused by this library though.

@valki2

Update: Double checked that client.loop() cannot be called when WifiClient is down. Huge increase in stability!

Would you mind sharing this code? Is it just checking there is a connection still or something else?

rikki78 commented 4 years ago

I have been debugging and tracing this more. I have added some Print statements in the PubSubClient::loop() function like below:

...
                if (type == MQTTPUBLISH) {
                    Serial.print("MQTTP ");
                    if (callback) {
                        uint16_t tl = (buffer[llen+1]<<8)+buffer[llen+2]; /* topic length in bytes */
                        Serial.printf("TL %d\n", tl);

This results in this output on the serial monitor:

(wo 19-08 11:59:58.082) MQTTP TL 12335
(wo 19-08 11:59:58.109) Guru Meditation Error: Core  1 panic'ed (InstrFetchProhibited). Exception was unhandled.<CR>

So the topic length tl becomes really high. This is unexpected, the size should be 20 bytes only. My conclusion so far is that either wrong packets are read or the buffer is corrupt. I've added more debug statements and waiting for the issue to re-appear. I'm using v2.7 of the library, and it looked to me there are no changes on this part in 2.8 so didn't test that (yet).

Does anybody have a good idea what to do solve this point?

rikki78 commented 4 years ago

After some more logging it seems to me like a corrupt header is received, for example: expected header:

0x30 0x18 0x00 0x14

Corrupt header:

0x33 0x31 0x38 0xFF 0x00

This results in a calculated length of 51 bytes. The topic length TL though is calculated to be 0x38FF (14591 bytes) and crashes the ESP A work around could be to ignore the message if the topic length is larger then the message length. Not sure if this is the way to go. @knolleary what is your idea on this?

rwagoner commented 3 years ago

I also am experiencing stability issues on the ESP32 platform with version 2.8 of this library. I have 3 identical custom boards with the ESP32-WROOM-32D module and a HiLetgo ESP-WROOM-32 all running the same firmware. All boards will crash and reboot randomly after a a few hours. Most of the time it is an exception was unhandled error. Occasionally one will make it past a day. If I comment out the MQTT connect and loop functions the boards are stable for over 7 days uptime. The client.loop code is inside a check for WiFi.isConnected.

valki2 commented 3 years ago

@rikki78 : Sorry I missed your answer. Are you still interested in the code?

rikki78 commented 3 years ago

@valki2 yes, please. It might me or someone else :-)

valki2 commented 3 years ago

Then please sign up at https://git.valki.com and I will unlock your user quickly. Its not that public as the code may cause eye cancer... :)

Permission set!

jwk4946 commented 3 years ago

@valki2 is the code available on your repo? Is the fix as @rikki78 described above?

rikki78 commented 3 years ago

The code on the repo is different than the fix I described above. IIRC the repo checks if the WiFi is connected before connecting the mqtt. The fix described above checks on the topic length and if it is longer then the message length, the message is ignored. This prevents crashing the ESP but I'm still not sure if this solves the issue or is a workaround for an underlying issue

marcel7234 commented 3 years ago

I observed @rikki78 's issue as well: Extra 0xff bytes lead to a topic length that is way too high, then the memmove operation corrupts our heap. I managed to stabilize the code by resetting the MQTT connection in such cases. See issue #832 (discussion on Mar 20) and PR #843 .

Essayer1606 commented 9 months ago

@valki. Hello, I'm currently having the same problem as you, can you tell me how to fix it?

benny101023 commented 3 months ago

Hello I have the same problem what is the solution now?

Error code: 23:42:53.649 -> Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Exception!� 23:43:00.533 -> �ܭ��!��)�����!�a�Cǥ�Starting 23:43:00.666 -> Initializing OS 23:43:00.697 -> Resetting LMIC 23:43:00.697 -> Setting up session 23:43:00.730 -> Setup complete 23:43:00.730 -> Entering loop 23:43:00.762 -> Guru Meditation Error: Core 1 panic'ed (InstrFetchProhibited). Excepti

rwagoner commented 3 months ago

@valki. Hello, I'm currently having the same problem as you, can you tell me how to fix it?

I never found a solution and ended up switching to the 256dpi/MQTT library.