mysensors / MySensors

MySensors library and examples
https://www.mysensors.org
1.31k stars 895 forks source link

RFM69 MQTT gateway on ESP32 - InstrFetchProhibited #1435

Open rikki78 opened 4 years ago

rikki78 commented 4 years ago

Hi,

I'm trying to setup a RFM69 MQTT gateway on an ESP32, but I'm facing random resets caused by InstrFetchProhibited according the output of the ESP. I'm using

This is my configuration:

  #define MY_RADIO_RFM69
  #define MY_RFM69_NEW_DRIVER
  #define MY_GATEWAY_ESP32
  #define MY_GATEWAY_MQTT_CLIENT

This is the log output with the exception:

(za 08-08 13:19:45.351) 10886014 TSF:MSG:READ,31-31-0,s=10,c=1,t=3,pt=1,l=1,sg=0:72
(za 08-08 13:19:45.351) 10886019 GWT:TPS:TOPIC=ms-rfm-from/31/10/1/0/3,MSG SENT
(za 08-08 13:19:54.830) 10895837 TSF:MSG:READ,31-31-0,s=22,c=1,t=23,pt=1,l=1,sg=0:18
(za 08-08 13:19:54.939) 10895842 GWT:TPS:TOPIC=ms-rfm-from/31/22/1/0/23,MSG SENT
(za 08-08 13:20:09.827) Guru Meditation Error: Core  0 panic'ed (InstrFetchProhibited). Exception was unhandled.<CR>
(za 08-08 13:20:09.934) Core 0 register dump:<CR>
(za 08-08 13:20:09.934) PC      : 0x00400825  PS      : 0x00060031  A0      : 0x8008a387  A1      : 0x3ffbe160  <CR>
(za 08-08 13:20:09.934) A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00060a20  A5      : 0x00000020  <CR>
(za 08-08 13:20:09.934) A6      : 0x00000020  A7      : 0x3ffb45e0  A8      : 0x80082064  A9      : 0x3ffc180c  <CR>
(za 08-08 13:20:09.934) A10     : 0x00000003  A11     : 0x00060023  A12     : 0x00060021  A13     : 0x3ffb974c  <CR>
(za 08-08 13:20:09.934) A14     : 0x0000002a  A15     : 0x3ffb45b0  SAR     : 0x00000014  EXCCAUSE: 0x00000014  <CR>
(za 08-08 13:20:09.934) EXCVADDR: 0x00400824  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  <CR>
(za 08-08 13:20:09.934) Core 0 was running in ISR context:<CR>
(za 08-08 13:20:09.934) EPC1    : 0x00400825  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x00000000<CR>
(za 08-08 13:20:09.934) <CR>

The stack decoder shows this:


0x4008a384: xTaskIncrementTick at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
0x40088c7f: xPortSysTickHandler at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
0x4008b395: _frxt_timer_int at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/portasm.S:303
0x4008478e: _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
0x4013fdd7: esp_pm_impl_waiti at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/pm_esp32.c:492
0x400d99c3: esp_vApplicationIdleHook at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/freertos_hooks.c:86
0x4008a375: prvIdleTask at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
0x40088b91: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

I have a similar gateway but with an NRF24 module, and this works fine with the same MS release 2.3.2. So I suspect it could be module related.

Any hints to solve or debug this?

rikki78 commented 4 years ago

I've added some more defines to see more debug output. At first, the gateway worked fine when I had all verbose debugs active.

#define MY_DEBUG_VERBOSE_RFM69
#define MY_DEBUG_VERBOSE_TRANSPORT
#define MY_DEBUG_VERBOSE_GATEWAY
#define MY_DEBUG_VERBOSE_TRANSPORT_HAL

Now I've only the following active:

#define MY_DEBUG_VERBOSE_GATEWAY
#define MY_DEBUG_VERBOSE_TRANSPORT_HAL

Again the gateway panics after a while:

2645130 GWT:TPS:TOPIC=ms-rfm-from/31/22/1/0/23,MSG SENT
Guru Meditation Error: Core  1 panic'ed (InstrFetchProhibited). Exception was unhandled.<CR>
Core 1 register dump:<CR>
PC      : 0x00400825  PS      : 0x00060031  A0      : 0x8008a387  A1      : 0x3ffbe760  <CR>
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000000  A5      : 0x0000ee00  <CR>
A6      : 0x00050e23  A7      : 0x3ffb8058  A8      : 0x80082064  A9      : 0x3ffc184c  <CR>
A10     : 0x00000003  A11     : 0x00060023  A12     : 0x00060021  A13     : 0x00000030  <CR>
A14     : 0x00000015  A15     : 0x00060e23  SAR     : 0x00000016  EXCCAUSE: 0x00000014  <CR>
EXCVADDR: 0x00400824  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  <CR>
Core 1 was running in ISR context:<CR>
EPC1    : 0x00400825  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40086a3d<CR>
<CR>
Backtrace: 0x00400825:0x3ffbe760 0x4008a384:0x3ffbe780 0x40088c7f:0x3ffbe7a0 0x4008b395:0x3ffbe7c0 0x4008478e:0x3ffbe7d0 0x4000c3f0:0x3ffb1ee0 0x400d1e5f:0x3ffb1f00 0x400d2ed2:0x3ffb1f50 0x400d30c7:0x3ffb1f70 0x400d3628:0x3ffb1f90 0x400d36f1:0x3ffb1fb0 0x40088b91:0x3ffb1fd0<CR>
<CR>
Rebooting...<CR>

The stack decoded:

0x40088c7f: xPortSysTickHandler at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
0x4008b395: _frxt_timer_int at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/portasm.S:303
0x4008478e: _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
0x400d1e5f: PubSubClient::loop() at C:\Users\rik\Dropbox\Shared with Home\MS_GW default/.pio/libdeps/esp32-rfm69/MySensors/core/MyMessage.cpp:212
0x400d2ed2: gatewayTransportAvailable() at C:\Users\rik\Dropbox\Shared with Home\MS_GW default/.pio/libdeps/esp32-rfm69/MySensors/core/MyMessage.cpp:212
0x400d30c7: gatewayTransportProcess() at C:\Users\rik\Dropbox\Shared with Home\MS_GW default/.pio/libdeps/esp32-rfm69/MySensors/core/MyMessage.cpp:212
0x400d3628: _process() at C:\Users\rik\Dropbox\Shared with Home\MS_GW default/.pio/libdeps/esp32-rfm69/MySensors/core/MyMessage.cpp:212
0x400d36f1: loopTask(void*) at C:\Users\rik\Dropbox\Shared with Home\MS_GW default/.pio/libdeps/esp32-rfm69/MySensors/core/MyMessage.cpp:212
0x40088b91: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

See the below debug output with comments added. The controller is sending a test message every 5 seconds over MQTT. A test node is sending another message every 10 seconds. It looks like there are MQTT messages missing, just before the issue appears. Can this be related somehow?


(za 15-08 20:51:38.981) 2625431 THA:DATA:AVAIL                            Message from node every 10 seconds
(za 15-08 20:51:39.076) 2625486 THA:RCV:MSG=1F1F000A21030A06
(za 15-08 20:51:39.076) 2625489 THA:RCV:MSG LEN=8
(za 15-08 20:51:39.076) 2625494 TSF:MSG:READ,31-31-0,s=10,c=1,t=3,pt=1,l=1,sg=0:6
(za 15-08 20:51:39.076) 2625501 GWT:TPS:TOPIC=ms-rfm-from/31/10/1/0/3,MSG SENT
(za 15-08 20:51:43.551) 2630003 GWT:IMQ:TOPIC=ms-rfm-to/31/2/2/0/3, MSG RECEIVED  Message from controller every 5 seconds
(za 15-08 20:51:43.595) 2630008 THA:SND:MSG=00001F0A02030237
(za 15-08 20:51:43.666) 2630026 THA:SND:MSG LEN=8,RES=1
(za 15-08 20:51:43.666) 2630028 TSF:MSG:SEND,0-0-31-31,s=2,c=2,t=3,pt=0,l=1,sg=0,ft=0,st=OK:7
(za 15-08 20:51:43.798) 2630250 THA:DATA:AVAIL
(za 15-08 20:51:43.898) 2630305 THA:RCV:MSG=1F1F000A21030207
(za 15-08 20:51:43.898) 2630308 THA:RCV:MSG LEN=8
(za 15-08 20:51:43.898) 2630313 TSF:MSG:READ,31-31-0,s=2,c=1,t=3,pt=1,l=1,sg=0:7
(za 15-08 20:51:43.898) 2630320 GWT:TPS:TOPIC=ms-rfm-from/31/2/1/0/3,MSG SENT
(za 15-08 20:51:43.898) 2630335 THA:DATA:AVAIL
(za 15-08 20:51:43.940) 2630391 THA:RCV:MSG=1F1F000A21030A07
(za 15-08 20:51:43.944) 2630394 THA:RCV:MSG LEN=8
(za 15-08 20:51:43.949) 2630399 TSF:MSG:READ,31-31-0,s=10,c=1,t=3,pt=1,l=1,sg=0:7
(za 15-08 20:51:43.956) 2630406 GWT:TPS:TOPIC=ms-rfm-from/31/10/1/0/3,MSG SENT
(za 15-08 20:51:44.190) 2630642 THA:DATA:AVAIL
(za 15-08 20:51:44.272) 2630723 THA:RCV:MSG=1F1F000A2117161F
(za 15-08 20:51:44.318) 2630726 THA:RCV:MSG LEN=8
(za 15-08 20:51:44.347) 2630731 TSF:MSG:READ,31-31-0,s=22,c=1,t=23,pt=1,l=1,sg=0:31
(za 15-08 20:51:44.347) 2630738 GWT:TPS:TOPIC=ms-rfm-from/31/22/1/0/23,MSG SENT
(za 15-08 20:51:48.562) 2635014 GWT:IMQ:TOPIC=ms-rfm-to/31/2/2/0/3, MSG RECEIVED  Message from controller every 5 seconds
(za 15-08 20:51:48.663) 2635019 THA:SND:MSG=00001F0A02030238
(za 15-08 20:51:48.663) 2635037 THA:SND:MSG LEN=8,RES=1
(za 15-08 20:51:48.663) 2635039 TSF:MSG:SEND,0-0-31-31,s=2,c=2,t=3,pt=0,l=1,sg=0,ft=0,st=OK:8
(za 15-08 20:51:48.801) 2635254 THA:DATA:AVAIL                               Message from node every 10 seconds
(za 15-08 20:51:48.946) 2635309 THA:RCV:MSG=1F1F000A21030208
(za 15-08 20:51:48.946) 2635312 THA:RCV:MSG LEN=8
(za 15-08 20:51:48.946) 2635317 TSF:MSG:READ,31-31-0,s=2,c=1,t=3,pt=1,l=1,sg=0:8
(za 15-08 20:51:48.946) 2635324 GWT:TPS:TOPIC=ms-rfm-from/31/2/1/0/3,MSG SENT
(za 15-08 20:51:48.946) 2635338 THA:DATA:AVAIL
(za 15-08 20:51:48.946) 2635394 THA:RCV:MSG=1F1F000A21030A08
(za 15-08 20:51:48.997) 2635397 THA:RCV:MSG LEN=8
(za 15-08 20:51:48.997) 2635402 TSF:MSG:READ,31-31-0,s=10,c=1,t=3,pt=1,l=1,sg=0:8
(za 15-08 20:51:48.997) 2635409 GWT:TPS:TOPIC=ms-rfm-from/31/10/1/0/3,MSG SENT
                              Somewhere a message from the controller should show up
(za 15-08 20:51:58.608) 2645060 THA:DATA:AVAIL                                Message from node every 10 seconds
(za 15-08 20:51:58.695) 2645115 THA:RCV:MSG=1F1F000A21171620
(za 15-08 20:51:58.695) 2645118 THA:RCV:MSG LEN=8
(za 15-08 20:51:58.695) 2645123 TSF:MSG:READ,31-31-0,s=22,c=1,t=23,pt=1,l=1,sg=0:32
(za 15-08 20:51:58.695) 2645130 GWT:TPS:TOPIC=ms-rfm-from/31/22/1/0/23,MSG SENT
                              Somewhere a message from the controller should show up
(za 15-08 20:52:08.744) Guru Meditation Error: Core  1 panic'ed (InstrFetchProhibited). Exception was unhandled.<CR>
(za 15-08 20:52:08.794) Core 1 register dump:<CR>
(za 15-08 20:52:08.884) PC      : 0x00400825  PS      : 0x00060031  A0      : 0x8008a387  A1      : 0x3ffbe760  <CR>
(za 15-08 20:52:08.884) A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000000  A5      : 0x0000ee00  <CR>
(za 15-08 20:52:08.884) A6      : 0x00050e23  A7      : 0x3ffb8058  A8      : 0x80082064  A9      : 0x3ffc184c  <CR>
(za 15-08 20:52:08.884) A10     : 0x00000003  A11     : 0x00060023  A12     : 0x00060021  A13     : 0x00000030  <CR>
(za 15-08 20:52:08.884) A14     : 0x00000015  A15     : 0x00060e23  SAR     : 0x00000016  EXCCAUSE: 0x00000014  <CR>
(za 15-08 20:52:08.884) EXCVADDR: 0x00400824  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  <CR>
(za 15-08 20:52:08.884) Core 1 was running in ISR context:<CR>
(za 15-08 20:52:08.884) EPC1    : 0x00400825  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40086a3d<CR>
(za 15-08 20:52:08.884) <CR>
(za 15-08 20:52:08.884) Backtrace: 0x00400825:0x3ffbe760 0x4008a384:0x3ffbe780 0x40088c7f:0x3ffbe7a0 0x4008b395:0x3ffbe7c0 0x4008478e:0x3ffbe7d0 0x4000c3f0:0x3ffb1ee0 0x400d1e5f:0x3ffb1f00 0x400d2ed2:0x3ffb1f50 0x400d30c7:0x3ffb1f70 0x400d3628:0x3ffb1f90 0x400d36f1:0x3ffb1fb0 0x40088b91:0x3ffb1fd0<CR>
(za 15-08 20:52:08.884) <CR>
rikki78 commented 4 years ago

I've added a check in the PubSubclient on a valid message header, which disregards the message and reconnects the client. After that workaround I haven't seen this issue anymore.