plasmapper / modbus-esp-cpp

Modbus C++ Class Component for ESP-IDF
MIT License

Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0) when running as FreeRTOS task #2

Closed MichaelUray closed 1 year ago

MichaelUray commented 1 year ago

When I run the example task as a single application, it works fine, but when I start it together with other tasks (e.g. MQTT), it resets with the following message.

Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)

I (881) count_task: RAM left 185768, 
 nvs_main_th: 1384,
 server_task_th: 1304,
 server_handle_task_th: 2384, 
 count_task_th: 1148,
 mb_master_main_th: 18900,
 mb_slave_main_th: 2864,
 mqtt_ssl_main_th: 1148
I (881) uart: queue free spaces: 20
I (881) main: Start MQTT SSL
I (901) SLAVE_TEST: Modbus slave stack initialized.
I (911) SLAVE_TEST: Start modbus test...
I (921) MQTTS_EXAMPLE: [APP] Startup..
I (921) MQTTS_EXAMPLE: [APP] Free memory: 165916 bytes
I (931) MQTTS_EXAMPLE: [APP] IDF version: v4.4.2
I (931) MQTTS_EXAMPLE: [APP] Free memory: 165876 bytes
I (941) MQTTS_EXAMPLE: Other event id:7
E (941) esp-tls: couldn't get hostname for :mqtt.energy.uray.io: getaddrinfo() returns 202, addrinfo=0x0
E (951) esp-tls: Failed to open new connection
E (961) TRANSPORT_BASE: Failed to open a new connection
E (961) MQTT_CLIENT: Error transport connect
I (971) MQTTS_EXAMPLE: MQTT_EVENT_ERROR
I (971) MQTTS_EXAMPLE: Last error code reported from esp-tls: 0x8001
I (981) MQTTS_EXAMPLE: Last tls stack error number: 0x0
I (991) MQTTS_EXAMPLE: Last captured errno : 0 (Success)
I (991) MQTTS_EXAMPLE: MQTT_EVENT_DISCONNECTED
E (1181) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_base.cpp', line: 114, function: esp_err_t PL::ModbusBase::ReadFrame(PL::Stream&, uint8_t&, PL::ModbusFunctionCode&, size_t&, uint16_t&)
E (1191) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_client.cpp', line: 305, function: 
esp_err_t PL::ModbusClient::Command(PL::ModbusFunctionCode, size_t, size_t&, PL::ModbusException*)
E (1211) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_client.cpp', line: 373, function: 
esp_err_t PL::ModbusClient::ReadRegisters(PL::ModbusFunctionCode, uint16_t, uint16_t, uint16_t*, PL::ModbusException*)
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0). 

Core  0 register dump:
PC      : 0x401697da  PS      : 0x00060c34  A0      : 0x800d3b99  A1      : 0x3ffbbae0
0x401697da: esp_pm_impl_waiti at C:/Users/Michael.Uray/esp/esp-idf/components/esp_pm/pm_impl.c:839

A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000001  A5      : 0x80000001  
A6      : 0x3ffb3ba0  A7      : 0x3ffb3ba0  A8      : 0x800eb5a2  A9      : 0x3ffbbab0
A10     : 0x00000000  A11     : 0x40001d48  A12     : 0x00060720  A13     : 0x00060723
A14     : 0x00000001  A15     : 0x00000001  SAR     : 0x0000001e  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace:0x401697d7:0x3ffbbae0 0x400d3b96:0x3ffbbb00 0x4008adb9:0x3ffbbb20 0x4008c9a5:0x3ffbbb40
0x401697d7: cpu_ll_waiti at C:/Users/Michael.Uray/esp/esp-idf/components/hal/esp32/include/hal/cpu_ll.h:183
 (inlined by) esp_pm_impl_waiti at C:/Users/Michael.Uray/esp/esp-idf/components/esp_pm/pm_impl.c:837

0x400d3b96: esp_vApplicationIdleHook at C:/Users/Michael.Uray/esp/esp-idf/components/esp_system/freertos_hooks.c:63

0x4008adb9: prvIdleTask at C:/Users/Michael.Uray/esp/esp-idf/components/freertos/tasks.c:3973 (discriminator 1)

0x4008c9a5: vPortTaskWrapper at C:/Users/Michael.Uray/esp/esp-idf/components/freertos/port/xtensa/port.c:131

ELF file SHA256: d6bff7793b423126

Rebooting...

Sometimes it also reports the following message before a reboot:

Guru Meditation Error: Core 0 panic'ed (InstructionFetchError). Exception was unhandled.

I (881) count_task: starting task
I (881) count_task: RAM left 185768, 
 nvs_main_th: 1384,
 server_task_th: 1304, 
 server_handle_task_th: 2384,
 count_task_th: 1068,
 mb_master_main_th: 18900,
 mb_slave_main_th: 2864,
 mqtt_ssl_main_th: 1068
I (881) uart: queue free spaces: 20
I (881) main: Start MQTT SSL
I (911) SLAVE_TEST: Modbus slave stack initialized.
I (911) SLAVE_TEST: Start modbus test...
I (921) MQTTS_EXAMPLE: [APP] Startup..
I (921) MQTTS_EXAMPLE: [APP] Free memory: 165916 bytes
I (931) MQTTS_EXAMPLE: [APP] IDF version: v4.4.2
I (931) MQTTS_EXAMPLE: [APP] Free memory: 165876 bytes
I (941) MQTTS_EXAMPLE: Other event id:7
E (951) esp-tls: couldn't get hostname for :mqtt.energy.uray.io: getaddrinfo() returns 202, addrinfo=0x0
E (951) esp-tls: Failed to open new connection
E (961) TRANSPORT_BASE: Failed to open a new connection
E (971) MQTT_CLIENT: Error transport connect
I (971) MQTTS_EXAMPLE: MQTT_EVENT_ERROR
I (981) MQTTS_EXAMPLE: Last error code reported from esp-tls: 0x8001
I (981) MQTTS_EXAMPLE: Last tls stack error number: 0x0
I (991) MQTTS_EXAMPLE: Last captured errno : 0 (Success)
I (991) MQTTS_EXAMPLE: MQTT_EVENT_DISCONNECTED
E (1181) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_base.cpp', line: 114, function: esp_err_t PL::ModbusBase::ReadFrame(PL::Stream&, uint8_t&, PL::ModbusFunctionCode&, size_t&, uint16_t&)
E (1191) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_client.cpp', line: 305, function: 
esp_err_t PL::ModbusClient::Command(PL::ModbusFunctionCode, size_t, size_t&, PL::ModbusException*)
E (1211) PL_RETURN_ON_ERROR: 0x107 (ESP_ERR_TIMEOUT), file: './components/modbus-esp-cpp/pl_modbus_client.cpp', line: 373, function: 
esp_err_t PL::ModbusClient::ReadRegisters(PL::ModbusFunctionCode, uint16_t, uint16_t, uint16_t*, PL::ModbusException*)
Guru Meditation Error: Core  0 panic'ed (InstructionFetchError). Exception was unhandled.

Core  0 register dump:
PC      : 0x3ffbab1c  PS      : 0x00060030  A0      : 0x800dafa6  A1      : 0x3ffd0ba0  
A2      : 0x3ffd0bdc  A3      : 0x3f408d08  A4      : 0x3f408d1c  A5      : 0x000004bb
A6      : 0x3f408d08  A7      : 0x00000107  A8      : 0x801696a4  A9      : 0x3ffd0b80
A10     : 0x3ffb4088  A11     : 0x3f408d08  A12     : 0x3f408d1c  A13     : 0x3ffd0ba0  
A14     : 0x3ffd0b80  A15     : 0x0000000c  SAR     : 0x0000001d  EXCCAUSE: 0x00000002
EXCVADDR: 0x3ffbab1c  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xffffffe2  

Backtrace:0x3ffbab19:0x3ffd0ba0 |<-CORRUPTED

ELF file SHA256: d6bff7793b423126

Rebooting...

I have tried starting four very simple tasks instead, and then it does not happen. For example:

void task4(void *pvParameters)
{
  ESP_LOGI("TASK4:", "Start");
  while (1)
  {
    ESP_LOGI("TASK4:", "Loop");
    vTaskDelay(100 / portTICK_PERIOD_MS);
  }
}

extern "C" void app_main(void)
{
  ESP_LOGI("Main:", "Start");
  xTaskCreate(&task1, "task1", 4000, NULL, 9, NULL);
  xTaskCreate(&task2, "task2", 4000, NULL, 9, NULL);
  xTaskCreate(&task3, "task3", 4000, NULL, 9, NULL);
  xTaskCreate(&task4, "task4", 4000, NULL, 9, NULL);

  xTaskCreate(&mb_server_main, "mb_server_main", 4000, NULL, 9, NULL);
}

But if I start more complex tasks like this, then this problem occurs.

void app_main()
{
  const static char *TAG = "main";

  esp_err_t err = nvs_flash_init();
  if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND)
  {
    ESP_ERROR_CHECK(nvs_flash_erase());
    err = nvs_flash_init();
  }
  ESP_ERROR_CHECK(err);

  wifi_app_main();

  led_setup();
  ws_server_start();

  xTaskCreate(&nvs_main, "nvs_main", 3000, NULL, 9, &nvs_main_th);
  xTaskCreate(&server_task, "server_task", 3000, NULL, 9, &server_task_th);
  xTaskCreate(&server_handle_task, "server_handle_task", 4000, NULL, 6, &server_handle_task_th);
  xTaskCreate(&count_task, "count_task", 3000, NULL, 2, &count_task_th);
  xTaskCreate(&mb_master_main, "mb_master_main", 20000, NULL, 9, &mb_master_main_th);
  xTaskCreate(&mb_slave_main, "mb_slave_main", 3500, NULL, 9, &mb_slave_main_th);
  xTaskCreate(&mqtt_ssl_main, "mqtt_ssl_main", 1024 * 8, NULL, 9, &mqtt_ssl_main_th);
}

Even just mqtt_ssl_main() and mb_master_main() together cause this problem, but also mb_master_main() together with some other tasks.

What could cause this problem?

MichaelUray commented 1 year ago

but also mb_master_main() together with some other tasks

I have to revise that. During my further tests I was only able to see this error when MQTT and Modbus ran at the same time, but not together with any of my other tasks. So I think the issue is caused by MQTT and Modbus communication running at the same time in two separate tasks.

What could cause that, and how can I isolate the issue?

plasmapper commented 1 year ago

I am not quite sure which example you are talking about. Is it the Modbus server or the Modbus client? Is it serial or network?

MichaelUray commented 1 year ago

Sorry, I did not describe it precisely. I meant the Modbus serial RTU client, and I used the ESP-IDF MQTT SSL example, which I modified a little bit.

plasmapper commented 1 year ago

Can you please show the tasks' code?

MichaelUray commented 1 year ago

I have attached the full project here. The relevant files are mb_master.cpp and mqtt_ssl.c. GK_PV.zip

plasmapper commented 1 year ago

For now it does not seem to me that the problem is with the library itself. Since you are using many tasks with a lot of memory reserved for them it might be anything from wrong task priorities to memory corruption (according to errors that you get). Please do some more debugging and write if you are sure that the problem is directly connected to the library functionality.

MichaelUray commented 1 year ago

Since you are using many tasks with a lot of memory reserved

I increased the memory a lot just to make sure not to run into stack issues like the ones I already had.

According to the task list I don't need that much and everything was running before with less stack as well.

count_task      X       9       2872    15
IDLE            R       0       1104    5
IDLE            R       0       1112    6
mb_master_main  B       9       6000    16
main            B       1       1688    4
nvs_main        B       9       4392    12
tiT             B       18      2824    8
server_handle_t B       6       6384    14
ipc1            B       24      1060    2
uart_queue_task B       10      3616    19
ipc0            B       24      1060    1
modbus_slave_ta B       9       3548    18
mb_slave_main   B       9       5972    17
Tmr Svc         B       1       1568    7
sys_evt         B       20      1924    9
wifi            B       23      4664    10
esp_timer       S       22      3392    3
ws_server_task  B       5       5552    11
server_task     B       9       4312    13

It also shows plenty of RAM left, so I was not worried about assigning too much RAM.

I (10104) count_task: RAM left 161144

Please do some more debugging

I am pretty much a beginner with FreeRTOS as well as C++, but I have some C programming experience. Any recommendation where to start off with debugging?

Do you actually use interrupts in your communication which could lead to an Interrupt wdt timeout?

anything from wrong task priorities

Could it be an issue to run all tasks at the same priority? From my understanding of FreeRTOS, it should not matter which priority a task has. I have already tried changing some of them, but it did not help.

plasmapper commented 1 year ago

You have two global variables named "client" (in mb_master.cpp and in mqtt_ssl.c). Make them local or rename them or do something else to make sure that they do not interfere with each other.
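As a minimal sketch of the "make sure that they do not interfere" option, the C++ side could give its variable internal linkage with an unnamed namespace. The type `FakeModbusClient` and the two accessor functions are hypothetical stand-ins, not part of the library or of the attached project:

```cpp
#include <cstdint>

// Hypothetical stand-in for the real Modbus client type used in mb_master.cpp.
struct FakeModbusClient {
    std::uint8_t stationAddress = 1;
};

// Unnamed namespace: `client` gets internal linkage, so the linker never
// matches it with a `client` symbol defined in another file.
// The C equivalent for mqtt_ssl.c would be a `static` file-scope variable.
namespace {
FakeModbusClient client;
}

// Other files reach the object only through functions like these (hypothetical).
void SetStationAddress(std::uint8_t address) { client.stationAddress = address; }
std::uint8_t GetStationAddress() { return client.stationAddress; }
```

With this layout a second file can define its own `client` of any type without the two objects ever sharing a symbol.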

MichaelUray commented 1 year ago

Oh boy, you are absolutely right, this was causing the problem. I was looking for days for this issue, but I did not figure it out, thank you very much for your help!

I understand now that I have to declare it as static if I want to limit the scope to the file. But why doesn't the compiler throw an error if I declare the same variable in two files with different data types?

plasmapper commented 1 year ago

I don't know why there is no error. I don't have much experience in mixing C and C++ in one program, but maybe it has something to do with this.

MichaelUray commented 1 year ago

It looks as if the behavior is undefined if a declaration is done like that without static in each (or probably at least one) file. https://stackoverflow.com/questions/74412226/how-throw-a-gcc-error-if-a-global-variable-with-the-same-name-gets-declared-twi
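One way to picture why this can look like memory corruption is the single-file emulation below. It assumes that both files ended up using the same storage for `client`, and models that storage as a raw byte buffer. All names and values are hypothetical; the real layout depends on the linker and on the actual types involved:

```cpp
#include <cstdint>
#include <cstring>

// How the (hypothetical) Modbus file interprets the shared storage.
struct ModbusSideView {
    std::uint32_t readTimeoutMs;
    std::uint32_t stationAddress;
};

// How the (hypothetical) MQTT file interprets it: a single opaque handle value.
using MqttSideView = std::uint32_t;

// Emulates the one memory region that both `client` symbols resolve to.
alignas(ModbusSideView) unsigned char sharedSymbolStorage[sizeof(ModbusSideView)];

// The Modbus side initializes "its" object.
void ModbusSideInit() {
    ModbusSideView view{300, 1};
    std::memcpy(sharedSymbolStorage, &view, sizeof view);
}

// The MQTT side stores "its" handle, overwriting the first bytes.
void MqttSideStoreHandle(MqttSideView handle) {
    std::memcpy(sharedSymbolStorage, &handle, sizeof handle);
}

// The Modbus side reads its object back, unaware of the other writer.
ModbusSideView ModbusSideRead() {
    ModbusSideView view;
    std::memcpy(&view, sharedSymbolStorage, sizeof view);
    return view;
}
```

After `ModbusSideInit()` followed by `MqttSideStoreHandle(0xDEADBEEF)`, `ModbusSideRead()` returns a timeout of `0xDEADBEEF` while the station address is still 1, so only part of the object is damaged and the failure can surface far from the write that caused it.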

MichaelUray commented 1 year ago

https://stackoverflow.com/questions/74412226/how-throw-a-gcc-error-if-a-global-variable-with-the-same-name-gets-declared-twi

The solution is to add -fno-common as a compiler option, which was not the default prior to GCC 10. With this option, the build fails with a multiple-definition error at link time in such a situation.

The current ESP-IDF version 4.4.2 toolchain uses GCC 8.4, which does not enable this option by default. Add it to CMakeLists.txt: idf_build_set_property(COMPILE_OPTIONS "-fno-common" APPEND)