sinricpro / esp8266-esp32-sdk

Library for https://sinric.pro - simple way to connect your device to Alexa, Google Home, SmartThings and cloud
https://sinric.pro

ESP32 crashing (after upgrade to 2.9.1?) #158

Closed sgrizzi closed 3 years ago

sgrizzi commented 3 years ago

Hi, I am facing a very weird situation and I need some help to find the problem(s). Again, if I disable ssl then everything is running fine.

After upgrading my code to 2.9.1 (successfully, with your support), I decided to add two more temperature sensors to report the external temperature of two airco units. Before this change, the code had been running continuously with no problems for a couple of days (with SSL active). The setup now includes:

Every ca. 10 min the code updates the data of the 4 airco units and then the data of the two new temperature sensors:

    void checkThermo(void) {
      checkThermo_and_fan(thermo1.thermo_ID, thermo1.fan_ID);
      checkThermo_and_fan(thermo2.thermo_ID, thermo2.fan_ID);
      checkThermo_and_fan(thermo3.thermo_ID, thermo3.fan_ID);
      checkThermo_and_fan(thermo4.thermo_ID, thermo4.fan_ID);
      handleAircoTemperaturesensor(); // transmit external temperature data
    }

    void handleAircoTemperaturesensor() {
      // send roof and garden temperatures as collected by the airco external units
      SinricProTemperaturesensor &mySensor = SinricPro[ROOF_TEMP_SENSOR_ID]; // get temperaturesensor device
      bool success = mySensor.sendTemperatureEvent(thermo1.otemp); // send event
      if (success) { // if event was sent successfully, print temperature to serial
        Serial.printf("Roof temperature: %2.1f Celsius\r\n", thermo1.otemp);
      } else { // if sending event failed, print error message
        Serial.printf("Something went wrong...could not send Event to server!\r\n");
      }

      SinricProTemperaturesensor &mySensor1 = SinricPro[GARDEN_TEMP_SENSOR_ID]; // get temperaturesensor device
      bool success1 = mySensor1.sendTemperatureEvent(thermo4.otemp); // send event
      if (success1) { // if event was sent successfully, print temperature to serial
        Serial.printf("Garden temperature: %2.1f Celsius\r\n", thermo4.otemp);
      } else { // if sending event failed, print error message
        Serial.printf("Something went wrong...could not send Event to server!\r\n");
      }
    }

The code crashes somewhere after the sendTemperatureEvent(), with an error that is not really deterministic…

This is the first error as seen on the serial output:

    Roof temperature: 10.0 Celsius
    Garden temperature: 11.0 Celsius
    abort() was called at PC 0x401deef7 on core 0

    ELF file SHA256: 0000000000000000

    Backtrace: 0x4008f438:0x3ffe2760 0x4008f6b5:0x3ffe2780 0x401deef7:0x3ffe27a0 0x401def3e:0x3ffe27c0 0x401de547:0x3ffe27e0 0x401de94e:0x3ffe2800 0x400db1a6:0x3ffe2820 0x400da51e:0x3ffe2910 0x400f0f75:0x3ffe2960 0x400ea9ae:0x3ffe2980 0x4009071a:0x3ffe29b0

Then the code reboots and crashes the same way, but with this error:

    Roof temperature: 10.0 Celsius
    Garden temperature: 11.0 Celsius
    Disconnected from SinricPro
    Fan 603f6313fbb434731e289160 speed changed to A
    [E][WiFiClientSecure.cpp:127] connect(): start_ssl_client: -1
    [E][ssl_client.cpp:36] _handle_error(): [start_ssl_client():216]: (-10368) X509 - Allocation of memory failed
    [E][WiFiClientSecure.cpp:127] connect(): start_ssl_client: -10368
    [E][WiFiClientSecure.cpp:127] connect(): start_ssl_client: -1

Sometimes it returns an unknown (?) error from ssl_client.cpp:

    Roof temperature: 10.0 Celsius
    Garden temperature: 11.0 Celsius
    [E][ssl_client.cpp:36] _handle_error(): [send_ssl_data():301]: (-78) UNKNOWN ERROR CODE (004E)
    Disconnected from SinricPro
    Something went wrong...could not send Event to server!

It looks like a memory management problem, a conflict, or a stack or heap error… By the way, if I disable SSL everything runs fine!

Any idea on what to investigate?

Regards, Gabriele

sivar2311 commented 3 years ago

Hi Gabriele,

SSL needs a lot of RAM (heap / dynamic memory). I think your ESP32 is running out of RAM after some time; see this line:

[E][ssl_client.cpp:36] _handle_error(): [start_ssl_client():216]: (-10368) X509 - Allocation of memory failed

> Any idea on what to investigate?

You can add some statements to print the available memory periodically

  Serial.printf("Heap: %d / %d\r\n", ESP.getFreeHeap(), ESP.getHeapSize());
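
The single free-heap figure printed above could be complemented with the low-water mark and the largest allocatable block, since a large allocation can fail even when the total free heap looks sufficient. The following is only a minimal sketch of that idea: `HeapSnapshot`, `heapLooksTight`, `formatHeap` and the 16 KB warning threshold are hypothetical names and values chosen for illustration, not part of the SinricPro SDK. On an ESP32 with the Arduino core, the snapshot fields could be filled from `ESP.getFreeHeap()`, `ESP.getHeapSize()`, `ESP.getMinFreeHeap()` and `ESP.getMaxAllocHeap()`.

```cpp
#include <stdint.h>
#include <cstdio>
#include <string>

// Plain snapshot of the heap figures so the logic also compiles off-target.
// On an ESP32 with the Arduino core the fields could be filled from
// ESP.getFreeHeap(), ESP.getHeapSize(), ESP.getMinFreeHeap() and
// ESP.getMaxAllocHeap().
struct HeapSnapshot {
  uint32_t freeBytes;         // currently free heap
  uint32_t totalBytes;        // total heap size
  uint32_t minFreeBytes;      // lowest free heap seen since boot
  uint32_t largestFreeBlock;  // largest single block that can be allocated
};

// ASSUMPTION: illustrative warning threshold, not a measured TLS requirement.
constexpr uint32_t kLargestBlockWarnBytes = 16 * 1024;

// True when the largest free block is below the warning threshold.
bool heapLooksTight(const HeapSnapshot &s,
                    uint32_t warnBytes = kLargestBlockWarnBytes) {
  return s.largestFreeBlock < warnBytes;
}

// Builds one log line; `tag` names the call site (e.g. "loop").
std::string formatHeap(const char *tag, const HeapSnapshot &s) {
  char buf[160];
  std::snprintf(buf, sizeof(buf),
                "[%s] heap %lu / %lu, min free %lu, largest block %lu%s",
                tag,
                static_cast<unsigned long>(s.freeBytes),
                static_cast<unsigned long>(s.totalBytes),
                static_cast<unsigned long>(s.minFreeBytes),
                static_cast<unsigned long>(s.largestFreeBlock),
                heapLooksTight(s) ? " (tight)" : "");
  return std::string(buf);
}
```

In a sketch, the resulting string could be passed to `Serial.println(...)` once in `loop()` and once right before each `sendTemperatureEvent()` call, so that short dips or fragmentation become visible next to the plain free-heap value.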

sgrizzi commented 3 years ago

Hi Boris, thanks for your feedback and suggestion. I guess you are right. Painful problems, these memory management issues... I added a couple of free-heap and heap-size checks, one in the main loop and one in the function that issues the sendEvent command (which is probably causing the crash), and tested with SSL disabled and enabled. Total heap is constant at roughly 256K. Free heap is ca. 84K without SSL and ca. 40K with SSL enabled; it does not change significantly between the main loop and the function call, and it does not look like it is leaking memory. It is difficult for me to know how much additional heap/stack SSL uses internally when it is enabled, and whether 40K is enough, but it is clearly about 50% of what is available when SSL is disabled. Would you have any recommendation on how to save heap memory here? I could check the linker map, but I find it difficult to immediately see where a big chunk of memory can be saved. Can the heap size allocation be changed with the partition file? Or shall I just conclude that all these devices (including the BLE library!) simply cannot fit in one ESP32 :-))
Regards, Gabriele

sivar2311 commented 3 years ago

Hi Gabriele!

> I could check the linker map but I find it difficult to immediately see where a big chunk of memory can be saved. Can the heap size allocation be changed with the partition file?

Nope, you can't! This is about dynamic memory (RAM, not ROM / flash)! Objects need to allocate some RAM at runtime. The protocol handling in particular (receiving requests, sending events) relies on JSON handling, which needs a lot of dynamic RAM. I think that's the reason why your ESP crashes when sending events. You have a lot of devices running on a single ESP. If all of your devices start sending events at the same time -> crash... not enough memory to handle all that data in RAM!

This issue depends highly on your sketch and how many devices are running. I don't know how much the BLE part will consume either.

Without knowing your sketch, this is hard to analyze! Maybe you can upload your code somewhere, so I have a better overview and can guess what's causing the issue.

sgrizzi commented 3 years ago

Hi Boris, indeed, my idea was to check whether any RAM is allocated statically... but you are right, probably most objects are allocated dynamically. At the moment all devices are activated in close sequence, e.g. when updating their status. Maybe I could schedule updates over a longer time frame. Thermostats and temp sensors are definitely not time-critical. BLE is for sure using a lot of flash (I had to change the partition table, removing almost all SPIFFS space, otherwise it would not fit) and I guess it is also a good RAM-eater.

No problem to upload my sketch: if you can give me an e-mail address I can send a zip file, use WeTransfer or share a Dropbox folder... if you are willing to spend some time going through my code... Rgds, Gabriele
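
The idea of spreading the updates over a longer time frame could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the sketch discussed here or from the SinricPro SDK: `StaggeredScheduler` is a hypothetical helper that divides the update period into equal slots and hands out one device index per slot, so that events are not sent back to back.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: hands out one device index per time slot so that
// updates are spread across the whole period instead of sent in a burst.
class StaggeredScheduler {
 public:
  StaggeredScheduler(size_t deviceCount, uint32_t periodMs)
      : count_(deviceCount),
        slotMs_(deviceCount > 0
                    ? periodMs / static_cast<uint32_t>(deviceCount)
                    : periodMs) {}

  // Returns the index of the device that is due, or -1 if none is due yet.
  // `nowMs` is meant to come from millis(); the unsigned subtraction keeps
  // the comparison valid across the 32-bit rollover.
  int poll(uint32_t nowMs) {
    if (count_ == 0) return -1;
    if (started_ && static_cast<uint32_t>(nowMs - lastMs_) < slotMs_) {
      return -1;
    }
    started_ = true;
    lastMs_ = nowMs;
    const int due = static_cast<int>(next_);
    next_ = (next_ + 1) % count_;  // round-robin over all devices
    return due;
  }

 private:
  size_t count_;         // number of devices to cycle through
  uint32_t slotMs_;      // time between two consecutive updates
  uint32_t lastMs_ = 0;  // time of the last update handed out
  size_t next_ = 0;      // index of the next device to update
  bool started_ = false; // false until the first update has been handed out
};
```

With six devices and a 10 minute period, `StaggeredScheduler scheduler(6, 600000);` would release one update every 100 seconds. In `loop()`, the result of `scheduler.poll(millis())` could select which thermostat or sensor to update, so that only one event is in flight at a time.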

sivar2311 commented 3 years ago

I think the best way is to create a git repository for this, so we can work together on your code. You can make your repository private and invite me as a collaborator (if you don't want to share your code with the public).

kakopappa commented 3 years ago

I think you could release the Classic BT memory to free some RAM if you are not using it.

esp_bt_controller_mem_release(ESP_BT_MODE_CLASSIC_BT)
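
A hedged sketch of how that call might be wrapped is shown below. `releaseClassicBtMemory` is a hypothetical helper name, and the `#else` branch contains host-only stand-ins (not the real API) so the snippet also compiles off-target; on an ESP32 the real declarations come from `esp_bt.h`. As far as the ESP-IDF documentation describes it, the release should happen before the Bluetooth controller is initialised, and released memory cannot be reclaimed without a reboot, so this only fits a BLE-only sketch.

```cpp
#include <cstdio>

#if defined(ESP_PLATFORM)
#include "esp_bt.h"  // real declarations on the ESP32
#else
// Host-only stand-ins (NOT the real API) so the sketch compiles off-target.
typedef int esp_err_t;
#define ESP_OK 0
enum esp_bt_mode_t { ESP_BT_MODE_CLASSIC_BT = 0x02 };
inline esp_err_t esp_bt_controller_mem_release(esp_bt_mode_t) { return ESP_OK; }
#endif

// Hypothetical helper: release the Classic BT controller memory for a
// BLE-only sketch. Intended to be called early in setup(), before the BLE
// stack is initialised.
bool releaseClassicBtMemory() {
  const esp_err_t err = esp_bt_controller_mem_release(ESP_BT_MODE_CLASSIC_BT);
  if (err != ESP_OK) {
    std::printf("Classic BT memory release failed: %d\n", static_cast<int>(err));
    return false;
  }
  return true;
}
```

Printing the free heap before and after the call would show how much RAM is actually gained on a given setup.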

sivar2311 commented 3 years ago

To be continued on a private repository...