Closed wardog-82 closed 4 months ago
I think this could be a general problem with the new version; check out https://github.com/helgeerbe/OpenDTU-OnBattery/issues/591
No, this is a general issue of all OpenDTU versions under low-memory conditions.
@wardog-82 Can you show us the heap conditions under System -> Info? If it is a low-memory scenario and you are using TLS for MQTT, it might help to turn TLS off.
Thank you very much for the quick response. The requested screenshots. MQTT and TLS is deactivated.
No, that doesn't look like a low-memory situation. I will push some changes to the PowerMeter code into development. It would be good if you could try this code. I hope that I have time today. I will link precompiled bin files of that code here.
It is interesting that after 7 seconds of uptime your largest free heap block is only 92kB and the fragmentation level is 35% already.
Why does your ESP32 have so much less total heap than mine? As per datasheet yours should have 8KB more SRAM total (ESP32-D0WD-V3 versus ESP32-S3).
And I did not know that the binary size differs so much between the models and that generic_esp32 is so close to the limit...
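To put the two heap numbers quoted above in relation, here is a small sketch. It assumes fragmentation is defined as one minus the ratio of the largest free block to the total free heap; that definition is an assumption and is not confirmed anywhere in this thread. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition (not confirmed by the thread):
//   fragmentation = 1 - largest_free_block / total_free_heap
double fragmentation(double largestFreeBlock, double totalFreeHeap) {
    assert(totalFreeHeap > 0.0 && largestFreeBlock <= totalFreeHeap);
    return 1.0 - largestFreeBlock / totalFreeHeap;
}

// Inverse of the above: the total free heap implied by a given
// largest free block and fragmentation level (0.0 <= frag < 1.0).
double impliedFreeHeap(double largestFreeBlock, double frag) {
    assert(frag >= 0.0 && frag < 1.0);
    return largestFreeBlock / (1.0 - frag);
}
```

Under this assumed definition, a largest block of 92 kB at 35% fragmentation would correspond to roughly 141.5 kB of free heap in total, i.e. `impliedFreeHeap(92.0, 0.35)`.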
Message remains unchanged even after the "Timeout" has been exceeded.
Can you test this again with the device attached to a host computer and paste the serial output? Helge thinks this is a low-memory issue. I would like to see respective messages in the serial output.
@schlimmchen That is close to my D1 mini. As a side note: when I enable TLS for MQTT, I have over 80% fragmentation, and after a while I see reboots.
Thank you for your feedback. Here is the output of the serial interface:
Using 'COM8' as serial port.
Showing logs:
[15:53:51]eE (763) esp_core_dump_flash: No core dump[15:53:51]E (763) esp_core_dump_flash: No core dump partition found!
[15:53:51]
[15:53:51]Starting OpenDTU
[15:53:51]Initialize FS... done
[15:53:51]Reading configuration... done
[15:53:51][ 179][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
[15:53:51]Reading PinMapping... using default config done
[15:53:51]Initialize Network... done
[15:53:51]Setting Hostname... Configuring WiFi STA using new credentials... done
[15:53:51]Initialize NTP... done
[15:53:51]Initialize SunPosition... done
[15:53:51]Initialize MqTT... done
[15:53:51]Initialize WebApi... done
[15:53:51]Initialize Display... done
[15:53:51]Initialize LEDs... done
[15:53:51]Check for default DTU serial... done
[15:53:51]done
[15:53:51]Initialize Hoymiles interface... NRF: Connection error!!
[15:53:51] Setting radio PA level...
[15:53:51] Setting DTU serial...
[15:53:51] Setting poll interval...
[15:53:51] Setting verbosity...
[15:53:51]done
[15:53:51][VictronMppt] rx = -1, tx = -1
[15:53:51][VictronMppt] invalid pin config
[15:53:51]Initialize Huawei AC charger interface...
[15:53:51]Invalid pin config
[15:53:51]Switch to WiFi mode
[15:53:51]Setting Hostname... done
[15:53:51]Configuring WiFi STA using new credentials... done
[15:53:51]Configuring WiFi STA DHCP IP... done
[15:53:51][ 0.477] DPL: waiting for valid date and time to be available
[15:53:54]WiFi connected
[15:53:54]WiFi got ip: 192.168.1.20
[15:53:54]Network connected
[15:53:55][ 4.163] DPL: disabled by configuration
[15:54:01]Guru Meditation Error: Core 1 panic'ed (Unhandled debug exception).
[15:54:01]Debug exception reason: Stack canary watchpoint triggered (loopTask)
[15:54:01]Core 1 register dump:
[15:54:01]PC : 0x40092e5f PS : 0x00060036 A0 : 0x8009171c A1 : 0x3ffb0310
[15:54:01]A2 : 0x3ffbf5c8 A3 : 0xb33fffff A4 : 0x0000cdcd A5 : 0x00060023
[15:54:01]A6 : 0x00060023 A7 : 0x0000abab A8 : 0xb33fffff A9 : 0xffffffff
[15:54:01]A10 : 0x00000004 A11 : 0x00000004 A12 : 0x80084f92 A13 : 0x3ffbf51c
[15:54:01]A14 : 0x007bf5c8 A15 : 0x003fffff SAR : 0x0000001e EXCCAUSE: 0x00000001
[15:54:01]EXCVADDR: 0x00000000 LBEG : 0x4018a79c LEND : 0x4018a7c4 LCOUNT : 0x00000000
[15:54:01]
[15:54:01]
[15:54:01]Backtrace: 0x40092e5c:0x3ffb0310 0x40091719:0x3ffb0350 0x4008f9f0:0x3ffb0380 0x4008f9a0:0xa5a5a5a5 |<-CORRUPTED
[15:54:01]
[15:54:01]
[15:54:01]
[15:54:01]
[15:54:01]ELF file SHA256: 732c8b63c1fcdf8e
[15:54:01]
[15:54:01]E (10187) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
[15:54:01]Rebooting...
[15:54:01]ets Jul 29 2019 12:21:46
[15:54:01]
[15:54:01]rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[15:54:01]configsip: 0, SPIWP:0xee
[15:54:01]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[15:54:01]mode:DIO, clock div:2
[15:54:01]load:0x3fff0030,len:1184
[15:54:01]load:0x40078000,len:13232
[15:54:01]load:0x40080400,len:3028
[15:54:01]entry 0x400805e4
[15:54:02]E (762) esp_core_dump_flash: No core dump[15:54:02]E (762) esp_core_dump_flash: No core dump partition found!
[15:54:02]
[15:54:02]Starting OpenDTU
[15:54:02]Initialize FS... done
[15:54:02]Reading configuration... done
[15:54:02][ 180][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
[15:54:02]Reading PinMapping... using default config done
[15:54:02]Initialize Network... done
[15:54:02]Setting Hostname... Configuring WiFi STA using new credentials... done
[15:54:02]Initialize NTP... done
[15:54:02]Initialize SunPosition... done
[15:54:02]Initialize MqTT... done
[15:54:02]Initialize WebApi... done
[15:54:02]Initialize Display... done
[15:54:02]Initialize LEDs... done
[15:54:02]Check for default DTU serial... done
[15:54:02]done
[15:54:02]Initialize Hoymiles interface... NRF: Connection error!!
[15:54:02] Setting radio PA level...
[15:54:02] Setting DTU serial...
[15:54:02] Setting poll interval...
[15:54:02] Setting verbosity...
[15:54:02]done
[15:54:02][VictronMppt] rx = -1, tx = -1
[15:54:02][VictronMppt] invalid pin config
[15:54:02]Initialize Huawei AC charger interface...
[15:54:02]Invalid pin config
[15:54:02]Switch to WiFi mode
[15:54:02]Setting Hostname... done
[15:54:02]Configuring WiFi STA using new credentials... done
[15:54:02]Configuring WiFi STA DHCP IP... done
[15:54:02][ 0.463] DPL: disabled by configuration
[15:54:05]WiFi disconnected
[15:54:05]Try reconnecting
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Well, it's not a heap problem, then. @helgeerbe The stack size is too small. Maybe the stack finally has had enough after TaskScheduler is part of the mix. There may be another reason...
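For readers unfamiliar with how stack usage is measured on such systems, here is a simplified desktop model of the idea. FreeRTOS paints a task's stack with the fill byte 0xa5 and offers uxTaskGetStackHighWaterMark() to report the minimum free stack; the 0xa5a5a5a5 in the backtrace above looks like that fill pattern. Everything below (SimulatedStack, use, highWaterMark, the sizes) is a hypothetical illustration, not ESP-IDF or OpenDTU code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill byte used to paint unused stack memory (FreeRTOS uses 0xa5).
constexpr std::uint8_t kFillByte = 0xa5;

// Hypothetical model of a task stack that grows downwards, from the
// end of the buffer towards index 0.
struct SimulatedStack {
    std::vector<std::uint8_t> memory;

    explicit SimulatedStack(std::size_t size) : memory(size, kFillByte) {}

    // Simulate code that needs `bytes` of stack at its deepest point by
    // overwriting the top `bytes` of the buffer. Returns false if the
    // request does not fit, which models a stack overflow.
    bool use(std::size_t bytes) {
        if (bytes > memory.size()) return false;
        for (std::size_t i = memory.size() - bytes; i < memory.size(); ++i) {
            memory[i] = 0x00;
        }
        return true;
    }

    // Count untouched fill bytes from the low end. This is the smallest
    // amount of free stack that was ever left (the "high water mark").
    std::size_t highWaterMark() const {
        std::size_t n = 0;
        while (n < memory.size() && memory[n] == kFillByte) ++n;
        return n;
    }
};
```

In this model, a stack of 8192 bytes (an illustrative number) that once held a 2000-byte buffer reports a high water mark of 6192. A small remaining margin means that any additional call depth can push the task over the limit.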
I had a look at this stack allocation before and investigated. I found that the stack was indeed large enough to handle 2.25k of chars in that context, and by quite a margin, as far as I remember. Maybe the margin wasn't that large after all, and introducing the TaskScheduler made HttpPowerMeter finally hit the limit.
char response[2000] is, by the way, only allocated to store an HTTP response string, which is then passed to a function that converts the char array into JSON in order to parse a single float. That is a lot of allocation just for a float. In https://github.com/helgeerbe/OpenDTU-OnBattery/pull/594 I tried to move all of this into tryGetFloatValueForPhase and make it a bit more direct (basically avoiding the char array altogether and keeping all those allocations local to the function, so they are freed right away):
httpResponse = httpClient.getString(); //very unfortunate that we cannot parse WifiClient stream directly
StaticJsonDocument<2048> json; //however creating these allocations on stack should be fine to avoid heap fragmentation
deserializeJson(json, httpResponse);
As a further improvement to minimize all these allocations, it should actually be possible to re-implement getString() so that it converts the WiFi buffer directly into the JSON without taking a detour via a string. Let me know if you think this is effort well spent. I currently think it is not.
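To illustrate what reading a value without first buffering the whole response could look like, here is a minimal, self-contained sketch that extracts one float from data arriving in chunks. It is not the code from the pull request and not a JSON parser: ChunkedFloatExtractor and its methods are hypothetical, and it assumes the key appears exactly as "key": with no whitespace before the colon and with a plain number as its value.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: finds "key": in a stream of chunks and collects
// the number that follows, keeping only a few bytes between chunks.
class ChunkedFloatExtractor {
public:
    explicit ChunkedFloatExtractor(const std::string& key)
        : pattern_("\"" + key + "\":") {
        assert(!key.empty());
    }

    // Feed the next chunk. Returns true once the value has been read
    // up to its terminating character (for example ',' or '}').
    bool feed(const std::string& chunk) {
        if (done_) return true;
        if (found_) return consumeNumber(chunk, 0);

        // Prepend the tail of the previous data so that a key split
        // across two chunks is still found.
        std::string window = carry_ + chunk;
        std::size_t hit = window.find(pattern_);
        if (hit == std::string::npos) {
            std::size_t keep = std::min(window.size(), pattern_.size() - 1);
            carry_ = window.substr(window.size() - keep);
            return false;
        }
        found_ = true;
        carry_.clear();
        return consumeNumber(window, hit + pattern_.size());
    }

    // Writes the parsed value to `out`. Returns false if no complete,
    // valid number has been collected.
    bool value(float& out) const {
        if (!done_ || digits_.empty()) return false;
        char* end = nullptr;
        float v = std::strtof(digits_.c_str(), &end);
        if (end != digits_.c_str() + digits_.size()) return false;
        out = v;
        return true;
    }

private:
    bool consumeNumber(const std::string& data, std::size_t pos) {
        for (; pos < data.size(); ++pos) {
            char c = data[pos];
            bool numeric = std::isdigit(static_cast<unsigned char>(c)) ||
                           c == '-' || c == '+' || c == '.' ||
                           c == 'e' || c == 'E';
            if (numeric) {
                digits_ += c;
            } else if (digits_.empty() && c == ' ') {
                continue;  // skip spaces between the colon and the number
            } else {
                done_ = true;  // first non-numeric character ends the value
                return true;
            }
        }
        return false;  // number may continue in the next chunk
    }

    std::string pattern_;
    std::string carry_;
    std::string digits_;
    bool found_ = false;
    bool done_ = false;
};
```

The memory held between chunks is bounded by the key length plus the digits of one number, at the cost of ignoring JSON structure entirely (nested objects, escaped quotes, duplicate keys). Whether that trade-off is acceptable depends on the meter's response format.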
@schlimmchen: you appear to understand the task thing. Are WiFiClient and HttpPowerMeter running on the same task? I read somewhere that WiFi is running on its own task on core 0, while the rest of OpenDTU runs on core 1. Or the other way around? Confused. Could "Guru Meditation Error: Core 1 panic'ed (Unhandled debug exception)." also be a tasking issue / race condition?
"Guru Meditation Error: Core 1 panic'ed" can mean a lot of things. However, in this case it gave a clear reason (as is the case most of the time), which is the stack corruption/overflow.
Actually, I cannot tell you how WiFiClient callbacks are handled. The low-level WiFi driver stuff happens on a particular core, that's what I read.
Please have a look at the signature of httpClient.getString(). If that actually copies the whole response, that would be an issue. It could also be that it merely creates a (most certainly const) reference to something, so no copying takes place. In general, of course, it is indeed desirable to avoid copying large chunks of data. The deserializeJson function might as well read directly from whatever char buffer the httpClient must already hold to store the response.
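The difference between the two kinds of signature can be shown with a small sketch that uses std::string instead of the Arduino String type. FakeHttpClient and both accessor names are hypothetical stand-ins and say nothing about what the real httpClient does.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an HTTP client that has already buffered
// a complete response body.
class FakeHttpClient {
public:
    explicit FakeHttpClient(std::string body) : body_(std::move(body)) {}

    // Returns by value: the caller receives its own copy, so the
    // response exists twice in memory until one of them is destroyed.
    std::string getStringCopy() const { return body_; }

    // Returns a const reference: no copy is made, but the reference is
    // only valid as long as the client object is alive and unchanged.
    const std::string& getStringRef() const { return body_; }

private:
    std::string body_;
};
```

With a 2000-byte body, the by-value accessor yields a string whose data() pointer differs from that of the client's internal buffer, whereas the by-reference accessor yields the very same object.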
Looks to me like getString() copies. And it needs to, as the response can come in chunks. In fact, the lib FirebaseJSON (which I eliminated in the pull request) can parse JSON directly out of the WiFiClient buffer, and it can deal with chunks, but it only works while the connection is still open. Shelly EM forces a "Connection: close" after each response, so my attempt to use this function led to a crashed/frozen OpenDTU (which was stuck in the loop trying to read the WiFi buffer).
@wardog-82 What is the status of this? Are you still having the same issue?
Hello, yes, unfortunately the problem still exists.
@wardog-82 If possible, please test the firmware provided in #1077. The HTTP+JSON power meter provider now polls data in the context of its own thread. If the stack size is still not sufficient for your scenario, we can easily tweak it and see if we can finally get rid of this problem.
What happened?
With firmware version fd58ad2 of OpenDTU-OnBattery, I can read the current power consumption from the Powerfox / Poweropti using the electricity meter settings. This has been running stably for months. See the screenshot for the settings. If I use the latest firmware version 2024.01.07 and apply the settings, the OpenDTU unit hangs after a short runtime (a few minutes or hours); the website is no longer accessible.
To Reproduce Bug
With the new version (2024.01.07) you can store the authorization type "Basic". Even with this setting, the OpenDTU unit hangs. When testing the connection via the "Test" button, the message "HTTP request is being sent..." appears. Message remains unchanged even after the "Timeout" has been exceeded. I have also changed the timeout value without success. If the "Activate current meter" option is set to inactive, the OpenDTU unit runs stably. The ESP32 was freshly flashed, but unfortunately that did not help.
Expected Behavior
With the new firmware version (2024.01.07), the current power consumption is displayed in the dashboard without the DTU unit hanging. Please help.
Install Method
Pre-Compiled binary from GitHub
What git-hash/version of OpenDTU?
2024.01.07
Relevant log/trace output
Anything else?
No response