Closed: emmrichd closed this issue 1 year ago
Which browser are you using? Did you force a reload of the webpage? Do you have any ad blockers installed? If so, please make sure to whitelist the IP of your AhoyDTU.
I believe that might be a known bug making a reappearance. See issues #660 and #765.
@emmrichd To rule out other reasons, maybe you could try the steps that @rmayergfx has suggested. Also starting from scratch (erase flash) will be a good idea. Please report back :)
I have looked into this a bit and I think the ESP8266 is running out of memory during concurrent requests. Here's what's happening on mine when I reload /setup:
The response for api.js looks like this:
Which looks to me like it's outputting random garbage from the RAM. Sometimes api.js will load fine, but then style.css may fail in a similar fashion and the page looks like this:
And here's the response payload for style.css:
During all of this the free heap hovers around 10-11 kB. /setup is 7.4 kB, api.js is 3kB, style.css is 2.5 kB, so overall that's 12.9 kB.
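As a rough check of the figures in this comment, the arithmetic can be written out as below. This is only a sketch: it assumes, as a simplification, that all three payloads would have to fit into the heap at the same time, which is not confirmed anywhere in this thread (the web server may send responses in chunks).

```python
# Sizes quoted in the comment above, in kB.
RESPONSE_SIZES_KB = {"/setup": 7.4, "api.js": 3.0, "style.css": 2.5}
# Observed range of free heap, in kB.
FREE_HEAP_KB = (10.0, 11.0)


def shortfall_kb(sizes_kb, free_heap_kb):
    """Return by how many kB the combined payloads exceed the free heap."""
    return round(sum(sizes_kb.values()) - free_heap_kb, 1)


total = round(sum(RESPONSE_SIZES_KB.values()), 1)
print(f"combined payload: {total} kB")
for free in FREE_HEAP_KB:
    print(f"free heap {free} kB -> shortfall {shortfall_kb(RESPONSE_SIZES_KB, free)} kB")
```

Under that assumption the three files together exceed the free heap by roughly 2 to 3 kB.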
Edit: One additional quirk I found is that this problem is much more likely to happen (3 in 5 reloads) and easy to reproduce if the browser is sending a cookie together with the request. In my case I'm accessing Ahoy via an external URL that used to point to a Grafana instance, so the browser was sending the Grafana session cookie to Ahoy. If I delete the cookie, it works reasonably well. If I add a cookie to the request, it fails to load properly most of the time. So in order to reproduce the problem I would suggest using the browser's dev tools to add one or two random 50-60 byte cookies for the Ahoy URL.
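For anyone who prefers a script over the browser's dev tools, the cookie setup could be sketched roughly like this. The cookie names, the helper names and the idea of fetching the resources from a thread pool are assumptions made for illustration; a script like this only approximates what a browser does.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_request(url, cookie_count=2, cookie_len=55):
    """Build a GET request carrying dummy cookies with 50-60 byte values."""
    cookies = "; ".join(
        f"dummy{i}=" + "x" * cookie_len for i in range(cookie_count)
    )
    return urllib.request.Request(url, headers={"Cookie": cookies})


def fetch_size(url):
    """Fetch one URL with the dummy cookies; return body length or an error text."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return len(resp.read())
    except (OSError, http.client.HTTPException) as exc:
        # URLError and timeouts are OSError subclasses; truncated bodies
        # raise http.client.IncompleteRead.
        return f"failed: {exc}"


def fetch_parallel(base_url, paths):
    """Request all paths at once, similar to a browser loading a page."""
    urls = [base_url + path for path in paths]
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return dict(zip(paths, pool.map(fetch_size, urls)))
```

A call such as `fetch_parallel("http://<your-ahoy-address>", ["/setup", "/api.js", "/style.css"])` would then report the received size per resource; the exact resource paths are an assumption and should be taken from the browser's network tab.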
I have the same problem on an ESP8266 with Safari; on Chrome it's a bit better.
I used an iPhone with Safari. Refresh/reload did not help. More tests later. I used the same phone with the old version without problems. If it is a RAM issue, why does the problem persist right after an ESP8266 reboot?
Same issue here on an ESP8266, newly flashed with current release, MacBook client (Chrome/Firefox).
Can you guys try the URL in a private or incognito window? Trying to check whether it's related to cookies, or whether it's happening because Apple devices may be more aggressive about making several HTTP requests at the same time.
Same issue in incognito
Can't reproduce this on ESP32, by the way
Hello, I am at home now. Windows 10, Chrome: one time I get an "OK view", then after a reload it is empty again. Uptime was only 13 min, so it seems to restart regularly. This was not observed with the previous release. MQTT data was delivered all day, though. I would guess there is no relation to the "LED-config-bug": I did a config export, and the LEDs are set to 255. Of course, I could reflash it now. However, if it runs fine afterwards, I cannot provide any further bug observations. So what now?
{"wifi":{"ssid":"LB30","pwd":"","dev":"AHOY-DTU2","adm":"","prot_mask":61,"dark":false,"ip":"","mask":"","dns1":"","dns2":"","gtwy":""},"nrf":{"intvl":30,"maxRetry":5,"cs":15,"ce":2,"irq":0,"sclk":0,"mosi":0,"miso":0,"pwr":2},"ntp":{"addr":"pool.ntp.org","port":123},"sun":{"lat":xx,"lon":xx,"dis":true,"offs":900},"serial":{"intvl":5,"show":false,"debug":false},"mqtt":{"broker":"192.168.180.209","port":1883,"user":"","pwd":"","topic":"inverter","intvl":0},"led":{"0":255,"1":255},"plugin":{"disp":{"type":0,"pwrSafe":false,"pxShift":false,"rotation":0,"contrast":60,"data":255,"clock":255,"cs":255,"reset":255,"busy":255,"dc":255}},"inst":{"en":false,"rstMidNight":false,"rstNotAvail":false,"rstComStop":false,"iv":[{"en":true,"name":"HM-1500-Dach","sn":xx,"yield":[0,0,0,0],"pwr":[420,420,420,420],"chName":["1","2","3","4"]},{"en":true,"name":"HM-1500-Garage","sn":xx,"yield":[0,0,0,0],"pwr":[360,360,360,360],"chName":["1","2","3","4"]},{"en":true,"name":"HM-800-Schuppen","sn":xx,"yield":[0,0,0,0],"pwr":[400,400,0,0],"chName":["1","2","",""]}]}}
Same here, but this also happened in previous (dev) versions, maybe starting around 0.5.8x.
Clicking on API returns only "null"
by entering the settings via
maybe a short clip shows the behaviour:
Hello,
I have flashed my ESP8266 from scratch, including "wipe all data" via USB. However, the odd behaviour remains, and the connection to the inverters can no longer be established. With the last stable version, the system had been working for about 4 months or so.
Dieter
@emmrichd
Can you check the pin configuration? The LED pins should be set to off, not 0. I assume the problem also occurs in private/incognito mode of the browser?
Same issue here.
ESP8266 V0.6.0
Accessing the DTU via smartphone shows the same issue as accessing it via PC, so it cannot be the browser cache. Opening the setup page takes quite some time.
After several refreshes the connection is back.
For a moment the board entry in the footer shows ESP8266ESP8266ESP8266ESP8266ESP8266ESP8266
I don't know if this is related, but since the update to 0.6.0 the total is sometimes shown even though I don't have more than one inverter.
Same symptoms over here with 0.6.0 on esp8266 (flashed with full wipe). Problems on several browsers and OS. I am seeing the same errors in chrome's developer tool.
I also noticed that sometimes clicking in the GUI while it is laggy can lead to a reboot of the esp8266.
EDIT: it just happened again. Reboot reason is "Software/ System restart"
Good afternoon.
I've had the same problems as described above since I updated from 0.5.66 to 0.6.0. At first I thought it was a problem with my hardware (Wemos board), but I then changed to an ESP8266 NodeMCU and got the same problems.
When the GUI is not working correctly and I run a ping loop against the AhoyDTU board (ESP8266 NodeMCU), I see some lost pings, and a few seconds after I opened the web GUI the ESP8266 rebooted automatically. So something seems to be buggy.
I also have the problem with a D1 Mini and 0.6.0. Sometimes it helps to use another browser; most of the time I have to reboot the processor.
Not sure if it helps debugging... The errors mentioned above also appear in 0.5.96:
The important difference to 0.6.0 is that with 0.5.96 the DTU does not reboot.
I have looked into this a bit more. I'm writing down what I have learned so far because there doesn't seem to be an obvious quick fix, and the issue of available heap space may also be relevant in the future, so this information might continue to be useful.
I added a bit of code to output the available heap memory at different stages while a request is being processed (at the beginning, before sending the response and after sending the response). On a dummy test system without an NRF connected, the output looks like this when accessing /setup:
`W: onSetup start: 13032
W: onSetup send: 12800
W: onSetup finish: 11536
W: onColor start: 11720
W: onColor send: 11528
W: onColor finish: 10936
W: onCss start: 10272
W: onCss send: 10080
W: onCss finish: 8816
W: onApiJs start: 8256
W: onApiJs send: 8040
W: onApiJs finish: 6776
W: onApi start: 13320
W: onApi send: 6912
W: onApi finish: 5752
W: onApi start: 13272
W: onApi send: 6840
W: onApi finish: 6352`
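To make the log above easier to read, the heap drop per handler call can be computed with a small script. This is a sketch that assumes the `start`, `send` and `finish` lines of each call appear together and in order, as they do in this sample; with interleaved output from truly concurrent requests the grouping would need to be smarter.

```python
import re

# Log lines copied from the comment above.
LOG = """\
W: onSetup start: 13032
W: onSetup send: 12800
W: onSetup finish: 11536
W: onColor start: 11720
W: onColor send: 11528
W: onColor finish: 10936
W: onCss start: 10272
W: onCss send: 10080
W: onCss finish: 8816
W: onApiJs start: 8256
W: onApiJs send: 8040
W: onApiJs finish: 6776
W: onApi start: 13320
W: onApi send: 6912
W: onApi finish: 5752
W: onApi start: 13272
W: onApi send: 6840
W: onApi finish: 6352
"""

LINE = re.compile(r"W: (\w+) (start|send|finish): (\d+)")


def parse(log):
    """Group the readings into one dict per handler call, in log order."""
    calls = []
    for handler, stage, value in LINE.findall(log):
        if stage == "start":
            calls.append({"handler": handler})
        calls[-1][stage] = int(value)
    return calls


for call in parse(LOG):
    used = call["start"] - call["finish"]
    print(f'{call["handler"]:8} start={call["start"]:6} '
          f'finish={call["finish"]:6} used={used}')
```

In this sample the static handlers each reduce the free heap by roughly 0.8 to 1.5 kB, but the starting point keeps falling from one handler to the next, while the two API calls start from about 13 kB again.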
On a real system with one or more inverters configured these numbers would obviously be lower. While serving static files, the server seems to run out of heap memory because multiple requests are being processed at the same time, so the style.css or api.js requests typically fail because /setup and colors.css are still being processed. The API requests take quite a lot of heap memory as well, but they happen at a later stage, so they're not as problematic.
Possible fixes:
Some more information:
Random failures or crashes due to low heap problems seem to be pretty common with ESPAsyncWebserver if it needs to deal with several requests at once. As far as Ahoy goes, the following possible fixes seem like they'd be viable:
Similar problem here. After flashing the 0.6.0_prometheus version, the web-ui appears to become unstable after a while. Requests to the /api endpoint result in a "null" answer.
How many inverters are associated with Ahoy, and on the 'System' page, what does it say after 'heap_free'?
In web.h I added the line `response->addHeader(F("Cache-Control"), "max-age=3600");` (cache for only 1 hour) to three responses: onFavicon, onCss and onColor. With that, the web GUI works again on my ESP8266 systems.
Which browser are you using? I did some testing with Cache-Control in web.h and found that Firefox needed additional headers to actually cache the requests (Last-Modified).
I tried Edge (Version 111.0.1661.62 (Official Build) (64-bit)), Firefox (111.0.1 64-bit) and Chrome (Version 111.0.5563.147 (Official Build) (32-bit)). All three work with my HM and MI inverters. And yes, you have to open the pages twice before the cached data is loaded.
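To see which of the caching headers mentioned in this exchange a response actually carries, the headers copied from the browser's network tab can be checked with a small helper. The function name and the returned dict layout are made up for this sketch; it only reports what is present and says nothing about how a particular browser will behave.

```python
def caching_headers(headers):
    """Report the caching-related headers discussed above (case-insensitive)."""
    lower = {key.lower(): value for key, value in headers.items()}
    max_age = None
    for directive in lower.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            max_age = int(value)
    return {"max_age": max_age, "last_modified": lower.get("last-modified")}


# Header value as set by the web.h change described above.
print(caching_headers({"Cache-Control": "max-age=3600"}))
```

A response without `Last-Modified` shows up with `last_modified` set to `None`, which is the case reported above as needing extra headers for Firefox.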
That's one inverter only. Last time I checked, heap_free was at 10264. I was able to view 2-3 Pages before the DTU wasn't responding to anything at all anymore, so I had to reset it. I haven't noticed anything like it on 0.5.66. This issue definitely doesn't seem browser-cache related.
Edit: At most, caching reduces the number of requests the web server receives at a time. The underlying problem, however, seems to be that the web server is struggling with too many incoming requests.
Do you use the JSON API in parallel while browsing the Ahoy web UI? That could explain why the AsyncWebserver could not answer all requests.
I found that API requests aren't too critical. In my testing they start out with a free heap of 13 kB or so, which dips down to about 6 kB as the request is being processed. That makes sense because the code is allocating a 6 kB JSON buffer.
The four simultaneous requests for the static resources are more problematic because they run in parallel:
The first one (to /setup) is still showing a free heap of 11.5 kB (I added the heap header for debugging):
And this is the second request (colors.css), already down to about 2.5 kB of free heap:
This is on a freshly booted ESP with the inverter not running. During the day it's worse. It has a single HM-1500 configured along with MQTT, nothing else.
At least I don't. I'm using prometheus (scraping every 30 seconds) and mqtt. Trouble only starts, when I also try to access the web ui.
I don't use the Api, but I have three inverters.
Same issue for me after updating to 0.6.0, but as long as I only use the Firefox browser on my iPhone, it works fine for days. The same goes for the PC (Edge). Only when I start using Safari on the iPhone/iPad does Ahoy reboot.
Short info: I have installed the 0.6.4 beta and it looks very good. I can't reproduce the error anymore. I have tried it with Safari and Firefox on my Mac and Safari on the iPhone. If the beta also works fine for communication with the HM, I will install it on my production system as well.
Similar results here in 0.6.4, but as soon as I try to retrieve data via REST, Ahoy always restarts, which looks like a crash. As soon as I deactivate the REST query, the system seems to be stable (no reboots).
I have yet to understand why some systems are so unstable and others aren't. API requests do need about 7 kB of RAM, but on most systems that doesn't seem to be a problem.
What does your setup look like? How many inverters and what kind? Display, MQTT, Prometheus, Sunrise or any other 'options' configured?
Hello, thank you for looking into this. I already posted my config somewhere above. Here is the summary:
Hi @tastendruecker123: System is a D1 Mini Pro ESP8266 connected with only one Inverter (HM-700). No Display and not prometheus configured. Only mqtt and the sunrise option is used.
Interesting. I assume both of you are using 0.6.3 or 0.6.4?
After updating to 0.6.4 it works fine for me on iOS.
I was still on 0.6.0. Trying to update right now.
Ah, that's not surprising then. 0.6.0 definitely has a low memory issue due to too many concurrent requests that was fixed in the newer versions. I was just wondering if there's still something else going on.
I am running 0.6.4 for almost 2 days and I still have occasional unnoticed reboots. Sometimes I can trigger a reboot by cycling through the top menu item, Live and System. At some point the menu tree is incomplete (only visible items are AhoyDTU, Rest API, Documentation and About) and a few seconds later the device reboots (reason Software/System restart). Heap frag is low (3) and does not increase prior to a reboot. This happens with a single inverter (HM800) and mqtt, ntp and sunrise/sunset active. Chrome/Mac but happens with Chrome/Android, too.
The behaviour I described was with 0.6.4: if I use the REST query in parallel with MQTT, Ahoy crashes.
Issue back with 0.6.9? I had and have no problems with 0.6.7 on my 8266 but as soon as I update to the release version the UI becomes unstable again.
Same issue here with 0.6.9 running on an 8266 after enabling MQTT.
For me it helped to reboot Ahoy after the OTA upgrade. Check the heap on the System page after the reboot. It should be around or below 10%.
After an OTA update from 0.5.6, the GUI is almost empty. MQTT seems to run. A reboot does not help. Should I start from scratch?