lumapu / ahoy

Various tools, examples, and documentation for communicating with Hoymiles microinverters
https://ahoydtu.de

Webgui almost empty after update, instable #828

Closed emmrichd closed 1 year ago

emmrichd commented 1 year ago

Hardware

Modelname: __ Retailer URL: __

nRF24L01+ Module

Antenna:

Power Stabilization:

After an OTA update from 0.5.6, the GUI is almost empty. MQTT seems to run. A reboot does not help. Should I start from scratch?

rmayergfx commented 1 year ago

Which browser are you using? Did you force a reload of the web page? Do you have any ad blockers installed? If so, please make sure to whitelist the IP of your AhoyDTU.

Argafal commented 1 year ago

I believe that might be a known bug making a reappearance. See issues #660 and #765.

Argafal commented 1 year ago

@emmrichd To rule out other reasons, maybe you could try the steps that @rmayergfx has suggested. Also starting from scratch (erase flash) will be a good idea. Please report back :)

tastendruecker123 commented 1 year ago

I have looked into this a bit and I think the ESP8266 is running out of memory during concurrent requests. Here's what's happening on mine when I reload /setup:

image

The response for api.js looks like this:

image

Which looks to me like it's outputting random garbage from the RAM. Sometimes api.js will load fine, but then style.css may fail in a similar fashion and the page looks like this:

image

And here's the response payload for style.css:

image

During all of this the free heap hovers around 10-11 kB. /setup is 7.4 kB, api.js is 3 kB, style.css is 2.5 kB, so together that's 12.9 kB, which is more than the free heap.
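The arithmetic in this comment can be written down as a small check. This is only a sketch of the worst case, assuming every response is fully buffered at the same time; the struct and function names are made up for illustration and are not part of Ahoy, and 1 kB is treated as 1000 bytes.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// One response that may be in flight; sizes are the figures quoted in the comment.
struct PendingResponse {
    const char* path;
    std::size_t bytes;
};

// Worst case: the sum of all payloads that could be held in memory at once.
std::size_t worstCaseBytes(const std::vector<PendingResponse>& responses) {
    return std::accumulate(
        responses.begin(), responses.end(), std::size_t{0},
        [](std::size_t sum, const PendingResponse& r) { return sum + r.bytes; });
}

// True if all responses together would still fit into the given free heap.
bool fitsInHeap(const std::vector<PendingResponse>& responses, std::size_t freeHeap) {
    return worstCaseBytes(responses) <= freeHeap;
}
```

With `{"/setup", 7400}`, `{"/api.js", 3000}` and `{"/style.css", 2500}` the worst case is 12900 bytes, so `fitsInHeap(page, 11000)` is false. That would be consistent with one of the later requests failing while the earlier ones are still being sent.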

Edit: One additional quirk I found is that this problem is much more likely to happen (3 in 5 reloads) and easy to reproduce if the browser is sending a cookie together with the request. In my case I'm accessing Ahoy via an external URL that used to point to a Grafana instance, so the browser was sending the Grafana session cookie to Ahoy. If I delete the cookie, it works reasonably well. If I add a cookie to the request, it fails to load properly most of the time. So in order to reproduce the problem I would suggest using the browser's dev tools to add one or two random 50-60 byte cookies for the Ahoy URL.

lukask005 commented 1 year ago

I have the same problem in Safari; in Chrome it's a bit better (ESP8266).

emmrichd commented 1 year ago

I used an iPhone with Safari. Refresh/reload did not help. More tests later. I used the same phone with the old version without problems. If it is a RAM issue, why does the problem persist right after an ESP8266 reboot?

pschlan commented 1 year ago

Same issue here on an ESP8266, newly flashed with current release, MacBook client (Chrome/Firefox).

tastendruecker123 commented 1 year ago

Can you guys try the URL in a private or incognito window? Trying to check whether it's related to cookies, or whether it's happening because Apple devices may be more aggressive about making several HTTP requests at the same time.

pschlan commented 1 year ago

Same issue in incognito

image image

pschlan commented 1 year ago

Can't reproduce this on ESP32, by the way

emmrichd commented 1 year ago

Hello, I am at home now. Windows 10, Chrome: one time I get an "ok view", then after a reload it is empty again. Uptime was only 13 min, so it seems to restart regularly. This was not observed with the previous release. MQTT data was delivered all day, though. I would guess there is no relation to the "LED-config-bug". I did a config export; the LEDs are set to 255. Of course, I could reflash it now. However, if it runs fine afterwards, I cannot provide any further bug observations. So what now?

{"wifi":{"ssid":"LB30","pwd":"","dev":"AHOY-DTU2","adm":"","prot_mask":61,"dark":false,"ip":"","mask":"","dns1":"","dns2":"","gtwy":""},"nrf":{"intvl":30,"maxRetry":5,"cs":15,"ce":2,"irq":0,"sclk":0,"mosi":0,"miso":0,"pwr":2},"ntp":{"addr":"pool.ntp.org","port":123},"sun":{"lat":xx,"lon":xx,"dis":true,"offs":900},"serial":{"intvl":5,"show":false,"debug":false},"mqtt":{"broker":"192.168.180.209","port":1883,"user":"","pwd":"","topic":"inverter","intvl":0},"led":{"0":255,"1":255},"plugin":{"disp":{"type":0,"pwrSafe":false,"pxShift":false,"rotation":0,"contrast":60,"data":255,"clock":255,"cs":255,"reset":255,"busy":255,"dc":255}},"inst":{"en":false,"rstMidNight":false,"rstNotAvail":false,"rstComStop":false,"iv":[{"en":true,"name":"HM-1500-Dach","sn":xx,"yield":[0,0,0,0],"pwr":[420,420,420,420],"chName":["1","2","3","4"]},{"en":true,"name":"HM-1500-Garage","sn":xx,"yield":[0,0,0,0],"pwr":[360,360,360,360],"chName":["1","2","3","4"]},{"en":true,"name":"HM-800-Schuppen","sn":xx,"yield":[0,0,0,0],"pwr":[400,400,0,0],"chName":["1","2","",""]}]}}

fila612 commented 1 year ago

Same here, but this also happened in previous (dev) versions, maybe starting around 0.5.8x.

Screenshot 2023-03-30 at 07 47 09

Clicking on API returns only "null".

When opening the settings via /setup, all sections are empty: no WLAN, no inverter and no MQTT are shown. However, Ahoy is still receiving data from the inverter and sending it via MQTT, so it seems to be "only" a visual thing...

maybe a short clip shows the behaviour:

https://user-images.githubusercontent.com/29542374/228759807-73ad9162-bf0d-4d2a-96cc-d7ee59073134.mov

emmrichd commented 1 year ago

Hello,

I have flashed my ESP8266 from scratch, including "wipe all data" via USB. However, the odd behaviour remains, and the connection to the inverters can no longer be established. With the last stable version, the system had been working for about 4 months.

Dieter

tastendruecker123 commented 1 year ago

@emmrichd

Can you check the pin configuration? The LED pins should be set to off, not 0. I assume the problem also occurs in private/incognito mode of the browser?

Mogdar-M commented 1 year ago

Same issue here.

ESP8266 V0.6.0

Access to the DTU via smartphone shows the same issue as access via PC, so it cannot be the browser cache. Opening the setup page takes quite some time.

After several refreshes the connection is back.

For a moment the board entry in the footer shows ESP8266ESP8266ESP8266ESP8266ESP8266ESP8266

Mogdar-M commented 1 year ago

image

I don't know if this is related, but since the update to 0.6.0 the total is sometimes shown even though I don't have more than one inverter.

sumerland commented 1 year ago

Same symptoms over here with 0.6.0 on ESP8266 (flashed with full wipe). Problems on several browsers and operating systems. I am seeing the same errors in Chrome's developer tools.

I also noticed that sometimes clicking in the GUI while it is laggy can lead to a reboot of the ESP8266.

EDIT: it just happened again. Reboot reason is "Software/System restart".

christian-karsie commented 1 year ago

Good afternoon.

I've had the same problems as described above since I updated from 0.5.66 to 0.6.0. At first I thought it was a problem with my hardware (Wemos board), but I then switched to an ESP8266 NodeMCU and got the same problems.

When the GUI is not working correctly and I run a ping loop against the AhoyDTU board (ESP8266 NodeMCU), I see some lost pings, and a few seconds after I open the web GUI the ESP8266 reboots automatically. So something seems to be buggy.

cyrax303 commented 1 year ago

I also have the problem with a D1 Mini and 0.6.0. Sometimes it helps if I use another browser, but most of the time I have to reboot the processor…

sumerland commented 1 year ago

Not sure if it helps debugging... The errors mentioned above also appear in 0.5.96:

image

The important difference to 0.6.0 is that with 0.5.96 the DTU does not reboot.

tastendruecker123 commented 1 year ago

I have looked into this a bit more. I'm writing down what I have learned so far because there doesn't seem to be an obvious quick fix, and the issue of available heap space may also be relevant in the future, so this information might continue to be useful.

I added a bit of code that outputs the available heap memory at different stages of request processing (at the beginning, before sending the response and after sending the response). On a dummy test system without an NRF connected, the output looks like this when accessing /setup:

`W: onSetup start: 13032
W: onSetup send: 12800
W: onSetup finish: 11536
W: onColor start: 11720
W: onColor send: 11528
W: onColor finish: 10936
W: onCss start: 10272
W: onCss send: 10080
W: onCss finish: 8816
W: onApiJs start: 8256
W: onApiJs send: 8040
W: onApiJs finish: 6776
W: onApi start: 13320
W: onApi send: 6912
W: onApi finish: 5752
W: onApi start: 13272
W: onApi send: 6840
W: onApi finish: 6352`

On a real system with one or more inverters configured these numbers would obviously be lower. While serving static files, the server seems to run out of heap memory because multiple requests are being processed at the same time, so the style.css or api.js requests typically fail because /setup and colors.css are still being processed. The API requests take quite a lot of heap memory as well, but they happen at a later stage, so they're not as problematic.
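The instrumentation code itself is not shown in the thread, so the following is only a guess at how such per-stage logging could look. `HeapTrace` is a hypothetical helper; the heap reader is injected so the sketch also compiles off-device, and on an ESP8266 it could wrap `ESP.getFreeHeap()` from the Arduino core.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records the free heap at named stages of one request.
class HeapTrace {
public:
    HeapTrace(std::string handler, std::function<uint32_t()> readFreeHeap)
        : handler_(std::move(handler)), readFreeHeap_(std::move(readFreeHeap)) {}

    // Prints one line in the "W: <handler> <stage>: <bytes>" shape used in the log.
    void mark(const char* stage) {
        const uint32_t freeBytes = readFreeHeap_();
        samples_.push_back(freeBytes);
        std::printf("W: %s %s: %u\n", handler_.c_str(), stage,
                    static_cast<unsigned>(freeBytes));
    }

    // Heap consumed between the first and last mark (0 if fewer than two
    // marks were taken or the free heap did not shrink).
    uint32_t consumed() const {
        if (samples_.size() < 2 || samples_.back() >= samples_.front()) return 0;
        return samples_.front() - samples_.back();
    }

private:
    std::string handler_;
    std::function<uint32_t()> readFreeHeap_;
    std::vector<uint32_t> samples_;
};
```

A handler would call `mark("start")`, `mark("send")` and `mark("finish")` at the three stages. Fed with the onSetup values from the log (13032, 12800, 11536), `consumed()` gives 1496 bytes for that request alone, before any parallel request is counted.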

Possible fixes:

tastendruecker123 commented 1 year ago

Some more information:

Random failures or crashes due to low heap problems seem to be pretty common with ESPAsyncWebserver if it needs to deal with several requests at once. As far as Ahoy goes, the following possible fixes seem like they'd be viable:

ziermmar commented 1 year ago

Similar problem here. After flashing the 0.6.0_prometheus version, the web-ui appears to become unstable after a while. Requests to the /api endpoint result in a "null" answer.

tastendruecker123 commented 1 year ago

> Similar problem here. After flashing the 0.6.0_prometheus version, the web-ui appears to become unstable after a while. Requests to the /api endpoint result in a "null" answer.

How many inverters are associated with Ahoy, and on the 'System' page, what does it say after 'heap_free'?

AsZork commented 1 year ago

In web.h I added the line `response->addHeader(F("Cache-Control"), "max-age=3600"); // only 1 hour` to three responses: onFavicon, onCss and onColor. With that, the web GUI works again on my ESP8266 systems.

tastendruecker123 commented 1 year ago

Which browser are you using? I did some testing with Cache-Control in web.h and found that Firefox needed additional headers to actually cache the requests (Last-Modified).
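To make the two headers discussed here concrete, the sketch below builds a `Cache-Control` and a `Last-Modified` value as plain strings. It is an illustration only: `staticCacheHeaders` and `httpDate` are invented names, and which timestamp to use for `Last-Modified` (for example the firmware build time) is an assumption, not something stated in the thread.

```cpp
#include <ctime>
#include <string>
#include <utility>
#include <vector>

using Header = std::pair<std::string, std::string>;

// Formats a UTC timestamp as an HTTP date such as "Thu, 01 Jan 1970 00:00:00 GMT".
// Relies on the default "C" locale for the English day and month names.
std::string httpDate(std::time_t t) {
    char buf[64] = "";
    if (const std::tm* utc = std::gmtime(&t)) {
        std::strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", utc);
    }
    return buf;
}

// Headers for a static resource: a cache lifetime plus a Last-Modified stamp.
std::vector<Header> staticCacheHeaders(unsigned maxAgeSeconds, std::time_t lastModified) {
    return {
        {"Cache-Control", "max-age=" + std::to_string(maxAgeSeconds)},
        {"Last-Modified", httpDate(lastModified)},
    };
}
```

On the device, each pair would presumably be passed to `response->addHeader(...)` in the same way as the line quoted above. Caching only lowers the number of requests on repeat visits; the first page load still triggers all of them.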

AsZork commented 1 year ago

I tried Edge (version 111.0.1661.62, official build, 64-bit), Firefox (111.0.1, 64-bit) and Chrome (version 111.0.5563.147, official build, 32-bit). All three work with my HM and MI inverters. And yes, you have to open the pages twice before the cached data is loaded.

ziermmar commented 1 year ago

> How many inverters are associated with Ahoy, and on the 'System' page, what does it say after 'heap_free'?

That's one inverter only. Last time I checked, heap_free was at 10264. I was able to view 2-3 pages before the DTU stopped responding to anything at all, so I had to reset it. I haven't noticed anything like it on 0.5.66. This issue definitely doesn't seem to be related to the browser cache.

Edit: At most, caching reduces the number of requests the webserver receives at a time. The underlying problem, however, seems to be that the webserver is struggling with too many incoming web requests.

lumapu commented 1 year ago

Do you use the JSON API in parallel while browsing the Ahoy web UI? That could explain why the AsyncWebserver cannot answer all requests.

tastendruecker123 commented 1 year ago

> Do you use the JSON API in parallel while browsing the Ahoy web UI? That could explain why the AsyncWebserver cannot answer all requests.

I found that API requests aren't too critical. In my testing they start out with a free heap of 13 kB or so, which dips down to about 6 kB while the request is being processed. That makes sense because the code allocates a 6 kB JSON buffer.

The four simultaneous requests for the static resources are more problematic because they run in parallel:

image

The first one (to /setup) is still showing a free heap of 11.5 kB (I added the heap header for debugging):

image

And this is the second request (colors.css), already down to about 2.5 kB of free heap:

image

This is on a freshly booted ESP with the inverter not running. During the day it's worse. It has a single HM-1500 configured along with MQTT, nothing else.
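One conceivable way to react to these numbers is to check the free heap before starting a response and to turn the request away when too little is left. The thread does not say that Ahoy does this; `HeapGuard` and the reserve value in the usage note are purely hypothetical.

```cpp
#include <cstdint>

// Hypothetical admission check: only start a response if enough heap remains
// afterwards. A caller could otherwise answer with HTTP 503 and let the
// browser retry a moment later.
struct HeapGuard {
    uint32_t reserveBytes;  // heap that should stay free for other tasks (assumed value)

    bool admits(uint32_t freeHeap, uint32_t responseBytes) const {
        // Ordered so the unsigned subtraction cannot underflow when the
        // free heap is already below the reserve.
        return freeHeap >= reserveBytes && freeHeap - reserveBytes >= responseBytes;
    }
};
```

With an assumed reserve of 4000 bytes, `HeapGuard{4000}.admits(11500, 7400)` accepts the /setup request at 11.5 kB free heap, while `admits(2500, 2500)` refuses a second request at the 2.5 kB level measured for colors.css.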

ziermmar commented 1 year ago

> Do you use the JSON API in parallel while browsing the Ahoy web UI? That could explain why the AsyncWebserver cannot answer all requests.

At least I don't. I'm using Prometheus (scraping every 30 seconds) and MQTT. Trouble only starts when I also try to access the web UI.

emmrichd commented 1 year ago

I don't use the API, but I have three inverters.

gitty-jsu commented 1 year ago

Same issue for me after updating to 0.6.0, but as long as I only use the Firefox browser on my iPhone, it works fine for days. The same goes for the PC (Edge). Only when I start using the Safari browser on iPhone/iPad does Ahoy reboot.

cyrax303 commented 1 year ago

Short info: I have installed the 0.6.4 beta and it looks very good. I can't reproduce the error anymore. I have tried it with Safari and Firefox on my Mac and Safari on iPhone... If the beta works fine for communication with the HM, I will also install it on my production system.

fila612 commented 1 year ago

Similar results here with 0.6.4, but as soon as I try to retrieve data via REST, Ahoy always restarts; it seems to be a crash. As soon as I deactivate the REST query, the system seems to be stable (no reboots).

tastendruecker123 commented 1 year ago

> Similar results here, but as soon as I want to retrieve data via REST, ahoy always restarts - seems to be a crash. as soon as I deactivate the REST query - system seems to be stable (no reboots).

I have yet to understand why some systems are so unstable and others aren't. API requests do need about 7 kB of RAM, but on most systems that doesn't seem to be a problem.

What does your setup look like? How many inverters and what kind? Display, MQTT, Prometheus, Sunrise or any other 'options' configured?

emmrichd commented 1 year ago

Hello, thank you for looking into this. I already posted my config somewhere above. Here is the summary:

fila612 commented 1 year ago

> > Similar results here, but as soon as I want to retrieve data via REST, ahoy always restarts - seems to be a crash. as soon as I deactivate the REST query - system seems to be stable (no reboots).

> I have yet to understand why some systems are so unstable and others aren't. API requests do need about 7 kB of RAM, but on most systems that doesn't seem to be a problem.

> What does your setup look like? How many inverters and what kind? Display, MQTT, Prometheus, Sunrise or any other 'options' configured?

Hi @tastendruecker123: The system is a D1 Mini Pro ESP8266 connected to only one inverter (HM-700). No display and no Prometheus configured. Only MQTT and the sunrise option are used.

tastendruecker123 commented 1 year ago

Interesting. I assume both of you are using 0.6.3 or 0.6.4?

gitty-jsu commented 1 year ago

After updating to 0.6.4 it works fine for me on iOS.

emmrichd commented 1 year ago

I was still on 0.6.0. Trying to update right now.

tastendruecker123 commented 1 year ago

Ah, that's not surprising then. 0.6.0 definitely has a low-memory issue due to too many concurrent requests, which was fixed in the newer versions. I was just wondering if there's still something else going on.

sumerland commented 1 year ago

I have been running 0.6.4 for almost 2 days and I still have occasional unnoticed reboots. Sometimes I can trigger a reboot by cycling through the top menu items Live and System. At some point the menu tree is incomplete (the only visible items are AhoyDTU, Rest API, Documentation and About) and a few seconds later the device reboots (reason: Software/System restart). Heap frag is low (3) and does not increase prior to a reboot. This happens with a single inverter (HM800) and MQTT, NTP and sunrise/sunset active. I see it with Chrome on Mac, but it happens with Chrome on Android, too.

fila612 commented 1 year ago

> Ah, that's not surprising then. 0.6.0 definitely has a low-memory issue due to too many concurrent requests, which was fixed in the newer versions. I was just wondering if there's still something else going on.

The behaviour I described was with 0.6.4: if I use the REST query in parallel with MQTT, Ahoy crashes.

mr-p666 commented 1 year ago

Issue back with 0.6.9? I had and have no problems with 0.6.7 on my 8266 but as soon as I update to the release version the UI becomes unstable again.

benbecke commented 1 year ago

Same issue here with 0.6.9 running on an 8266 after enabling MQTT.

lumapu commented 1 year ago

For me it helped to reboot Ahoy after the OTA upgrade. Check the heap on the System page after the reboot. It should be around or below 10%.