Hi, I have installed this experimental version. I'm using 5 DS3L inverters with each 2 panels connected.
The version works fine up to now; I just noticed some strange behaviour in the build-up of the overview:
After initial access, only inverter 0 and inverter 1 are shown immediately:
After approximately 15 seconds, the table is completed:
Thanks for your effort in developing this solution!
R.
Eeg.
My second screen-shot was from yesterday evening (inverters not yet paired), this one is from today:
@Superbert25 Thanks for your feedback, I solved that issue already. I think it looks good. Could you tell me how it looks on a smartphone?
Sure! Screenshot from iPhone 13:
It looks fine to me :-)
R. Eeg
I am now using the second 9.6 experimental version. A couple of times a day the Zigbee coordinator is restarted, and sometimes the system as a whole is rebooted. I noticed that it seems related to losing communication with one or more inverters.
I will build a new setup in a couple of weeks' time with a long-range Zigbee module (with PA and external antenna), which might improve things. I can share my findings here if you like.
After a year of problem-free operation, problems now arise. Is that because of the newest features, or was there simply no user with more than 3 inverters all this time? So, what I am trying to find out is:
Is version 9_6a working without problems? (this is the original 9_6)
Is version 9_6b stable? (this has the new mqtt topic feature)
Is version 9_7_beta stable? (this has the new frontpage and the mqtt topic feature)
So if you have been using version 9_6 without problems, then one of the two new versions causes the trouble. Can we relate the problems to the number of inverters? Does the problem go away when you lower the number of inverters?
@Superbert25 you have 5 inverters. What I would like to know is: are you new to this project, or have you been using 9_6 (without problems) in the past?
I’m new, started with 9.6, now on 9.6b (again).
9.7 beta crashed all the time (within 30 seconds). I flashed over OTA, could that have something to do with it? So, I reverted back to 9.6b over OTA, which is stable enough (few crashes a day). I can dive into the issues after next week.
R. Eeg.
my report about the 9_7_beta: I have 2 YC600 inverters, approx. 35 % resp. 50% Zigbee signal strength. the ESP-ECU ran without problems for 25h, no crash, resetCounter = 1. CC2530 zigbee module. Frontpage looks ok on my phone. I never saw more than 2 decimals. works fine ! thanks.
@patience4711, thank you for your response! Your answer is clear to me: the polled data can be a few minutes old and is therefore not really accurate.
About the crashes: the software is stable during night time. During polling time the ESP crashes a lot. I’ll try to power the cc2530/2591 directly from a power source. The behavior looks like a power shortage: the 2530 is polling while the ESP is sending data over WiFi, and that consumes a lot of energy.
I’ll build the new power source tomorrow and report back to you.
@swbouman You still didn't answer my question about the frontpage of 9_6b. Since you were mentioning the totals (not present in 6b) I thought maybe you have the wrong version. If the problem is related to the number of inverters, you should be able to find a stable situation by lowering the number of inverters. If your power supply is 3A there shouldn't be problems, but if not there could be brownouts.
@Superbert25 What zigbee do you use, by chance the 2530+2591?
I am using a cheap CC2530 without amplifier. I ordered a new one with 2592 and external antenna connection.
ESP-ECU v. 9_7_beta up for 3 days. resetCounter = 8. D1-mini pro board + cc2530 running from a 5V / 1A power supply.
@frtz13 thanks, so the ESP stays stable, meaning that the new features are not involved. We can see here however that the zigbee crashes sometimes. This is an old problem that occurred with a high polling frequency (more than once in 5 minutes). In the past the polling frequency was configurable, but since it didn't work stably, the 5 minutes (same as the official ECU) became standard. With only one inverter this worked rock-stable. Multiple inverters however effectively mean a higher polling rate, and that causes the crashes of the zigbee. The reason is supposedly hidden in the zigbee's software (a buffer overrun?) but the designer is convinced that this is not the case. We know that when this happens, very large serial messages are sent. This could also cause the ESP to crash because some processes take too long. Maybe the wdt overflows because of that. I really don't know. My Raspberry Pi Zero implementation does not have this problem; someone uses it with 9 inverters without problems.
v9_7a is a version that has all of the new features and some adaptions as to feeding the wdt and cleaning the serial buffer. I hope it helps.
Hello all,
After a weekend of testing with a 10 A power supply, just to be sure 😜, and a lot of crashes when switching from the console view to the home view, I installed the new 9.7 version today. This latest version gives me a solid online system for 2 hours now. I got some error 50 status codes, but after a minute or 10 the system restarts the zigbee module. Mem is around 16000 and stable.
To avoid the error 50 code, a little delay between polling different inverters could be the (or a) solution.
OK, so now the situation is that the ESP is stable but the zigbee still crashes. I am thinking about a method that divides 5 minutes by the number of inverters and uses the result as a pause between the polls of each inverter. You may be right that this could help, but I remember that any polling rate higher than once per 5 minutes led to crashes of the zigbee. Nevertheless I'll give it a shot. I'll come back when I have this implemented.
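The idea of dividing the 5-minute cycle by the number of inverters can be sketched as follows. This is only an illustration in Python, not the firmware's code; the function name `poll_offsets` and the constant are made up for the example.

```python
# Hypothetical sketch: spread the polls of all inverters evenly over one cycle.
POLL_CYCLE_SECONDS = 300  # the 5-minute cycle mentioned in the thread

def poll_offsets(inverter_count, cycle_seconds=POLL_CYCLE_SECONDS):
    """Return the offset (in seconds from cycle start) at which each inverter is polled."""
    if inverter_count <= 0:
        return []
    pause = cycle_seconds / inverter_count  # pause between two consecutive polls
    return [round(index * pause, 1) for index in range(inverter_count)]

print(poll_offsets(5))  # [0.0, 60.0, 120.0, 180.0, 240.0]
```

With 5 inverters the Zigbee module would see one poll per minute instead of five polls in quick succession.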
@patience4711, you’re right about the stability. Now solid for 3 hours.
About the zigbee stability, could you pls start with a delay of 2 seconds between polling each inverter, every time auto polling is started? So the polling is started after 300 secs, with a timeout of 2 secs after an inverter poll is started? In my situation the polling of all inverters will take 14 secs. I think this is a minor code change for your program.
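A fixed gap between polls, as requested here, could look roughly like the loop below. It is a Python sketch under assumptions: `poll_all`, `poll_one` and the injectable `sleep` parameter are hypothetical names, and the real firmware is not written this way.

```python
import time

def poll_all(inverter_ids, poll_one, gap_seconds=2.0, sleep=time.sleep):
    """Poll each inverter in turn, waiting gap_seconds between consecutive polls."""
    results = {}
    for position, inverter_id in enumerate(inverter_ids):
        if position > 0:
            sleep(gap_seconds)  # give the Zigbee module time to finish the previous poll
        results[inverter_id] = poll_one(inverter_id)
    return results

# Usage with a fake poll function and a recording "sleep", so nothing really waits.
waits = []
readings = poll_all([0, 1, 2], poll_one=lambda inv: inv * 10, sleep=waits.append)
print(readings, waits)
```

Passing the sleep function in makes the pause easy to test without actually blocking.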
OK, here it is, with a delay of 2 seconds between each poll. Can you explain why you think that this would make a difference? I hope you are right, but with the story of the high poll rate in mind...
It is a shot in the dark, @patience4711. I had the higher polling rate in mind. In the console I saw the polled inverters coming in quickly one after another. In my opinion a new big call can be written to the cc2530 while it is still processing the previous one. So that’s why I think this could be a solution.
Another thing I am thinking about is the disabled serial debugging from the ESP when zigbee is enabled. I don’t know if the serial debugging from the ESP is disabled at that moment or if it is forwarded to the cc2530. That could cause a failure in the cc2530 due to unhandled serial commands… that would require a big investigation.
I’ll test the new firmware tomorrow. Thanks for the big support to this project!
Hello,
I tested the new 9.7 firmware today. Uptime is about 4 hours, after which the ESP reboots. I’ll investigate whether this has something to do with the WiFi, but I doubt it because all my other ESPs have been running stable for months.
I’ve not experienced any error 50s anymore, so the pause between inverter polls appears to have helped.
I have another strange behavior. Two inverters (far away from the ESP) have polling issues with code 11 and 15. At the next poll they return data and a signal strength of about 30%.
Does anyone else have this issue as well? When auto polling is off, and inverters are polled by sending the poll command over mqtt, polling is instant for these two inverters.
—-
That ESP crash after some hours... weird. I am still wondering if this has something to do with the number of inverters. The other issues look like a range problem to me. At this point I wouldn't know what else to do. If the other users share their experiences with this 9_7xb version here, we might find a systematic reason for the crashes.
installed the 9_7xb version. I'll report back tomorrow.
version 9_7xb up for 11h now, no problems. resetCounter = 0. One thing I would like to mention: I never stay connected to the ESP-ECU web interface with the browser. I just have a look and close the window afterwards. Especially, the INFO page makes http requests continuously, retrieving the current time.
@swbouman as far as I can see only inverters 0, 3, 6 show a correct polling answer. This is weird because the program checks some properties of the answer and rejects it if it does not match the criteria. Did you read here that there seems to be a batch of ds3 inverters that have issues?
@patience4711 thank you for looking in the data. As far as I know, inv3 has one panel attached, so I’ll disable the second one in the near future.
Today the installers are back because of the wrong readings. They discovered that one inverter is probably dead because of bad installation and the others weren’t attached to 220v grid at all.
This was in the morning and they went shopping for tools since then.
Hope they will be back today to finish the job. Then all inverters will produce power.
The company owner was here to apologize for the first group of installers, which worked here.
@swbouman this is a scandal, not much can go wrong with the installation, bunglers! Anyway now there is hope that the ECU can work properly.
@frtz13 OK thanks, that's good news. v9_7xb stable with 2 inverters (yc600).
You suspect the data traffic could have something to do with it? I think we would have discovered that sooner, but you can test this by opening the frontpage on several devices at the same time. Would you be so kind as to report again after some days? The uptime and resetcounter are the interesting data.
@patience4711 I connected to the ESP_ECU with 3 browser windows. navigated with one of these to the INFO page. the ESP-ECU crashed rapidly, as soon as I started to navigate. while the front page is relatively sparse with HTTP requests (about 1 for each inverter every 10 seconds approx.), the INFO page makes approx. 10 requests per second to get the current time.
@frtz13 Thanks for testing. If I open 4 browser pages (3 on chrome and one on my phone), all to the infopage, no crashes occur. But I don't have traffic on the frontpage so it is not completely comparable. The responsiveness does decrease. The refresh rate of the time is about 2 seconds; I can change it so that the script only runs once at page load. It is only a check that the system time is correct, so not really important. v9_7xc
in the 9_7xb version, INFO page, the HTTP request rate for the current time is much greater than every 2 seconds. it is about 10 per second, maybe more. checked with both Firefox and Edge (Chromium). ok, in the 9_7xc version, these requests are gone. BTW, would be nice to have a /get.rssi request to get the current wifi signal level.
The wifi signal strength is displayed in the infopage. If you refresh the page you'll get the current level (WiFi.RSSI()).
... I know. the idea is to request it once a minute or so, and display it in my home automation system.
... or even better: some sort of JSON "status" record, containing uptime, rssi, resetCounter + ? every x minutes via MQTT, or accessible via httprequest.
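To make the request concrete, such a "status" record might be shaped like the sketch below. The field names and the helper `build_status` are purely illustrative assumptions; nothing like this exists in the firmware discussed here.

```python
import json

def build_status(uptime_seconds, rssi_dbm, reset_counter):
    """Serialize a small status record to a compact JSON string (hypothetical layout)."""
    record = {
        "uptime": uptime_seconds,       # seconds since boot
        "rssi": rssi_dbm,               # WiFi signal level in dBm
        "resetCounter": reset_counter,  # number of Zigbee resets
    }
    return json.dumps(record, separators=(",", ":"))

print(build_status(3600, -67, 0))
```

A consumer such as a home automation system could parse this once a minute, whether it arrives over MQTT or an HTTP request.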
Sorry, too little program space left for "nice to have" additions. In the early days of this project I had this to monitor the free memory, to see if the crashes could be explained by it. Let's concentrate on getting the system stable instead.
9_7xc version up for 18h. no crash, resetCounter = 0.
9_7xc version up for 48h. no crash. resetCounter = 0, which is significantly less than previous versions. well done, @patience4711 .
9_7xc version: got a reboot this morning, when connecting to the web interface. It looked like the ESP choked when fetching the data for the second inverter. I also got a Zigbee reset this afternoon.
@frtz13 You can see the difficulty here: these incidents don't happen systematically, so it's impossible to find out what the cause is. If it is the webserver that crashes, then it must be the new frontpage causing that, because to my knowledge this doesn't happen with the former versions. It would be interesting to know whether the 9_6 version works stably in your situation. But if it is the frontpage, why not every time you visit it?
As for the zigbee, the cc2530 seems pretty outdated and its memory is vulnerable to corruption; Zigbee2Mqtt doesn't recommend it for this reason. There is however no alternative that has the right firmware and can be connected to the ESP. After a lot of experiments I couldn't find a possible reason (could be firmware related), so I made the system resilient to these crashes. When it happens, every 10 minutes there is a healthcheck that recovers the zigbee. In the worst case you could miss 2 polls. We'll have to live with it I'm afraid.
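The "worst case 2 missed polls" figure follows from the two intervals: polls every 5 minutes, health check every 10 minutes. A small Python sketch of that arithmetic, assuming both timers start at zero and that a health check coinciding with a poll runs after it (both are assumptions, as is the name `missed_polls`):

```python
def missed_polls(crash_time, poll_interval=300, check_interval=600):
    """Count polls lost between a Zigbee crash and the next health check (times in seconds)."""
    # First health check strictly after the crash; this is when the module recovers.
    recovery_time = (crash_time // check_interval + 1) * check_interval
    # First scheduled poll strictly after the crash.
    first_poll = (crash_time // poll_interval + 1) * poll_interval
    # Every poll up to and including the recovery tick fails.
    return len(range(first_poll, recovery_time + 1, poll_interval))

worst = max(missed_polls(t) for t in range(0, 1200))
print(worst)  # 2
```

A crash just before a health check costs one poll; a crash just after one costs two.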
> But if it is the frontpage why not every time you visit it?
Maybe it's just a coincidence of events: displaying the frontpage while zigbee traffic is going on... I hardly ever display the ESP-ECU web pages. Mostly to check the wifi rssi from time to time (the ESP-ECU is quite far away in the garden). And I did so more often recently, to give some feedback about the new version. the 9_6 version had been running for months without problems. just a Zigbee reset from time to time.
@frtz13 So it must be the new frontpage then. Requesting data from all inverters at the same time may be too much for the server. Could even be that the mosquitto network activities cause it. I don't know, the system has an async webserver that runs its business in the background. It is very hard to debug.
So for you the new frontpage is not very important and the old version is good enough?
I was wondering: sometimes it seems that something is stacking up in the zigbee module, which would explain why the more polls it does, the sooner it crashes. If we force a reset of the ZB module every midnight, then its memory is cleared and maybe it wouldn't crash during the day. So I compiled v9_7xd with this idea implemented.
@patience4711 I installed v9_7xd. the update went ok, but the INFOPAGE still says v9_7xc. I'll watch and see if I get a Zigbee reset tonight.
@frtz13 Yes, I forgot that, sorry. Could you download and install v9_7xd again? I made another minor change to the meaning of the resetcounter: these forced resets do not increase the resetcounter, only the incidental resets are counted. So you could see in the log that the reset took place, but the resetcounter is still 0 (hopefully).
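The counting rule described here (forced midnight resets are logged but not counted) can be sketched like this. It is a Python illustration only; the class `ZigbeeResetLog` and its members are hypothetical and do not mirror the firmware's actual code.

```python
class ZigbeeResetLog:
    """Track Zigbee resets; only incidental (unplanned) resets raise the counter."""

    def __init__(self):
        self.reset_counter = 0
        self.log = []

    def record_reset(self, forced):
        # Every reset is written to the log so it stays visible to the user.
        self.log.append("forced midnight reset" if forced else "incidental reset")
        if not forced:
            self.reset_counter += 1

tracker = ZigbeeResetLog()
tracker.record_reset(forced=True)   # scheduled reset at midnight
print(tracker.reset_counter, tracker.log)  # 0 ['forced midnight reset']
```

This keeps the counter meaningful as a stability indicator while the log still shows that the nightly reset took place.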
v.9_7xd. uptime: 2 days. resetCounter=1.
v9_7xd. uptime: 4 days 4hrs, resetCounter=2 @patience4711 I'd like to ask once more for an option to get the "retained" flag set on MQTT messages. in the case of a restart of the system consuming the MQTT messages, with the "retained" flag not set, it would start with an "unknown" state for the energy production (until the next MQTT message is posted). If precautions are not built into the formula, a transformation of Wh to kWh would result in a zero production value. A "total_increasing" energy sensor will handle this as a "reset". This scenario could be avoided by having the MQTT messages posted with the "retained" flag set. In this case, after a restart, the consuming system will get back the latest value immediately. As far as Home Assistant is concerned: such a restart happens every time you add/modify an MQTT sensor, and re-load the MQTT sensor definitions. If adding UI elements is an issue, this could be handled by special Dom.Idx values (>= 10, for ex.)
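The "precaution in the formula" mentioned above amounts to not turning an unknown state into 0 kWh. A minimal Python sketch of that guard, assuming the consumer hands over the raw state as a string (the function name `wh_to_kwh` is made up, and this is not Home Assistant template code):

```python
def wh_to_kwh(raw_state):
    """Convert a Wh reading to kWh, returning None when no numeric value is available."""
    try:
        return float(raw_state) / 1000.0
    except (TypeError, ValueError):
        # "unknown", "unavailable" or a missing value: report nothing rather than 0,
        # so an always-increasing energy total is not mistaken for a meter reset.
        return None

print(wh_to_kwh("1500"))     # 1.5
print(wh_to_kwh("unknown"))  # None
```

With retained messages the unknown phase after a restart disappears, so the guard is only a second line of defence.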
v9_7xd. uptime 6 d 16h . resetCounter = 3.
@frtz13 OK, that looks stable enough to me. The resets do not bother you? About the retain flag, I can't use the idx to decide that. If I make it optional it complicates the program too much. What if I gave formats 3 and 4 the retain flag by default? Could that bother other users? I can't see any adverse effects, what do you think?
@patience4711 right now we are at nearly 9 days uptime and resetCounter=4. Such infrequent resets do not bother me. about the "retained" flag: I do not see any adverse effects, either.
I have had some resets, especially today, when the data of 1 inverter wasn't even JSON formatted in the MQTT string. Removing the inverter from the config and re-adding it solved the problem. FWIW, my longest online period was 4 days. I can see that free mem is 18000 bytes right after a reboot of the ESP. When free mem drops under 10000, the ESP will reboot. So memory is an issue here.
I like the program and interface, but there are a few pages which trigger memory consumption, which we don't want. Since I'm unable to compile the program due to lib issues, I can't help in finding a solution.
I'm fine with having a retained flag. I am thinking about the data in Home Assistant and other programs: could a retained flag affect the way the data is received? At this moment the data times out after 5 mins and gives a "not available" sign in the dashboard. What if the data stream stops, would the old data be displayed? For the total energy consumption, after a reboot the data will be in the display, so that would be great.
I really appreciate your work I know how hard it can be sometimes with people asking almost everything about the program, installing and using....
@swbouman the "retained" flag only changes what happens when a consumer freshly connects to the MQTT broker. when - for a topic - a message has been posted with the "retained" flag, he'll get this message. otherwise, he has to wait for the next message which will be posted for the topic. you decide in the configuration, what happens when no more MQTT data is received (expire_after parameter in the sensor configuration). for power, voltage, current, temperature values I find it useful to let these sensors expire after a bit more than 5 minutes.
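The behaviour described above can be illustrated with a toy in-memory broker. This is a deliberately simplified Python model, not a real MQTT client or broker (it ignores QoS, wildcards and clearing retained messages), and the topic names are invented for the example.

```python
class ToyBroker:
    """Minimal model of MQTT retained-message delivery."""

    def __init__(self):
        self.retained = {}     # topic -> last payload published with retain=True
        self.subscribers = {}  # topic -> list of callbacks

    def publish(self, topic, payload, retain=False):
        if retain:
            self.retained[topic] = payload
        for callback in self.subscribers.get(topic, []):
            callback(payload)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        if topic in self.retained:
            callback(self.retained[topic])  # fresh subscriber gets the retained value at once

broker = ToyBroker()
broker.publish("ecu/energy", "1500", retain=True)  # hypothetical topic
broker.publish("ecu/power", "230")                 # not retained

received = []
broker.subscribe("ecu/energy", received.append)
broker.subscribe("ecu/power", received.append)
print(received)  # ['1500']
```

The late subscriber immediately sees the retained energy value, but has to wait for the next publication on the non-retained topic.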
@frtz13 here is version v9_7xf, which has the retain flag set for mqtt formats 3 and 4. Can you see in the log that the zigbee was reset every midnight?
Please note: if you downloaded v9_7xe, do not install it, as it was accidentally compiled with the test flag; that won't work.
@swbouman the retain flag retains the last message until it's overwritten, so any new subscriber always gets the last message instantly. This doesn't affect anything else; only a new subscriber would notice it. If I remember well, someone else was able to compile (look at the issues). I still don't know if the new frontpage causes the reboots; you could shine a light on that by testing with the old recompiled 9_6c version. The ESP is running an asynchronous webserver that is very hard to debug, but I suspect that it could crash when the data of all inverters is retrieved at once by a client. With only a few inverters this doesn't seem to happen, so to me it seems that with 9 inverters too much is squeezed out of the ESP with that new frontpage.
In the current version (v9_6) we can see the output of only the one inverter that we selected on the frontpage. I developed a webpage that shows data from all inverters at once. Please see the picture below to get an idea.
Since I have no working ESP-ECU I can only test in an artificial environment, and that seems to work. I would appreciate it very much if someone is willing to install this and give feedback. I am especially interested in users who have more than 3 inverters and have used version 9_6 without any problem.