ksheumaker / homeassistant-apsystems_ecur

Home Assistant custom component for local querying of APSystems ECU-R Solar System
Apache License 2.0

Device query suddenly turns off #255

Closed gismo2004 closed 3 weeks ago

gismo2004 commented 1 month ago

Since this is the second time this has happened, I would like to know whether I am doing something wrong that causes the device query to switch off.

The toggle switched off in the middle of the night, when definitely nobody was awake to trigger it.

Screenshot_20240520-111107.png

If there is something I can do to identify the root cause, I would highly appreciate a hint.

HAEdwin commented 1 month ago

Take a look at the "ECU Using Cached Data" entity. When queries fail, the integration falls back on cached data, up to the configured number of times. After that it gives up and flips the switch off.
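The behaviour described here can be sketched as a small counter. This is only an illustrative model, not the integration's actual code: the class name `QueryGuard`, the field names, the default limit of 5, and the assumption that a successful query resets the counter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryGuard:
    """Hypothetical model of the cache fallback described above."""

    max_cache_uses: int = 5        # assumed configurable limit
    cache_uses: int = 0            # consecutive failed queries served from cache
    querying_enabled: bool = True  # stands in for the device query switch

    def record(self, query_ok: bool) -> str:
        """Update the state after one polling cycle and report what was served."""
        if not self.querying_enabled:
            return "skipped"       # switch is off, nothing is queried
        if query_ok:
            self.cache_uses = 0    # assumption: a success resets the counter
            return "fresh"
        self.cache_uses += 1
        if self.cache_uses > self.max_cache_uses:
            self.querying_enabled = False  # give up and flip the switch off
            return "gave_up"
        return "cached"
```

In this model the switch only flips after more than `max_cache_uses` consecutive failures, and it stays off until something sets `querying_enabled` back to `True`.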

gismo2004 commented 1 month ago

Yes that's what happens, but what brings the device/integration into this state?

I have been using the integration for quite some time, and this is now the second time it has happened. Both events occurred at night. On all the other nights there was no issue, so something must have been different.

Besides that, what is the benefit of switching the device query off in that situation? Once it is off, it stays off until someone switches it on again. So unless you notice it by chance, it can remain off for quite a while…

HAEdwin commented 1 month ago

The cache is often used because of signal interference (WiFi and Zigbee share the same 2.4 GHz band); relocating the ECU away from the AP might help. Recently it has also been related to issues with the SD card/storage, sometimes to the hardware HA is running on, to removal of the WiFi antenna, or to maintenance between the ECU and EMA (from ~02 AM until ~03 AM). In all cases users were able to solve it themselves. Benefit of the switch: it enables the user to create automations based on the state of the switch (restart the ECU, or temporarily pause querying during the night).

gismo2004 commented 1 month ago

Connection shouldn't be my problem (the inverter is offline EVERY night and everything has worked perfectly since March), and SD/storage is also very unlikely, as HA is running on a Kubernetes cluster with ZFS. So it sounds like my issue is the ECU update/maintenance. I think I can solve this with something like: if query == off then switch query on. But honestly, isn't that somehow stupid? (No offense at all!) Wouldn't it be better for the folks who suffer from connection issues and/or need an ECU restart to have a channel showing the time since the last successful query? In my situation at least, switching the query off is counterproductive, as it takes a query to bring the device back to normal. Or am I missing something?

Do you have any recommendations on how to deal with my situation?

HAEdwin commented 1 month ago

There is a chance that if you immediately turn the switch back on, queries will still fail. Try it. Experience shows that if the query fails more than three times, a restart of the ECU is necessary: for older models use a smart plug; newer models can be soft rebooted. You could also stop querying the ECU at night. To see when the last unsuccessful query occurred, you can use the entity like so: (image) There must have been some change that caused this behavior. Unfortunately, it requires some local troubleshooting/testing. Read the closed issue list; people have dealt with this before, I guess about 4,5% of the users.

gismo2004 commented 1 month ago

I have checked some issues, but none of the ones I saw really matched my situation.

My assumption is that the ECU is fine, but because the inverter is offline at night and the ECU was restarted (maybe because of an update), it ends up in a bad state where another restart will not help: it was already restarted automatically, and the only thing I did was switch device query back on. :-) So it might require an online inverter to "heal" the device query again, which means I could do something like: if query == off then wait 10 mins and switch on.

If it still fails (because the inverter is not yet online, if my assumption is correct), device query should switch off again after a while. Is that correct? Then, after another 10 mins, I switch it on again and "try" again.

Could that work?

HAEdwin commented 1 month ago

I've just closed an issue where the China region was being blocked in the router. This causes issues when a firmware update is being pushed OTA. OTA updates start around 2 AM.

gismo2004 commented 1 month ago

Thx for the reply!

But what exactly is the issue? Even if the OTA is failing on my side (which I assume it is not, as the current version on my ECU is ECU_B_1.2.30, which is definitely higher than at my initial setup with EMA Manager), why does it work again after JUST switching device query back on?

If the ECU were in a bad state, that would not work, correct? So the only thing I can imagine is that it requires the inverter to be online to work normally again.

jonasius commented 1 month ago

@gismo2004 Maybe another hint, do you have some kind of WLAN schedule which turns off your WLAN at night?

gismo2004 commented 1 month ago

No, and as I said, it has worked perfectly fine since the beginning of March. This problem has only happened twice so far, and it always seems to be caused by the 2 AM update, as I have now learned. But I still don't understand why device query needs to be switched off, since the only thing I had to do was switch it on again. I have now added an automation in HA that switches it back on 10 mins after it goes off. This basically makes no sense to me, but it seems there is no better way to handle it?
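To see how such a re-enable automation interacts with the cache limit, here is a rough simulation. It is a sketch under stated assumptions, not a description of the integration: the function name `simulate_recovery`, the 5-minute poll interval, the default limit of 5 cache uses, and the idea that the counter restarts from zero when the switch is turned back on are all hypothetical.

```python
def simulate_recovery(outage_minutes: int,
                      poll_interval: int = 5,
                      max_cache_uses: int = 5,
                      reenable_delay: int = 10) -> tuple[int, int]:
    """Return (minute of the first successful query, times the switch flipped off).

    The ECU is assumed to fail every query before `outage_minutes`
    and to answer every query from that minute on.
    """
    minute = 0
    cache_uses = 0
    flips = 0
    while True:
        if minute >= outage_minutes:
            return minute, flips          # query succeeds again
        cache_uses += 1                   # failed query, served from cache
        if cache_uses > max_cache_uses:
            flips += 1                    # switch flips off
            cache_uses = 0                # assumption: counter restarts on re-enable
            minute += reenable_delay      # automation turns it back on later
        else:
            minute += poll_interval


# A 60-minute outage with the defaults: one flip, recovered at minute 60.
print(simulate_recovery(60))
```

Under these assumptions the automation does not shorten the outage; it only ensures that querying resumes by itself once the ECU responds again.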

HAEdwin commented 1 month ago

You can set the number of cache uses higher so that there is less chance that the switch flips. Although the inverters might be offline, you should still be able to query the ECU (like it does before 02 AM). It is not unusual for the cache to be used, though the frequency may differ from case to case. Some days I have one cache use, today I have four.

gismo2004 commented 1 month ago

And what would be a good “higher number”? I mean, if the ECU doesn't respond correctly after an OTA until the inverter is online again (assumption), and this happens at 2 AM, do I have to set it to 4 hr / 5 min = 48 (while 5 is recommended)?
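The arithmetic in this question can be written out as a small helper. The function name `cache_uses_needed` is hypothetical, and the 5-minute query interval is the one assumed in the comment above.

```python
def cache_uses_needed(outage_minutes: int, poll_minutes: int = 5) -> int:
    """Number of failed polls that fit into an outage (rounded up)."""
    return -(-outage_minutes // poll_minutes)  # ceiling division on integers


# 4 hours of unresponsive ECU at one query every 5 minutes:
print(cache_uses_needed(4 * 60))  # 48
```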

HAEdwin commented 1 month ago

Depends. If 5 is not enough, try 10, 20 and so on. But after all, this is not the normal behavior you'd expect from the ECU, so there must be something else going on locally. During maintenance, ECU data is verified against EMA data, and if data is missing the ECU will resend it to EMA. Suppose there is a firewall rule that blocks traffic to EMA sites, including CN; then this might cause problems.

gismo2004 commented 1 month ago

Ok, then I will try to set it to 50 because it can't be worse than “not working” :-)

What do you mean by verifying data against EMA? I have not enabled the CloudService if you are referring to this? But would that explain, why the ECU starts working without issues after manually switching query to on?

HAEdwin commented 1 month ago

What do you mean by "I have not enabled the Cloudservice"? Afaik the correct functioning of the ECU depends heavily on cloud connectivity and on the ECU being able to flush its data buffer to the EMA (Energy Monitoring & Analysis) site. If updates to EMA are blocked, the ECU will eventually crash somehow.

gismo2004 commented 1 month ago

Why would that be a requirement? The ECU has an internet connection according to the EMA app, which enables updates and such, but I am not registered anywhere. Screenshot_20240524-173725.png

HAEdwin commented 1 month ago

It is a requirement to connect the ECU to the internet and create an EMA account; the ECU won't operate cloudless. This is based on the architectural principle that the PV set can be monitored via the EMA site. Blocking the data upload to EMA has a negative effect on the continuity of operation of the ECU. During the day, data is pushed to EMA every 5 minutes. If this fails, the data is stored in a buffer and an attempt to retransmit is made when the inverters are down. In older firmware versions, that buffer would fill up if you blocked EMA, causing the ECU to freeze. Nowadays, APSystems is adapting firmware to activate Modbus (with varying degrees of success). I personally think that they could abandon the concept of Modbus because of their new EZ series. Cloudless operation can be achieved by returning the appropriate data to the ECU. I experimented with this but never did anything with it due to lack of time and lack of demand. Moreover, it remains difficult to continue developing for hardware that you do not own; compatibility with all ECU models remains key.
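The buffering behaviour described here can be pictured with a toy model. This is purely illustrative and not based on any knowledge of the ECU firmware: the class `UploadBuffer`, its `capacity`, and the method names are hypothetical.

```python
from collections import deque


class UploadBuffer:
    """Toy model: failed uploads are buffered until a retransmit succeeds."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # hypothetical buffer size
        self.pending = deque()     # samples waiting for retransmission

    def push(self, sample, upload_ok: bool) -> bool:
        """Upload one sample; buffer it on failure. False means the buffer is full."""
        if upload_ok:
            return True
        if len(self.pending) >= self.capacity:
            return False           # the "buffer fills up" case of older firmware
        self.pending.append(sample)
        return True

    def retransmit(self, upload_ok: bool) -> int:
        """Try to flush the buffer (e.g. at night); return the number of samples sent."""
        if not upload_ok:
            return 0
        sent = len(self.pending)
        self.pending.clear()
        return sent
```

With one sample every 5 minutes, a permanently blocked upload adds 12 entries per hour in this model, so any fixed capacity is eventually exhausted.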

HAEdwin commented 3 weeks ago

I'll close this issue for now. If anything else pops up, feel free to open a new discussion.