ksheumaker / homeassistant-apsystems_ecur

Home Assistant custom component for local querying of APSystems ECU-R Solar System
Apache License 2.0
166 stars 42 forks source link

Duplicated entities #27

Closed dclobato closed 2 years ago

dclobato commented 2 years ago

Sometimes, during the day, the sensors became unavailable. I have to restart HA to get them working again. When the HA come back online, I have several duplicate entities, like the screenshot below

2021-12-29

Any idea why this happens? I usually remove the missing entities, but I have to reconfigure the Energy panel :-/

HAEdwin commented 2 years ago

This is not directly related to the integration, try the HA forum. I found the following which might work: Delete all unavailable (orphan) items and restart Homeassistant. Now all is working as before. At least you don't have to reconfigure the Energy panel if this works. Another suggestion is to restore a backup.

Offcourse it is unwanted behaviour of HA and there is a lot of talk about this subject in the community. One of the possible reasons is MQTT auto discovery but it's requires some more browsing to get to the solution.

If the sensors become unavailable it might be due to the ECU loosing WiFi connectivity, are both led's on the ECU green?

dclobato commented 2 years ago

If the sensors become unavailable it might be due to the ECU loosing WiFi connectivity, are both led's on the ECU green?

After configuring the ECU using WiFi, I keep it connected using only the ethernet cable. Does it need to be connected on WiFi, even if there is ethernet connection?

I found the following which might work: Delete all unavailable (orphan) items and restart Homeassistant. Now all is working as before

That is what I am doing... Removing orphan entities and restarting HA.

One of the possible reasons is MQTT auto discovery

How MQTT is related to this integration?

HAEdwin commented 2 years ago

This component only works if the ECU-R or ECU-B is attached to your network by Wifi. But with the latest firmware you never now what feature is next due to missing release notes. This integration is based on communication between the ECUAPP and the temporary AP of the ECU when setting it up. Previously we found out that port 8899 is only available on WiFi not ethernet but that might have been changed. I'll experiment with that today.

Best practice is not to experiment with the integration or ECU when inverters are down. It does indeed create orphan entities.

MQTT is not related to this integration, it is mentioned in the forum as one of the possible causes of creating orphan entities. The message broker somehow autodiscovers entities it cannot map on excisting entities thus creating new entities. Sounds plausible to me.

HAEdwin commented 2 years ago

Nope, ECU-R firmware ECU_R_1.2.19009 does not support the use of port 8899 on the ethernet port, it does with WiFi. Unfortunately, the integration still relies on a WiFi connected ECU.

dclobato commented 2 years ago

This component only works if the ECU-R or ECU-B is attached to your network by Wifi. But with the latest firmware you never now what feature is next due to missing release notes.

How can I check the firmware version? On my ECU, port 8899 is open on Ethernet

[dclobato@appserver] ~$ nc 10.0.1.28 8899
APS1100160001END
APS12011700012162000068600▒ !0I(10015ECU_R_PRO_2.0.3017America/Sao_Paulo▒▒▒`Ũy▒00END
^C
[dclobato@appserver] ~$ arp 10.0.1.28
Address                  HWtype  HWaddress           Flags Mask            Iface
10.0.1.28                ether   80:97:1b:02:a8:a4   C                     enp2s0
[dclobato@appserver] ~$

2021-12-30 (1)

Best practice is not to experiment with the integration or ECU when inverters are down. It does indeed create orphan entities.

I'll try to remember this :)

Thanks

HAEdwin commented 2 years ago

@dclobato So 10.0.1.28 is your ethernet connected IP-address of the ECU-R-Pro and 8899 is open on this connection? It still puzzles me reading your firmware version "ECU_R_PRO_2.0.3017" I can't find any Google results on ECU-R-Pro hardware/software besides our comments on this subject. But more on topic, duplicate entries are not really an issue which is caused by this integration, do you agree?

HAEdwin commented 2 years ago

I didn't realize there might be errors in the log. Among other things, the length of the firmware string is 3 bytes longer. Something the integration does not take into account in de python code. What does home-assistant.log and home-assistant.log.1 say if the sensors are no longer available?

Emile86 commented 2 years ago

Firts of al thank you for your hard work!

Have exact same issu here. During the day or night sensors become unavailable and the only solution is to restart HA, restarting ECU doesn't have any effect. After restarting i get duplicated entities. Have same issue with the wifi and wired connection to the ECU.

Screenshot_20220113-191149_Home Assistant

Log error when ECU becomes unavailable:

Screenshot_20220113-192343_Home Assistant

HAEdwin commented 2 years ago

The Integration is dependant on connection via WiFi. Specify the IP-address which is assigned to de WiFi adapter from the ECU.

During the night, the inverters go offline. If you continue polling, the integration has to rely on cached data from the last succesful pull from inverters, everything should just continue until the inverters come back online and the ECU can poll them again. Cached data is only present if there was at least one successful pull.

If this happens during the day, the integration was not successfull in contacting the ECU or pulling the first inverter data so a cache could not be build. The error message can disappear by itself, so do not restart HA too quickly, the ECU is being polled once a minute by default. Check if the configuration is correct and you are using the WiFi connected ECU IP-address.

Emile86 commented 2 years ago

In my case i can pull data from ECU in both cases when it connected via wifi and lan. And in both cases after 4 to 6 hours sensors become unavailable. In first case i have waited for more than an hour with hope that connection will be resumed but with no succes. At the same time EMA app remain connected and also no connection drops from HA to ECU ip.

In first case i have restarted ECU to see if integration will pull data but it was not the solution, only way is to restart HA but then i get duplicated entities. Also changed scan interval to 120 and 180 sec.

Al of the actions are done during the daylight to be sure that ECU have data from inverters

HAEdwin commented 2 years ago

Ok, is it's ECU_R_PRO firmware? What are the results if you use sync io instead of async io? The sync query closes the socket and can be used to try when you modify __init__.py at line 72 Replace data = await self.ecu.async_query_ecu() with data = self.ecu.query_ecu() and restart HA.

Emile86 commented 2 years ago

Where can i see firmware version? I will try now to replace "data = await self.ecu.async_query_ecu() with data = self.ecu.query_ecu()" and report back

For now i have new error log from HA

Screenshot 2022-01-14 at 12 06 43
HAEdwin commented 2 years ago

In the Attributes section of your ECU Current Power Entity: image

Emile86 commented 2 years ago

Have tried to replace "data = await self.ecu.async_query_ecu() with data = self.ecu.query_ecu()" and got this error after restarting HA

Screenshot_20220114-225808_Home Assistant

My firmware version is : ECU_R_PRO_2.0.3

Any ideas?

tv3 commented 2 years ago

The ecu.query_ecu()function needs updates for the pro to work. The socket implementation is wrong. I know, because I have the same problem.

tv3 commented 2 years ago

After each request the socket needs to be closed and an new socket opened.

HAEdwin commented 2 years ago

It's hard to get a hold on the problem. @tv3 do you now use the query_ecu() function without problem? It's witten like:

    def query_ecu(self):

        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((self.ipaddr,self.port))

        sock.sendall(self.ecu_info_query.encode('utf-8'))
        self.ecu_raw_data = sock.recv(self.recv_size)

        self.process_ecu_data()

        cmd = self.inverter_energy_query + self.ecu_id + self.cmd_suffix
        sock.sendall(cmd.encode('utf-8'))
        self.inverter_raw_data = sock.recv(self.recv_size)

        cmd = self.inverter_signal_query + self.ecu_id + self.cmd_suffix
        sock.sendall(cmd.encode('utf-8'))
        self.inverter_raw_signal = sock.recv(self.recv_size)

        sock.shutdown(socket.SHUT_RDWR)
        sock.close()

        data = self.process_inverter_data()
        data["ecu_id"] = self.ecu_id
        data["today_energy"] = self.today_energy
        data["lifetime_energy"] = self.lifetime_energy
        data["current_power"] = self.current_power

        return(data)

You mean it would help if after each command the socket is being closed and again opened with the next command? Like so?

    def query_ecu(self):

        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((self.ipaddr,self.port))
        sock.sendall(self.ecu_info_query.encode('utf-8'))
        self.ecu_raw_data = sock.recv(self.recv_size)
        sock.shutdown(socket.SHUT_RDWR)
        sock.close()
        self.process_ecu_data()

        cmd = self.inverter_energy_query + self.ecu_id + self.cmd_suffix
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((self.ipaddr,self.port))
        sock.sendall(cmd.encode('utf-8'))
        self.inverter_raw_data = sock.recv(self.recv_size)
        sock.shutdown(socket.SHUT_RDWR)
        sock.close()

        cmd = self.inverter_signal_query + self.ecu_id + self.cmd_suffix
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((self.ipaddr,self.port))
        sock.sendall(cmd.encode('utf-8'))
        self.inverter_raw_signal = sock.recv(self.recv_size)
        sock.shutdown(socket.SHUT_RDWR)
        sock.close()

        data = self.process_inverter_data()
        data["ecu_id"] = self.ecu_id
        data["today_energy"] = self.today_energy
        data["lifetime_energy"] = self.lifetime_energy
        data["current_power"] = self.current_power

        return(data)
tv3 commented 2 years ago

@HAEdwin yes, correct

HAEdwin commented 2 years ago

This also works on my part, I will run this for a while to see if the extra errors polling my printer will disappear also. @Emile86 are you using the latest version of the integration?

tv3 commented 2 years ago

In the Hector's Go version, he also has a 'cooldown' (aka sleep) between requests. I haven't seen the need for that on my system yet, though. ALso a polling of 5 minutes is fast enough. I think you won't get better sampling with a higher frequency.

HAEdwin commented 2 years ago

Correct, the ECU does no faster interval than 5 minutes so one minute polling is somewhat overkill.

HAEdwin commented 2 years ago

Closing the socket could also be applied to the async io version. But let's see how this runs, clean up code later. Not closing the socket caused these errors in my log: 2022-01-14 10:31:41 WARNING (MainThread) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: Error getting checksum int from 'Signal Query' data=b'' 2022-01-14 10:31:41 ERROR (MainThread) [homeassistant.components.ipp.coordinator] Error fetching ipp data: Invalid response from API: Timeout occurred while connecting to IPP server. 2022-01-14 10:31:41 ERROR (MainThread) [homeassistant.components.syncthru] Timeout fetching syncthru data

Not closing the socket causes indeed issues.

tv3 commented 2 years ago

Also, the ecu_r_pro returns more data from the first request. They added mac addresses, and some other unknown fields. Not really important for the integration, I think.

tv3 commented 2 years ago

Yah. You get a null response

HAEdwin commented 2 years ago

Yes, I left it out because on the ECU-R firmware the WiFi MAC is alway zero. This might be a bug, for other firmware versions I just as well can add it to complete the returned dataset.

Emile86 commented 2 years ago

This also works on my part, I will run this for a while to see if the extra errors polling my printer will disappear also. @Emile86 are you using the latest version of the integration?

Yes I'm using the last version

tv3 commented 2 years ago

I have mine working.

exec(open('./APSystemsECUR.py').read()) ecu=APSystemsECUR('192.168.4.6') a=ecu.query_ecu() print(a) {'timestamp': '2022-01-15 00:07:04', 'inverter_qty': 3, 'inverters': {'502000026756': {'uid': '502000026756', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 74, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}, '502000026705': {'uid': '502000026705', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 73, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}, '502000025906': {'uid': '502000025906', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 73, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}}, 'ecu_id': '216200006785', 'today_energy': 0.0, 'lifetime_energy': 536.4, 'current_power': 0}

tv3 commented 2 years ago

You can make query_ecu asynchronous if HA likes that.

Just put async in front of definition. Then you have async (coroutine) function with plain vanilla socket code. No need for all the async stuff with sockets.

async def query_ecu(self):

Test in python

a=asyncio.run(ecu.query_ecu()) print(a) {'timestamp': '2022-01-15 00:12:04', 'inverter_qty': 3, 'inverters': {'502000026756': {'uid': '502000026756', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 74, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}, '502000026705': {'uid': '502000026705', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 73, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}, '502000025906': {'uid': '502000025906', 'online': True, 'unknown': '02', 'frequency': 0.0, 'temperature': -100, 'signal': 73, 'model': 'YC1000', 'channel_qty': 4, 'power': [0, 0, 0, 0], 'voltage': [0]}}, 'ecu_id': '216200006785', 'today_energy': 0.0, 'lifetime_energy': 536.4, 'current_power': 0}

ksheumaker commented 2 years ago

I'm trying to go through this thread and figure out what's going on, are both of you with issues with the ECU_R_PRO devices?

If so, the asyncio function already closes the socket for the ECU_R_PRO if you check line #112 in async_query_ecu function. That's what self.writer.close() is doing. After re-reading the python docs for the asyncio routines (https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter.close) , it looks like you might want to add a await self.writer.wait_closed()

You might try that to see if it fixes your problem as well instead of reverting to the non async functions.

HAEdwin commented 2 years ago

We can indeed best stick to one of the methods and drop the other. Is it really that easy to make the non-async query async by adding async in front of it ie: async def query_ecu(self):? All the different hardware makes it harder to debug not owning one. We could add an "apsystems_ecur compatibility chart" to see what works and what doesn't and then focus on correcting each one-by-one. I'm loosing track of what everyone owns so an issue template with questions like "what ECU firmware" and "which inverter(s)" could help when adding an issue. Also I would recomment adding hassfest workflow to GitHub, it checks the compatibility of the integration against future releases of HA (the beta channel).

This is also interesting to read: https://community.home-assistant.io/t/apsystem-solar-monitoring/167639/20

tv3 commented 2 years ago

As for the asynchronous part.

The query code itself doesn't have to be async. It is code that needs to be run in sequence (procedural/batch), so there's no point in it being async. What you still want is an async callable function for HA, so it doesn't block HA threads. And the async keyword is enough.

Other point is making sure exceptions are handled properly to prevent (socket) leaks etc.

I've changed things to handle exceptions in the socket interaction.

HAEdwin commented 2 years ago

Great @tv3 because I tried to adapt things but I got more errors in the log over time so I went back to calling self.ecu.async_query_ecu() Looking foreward to your changes.

ksheumaker commented 2 years ago

The async stuff is not for the benefit of this integration, it's for HA as a whole. The point of the async socket code is to not block other co-routines running inside of HA. While you are correct you can just put async in-front of the query_ecu function and await it, the standard socket code is still blocking.

See this example https://bbc.github.io/cloudfit-public-docs/asyncio/asyncio-part-5.html of why just running running synchronous code inside of an async co-routine isn't the best idea. If the socket hangs or is slow to respond, it will cause all other co-routines HA is running in the same context to block.

And one of the error conditions I remember dealing with is the socket just hanging on the ECU. Since it's such a piece of junk hardware/software sometimes the recv function would block because the device wouldn't close the socket, nor would it provide any data. Adding the timeout on the asyncio.wait_for to all the reads from the socket helps catch those conditions.

If you really want to move to using synchronous I/O please checkout the HA developers docs https://developers.home-assistant.io/docs/asyncio_working_with_async and the section about calling sync functions from within async. You would need to change line 72 of init.py to something like this instead

data = await hass.async_add_executor_job(ecu.query_ecu)

I'm pretty hesitant to move away from asyncio as a whole because I've been running this for many months with no issues. My ECU has been very stable, and has not locked up or needed a reset in a very long time, and my HA instance is working very reliably.

HAEdwin commented 2 years ago

I'm pretty hesitant to move away from asyncio as a whole because I've been running this for many months with no issues. My ECU has been very stable, and has not locked up or needed a reset in a very long time, and my HA instance is working very reliably.

I totally agree! It has proven to be very stable in combination with the ECU-R. Though the ECU-R-PRO has issues running this integration.

tv3 commented 2 years ago

@ksheumaker Thanks for the feedback. You built this module and have all the experience. I will go and read the links you sent. I'm still learning about the ECU stuff and appreciate all insights. Btw: I did add a timeout to the socket code to prevent potential hangs.

renini commented 2 years ago

I'm also having the same issue with an ECU-R-PRO. For what it's worth, mine is running 2.0.3016.

APS120116000121620001499501��  !02    10015ECU_R_PRO_2.0.3016Europe/Amsterdam��W`Ũz�00END          

@tv3 can you share the changes you made, for me to try out?

tv3 commented 2 years ago

@renini I had tested it outside HA. Currently running tests with an HA setup. Don’t want to break your setup. When it looks good I’ll give you my changes to check.

renini commented 2 years ago

@tv3 np, i think i have a working setup now, from your comments in this issue. Will see tomorrow whenever there is some sun. I changed the async to sync and the closing of the socket.

Emile86 commented 2 years ago

@tv3 np, i think i have a working setup now, from your comments in this issue. Will see tomorrow whenever there is some sun. I changed the async to sync and the closing of the socket.

And any luck?

tv3 commented 2 years ago

Last update cleared my version. Just reapplied my changes again. As it's dark now, I'll have to wait. It seemed to be running ok. But haven't had a long run of a few days yet.

I also included a sock.settimeout(10) on the socket after open to prevent possible hangs.

ksheumaker commented 2 years ago

I think I have figured out the cause of the original purpose of this issue, which was "Duplicated entries."

I think we need to move discussion about the ECU_R_PRO stability and async vs sync into a new issue.

Anyway, with the duplicated entries - it finally happened to me, and looking through the entity registry (/config/.storage/core.entity_registry) I found the unique_id on these sensors were None_current_power not <ECUID>_current_power. I think this was caused by the caching I'm doing.

Once I've had time to test this, I'll release an updated version.

renini commented 2 years ago

@ksheumaker sounds promising! Many thanks for looking into it.

ksheumaker commented 2 years ago

This fix has been added to the 1.1.2 release which is now available. If people see duplicate sensors please re-open this, or open a new issue.