ksheumaker / homeassistant-apsystems_ecur

Home Assistant custom component for local querying of APSystems ECU-R Solar System
Apache License 2.0
166 stars 42 forks source link

ECU-R (SunSpec logo) & ECU-C stop responding after a couple of days #71

Closed HAEdwin closed 1 year ago

HAEdwin commented 2 years ago

I'm in the process of adding extended error logging and raising errors to keep things running smoothly. It seems that the exception where the connection can be established but no further data is sent from the ECU leads to a socket that is left open. The old ECU-R is not sensitive to this, but the new one (with SunSpec logo) might be.

HAEdwin commented 2 years ago

There is a flaw in APSystemsSocket.py Line 106: except TimeoutError as err: Reading a socket might return a tuple here and in fact, when the socket was able to connect to the ECU (port=open) but the ECU doesn't return any data, an exception occurs which is not being captured by "except TimeoutError". This results in a port which is left open. Over time this might result in port exhaustion.

Therefor it's best to capture all possible situations in one go to make sure the port is being closed. Change line 106 and restart HA. Line 106: except Exception as err:

I hope this contributes to continuity and the ECU stays in the air longer, please let me know!

(in this same function self.add_error(msg) the msg is not being reflected in the log and might as well be left out).

Hasharderwyk commented 2 years ago

Thanks alot! Maybe i will try it later this week. If so i will post the result :) Have a nice day.

HAEdwin commented 2 years ago

I allready initiated a PR for issue #71 because I think that the ECU-R pro firmware has a smaller range of ports assigned for installing the ECU using the ECUapp (not many are needed after all). Maybe to make space for modbus communication. It's a wild guess.

ksheumaker commented 2 years ago

71 has been merged into 1.2.13 thanks @HAEdwin

Hasharderwyk commented 2 years ago

Hello, yesterday i've updated the integration to 1.2.13 thanks alot! The integration is still running flawless with the ecur pro. I will keep you'll updated if its stays running next view days.

Hasharderwyk commented 2 years ago

Hello there! I have checked the integration again and its frozen if i reboot the ecu by unpluging the mains for a short while tis working again. This is what the log said....

Logger: custom_components.apsystems_ecur Source: custom_components/apsystems_ecur/init.py:102 Integration: APSystems PV solar ECU (documentation) First occurred: 14:51:28 (1 occurrences) Last logged: 14:51:28 Using cached data from last successful communication from ECU. Error: index out of range.

Logger: custom_components.apsystems_ecur Source: helpers/update_coordinator.py:222 Integration: APSystems PV solar ECU (documentation) First occurred: 15:01:33 (1 occurrences) Last logged: 15:01:33

Error fetching apsystems_ecur data: Error using cached data more than 5 times. See log, and try power cycling the ECU. 2022-05-03 16:24:14 WARNING (SyncWorker_5) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''

Maybe this info will help something, i tis a bit out of my league of knowledge for now.

Setup: Firmware: ECU_R_PRO_2.0.6 Ecu-r with sunspec logo. Hardwired connected and a static ip.

If i can post some more info let me know. Greetings!

HAEdwin commented 2 years ago

@Hasharderwyk In the past period I spent some days to see how stability can be increased. Of course I was only able to test with my own ECU (an older ECU-R without SunSpec logo on the back). In my case the number of timeouts has decreased enormously so it benefits me but I have absolutely no idea how the other ECU's react to this. In any case, you can download my code here. https://github.com/HAEdwin/homeassistant-apsystems_ecur/archive/refs/heads/main.zip

Extract from the zip only the file: APSystemsSocket.py and overwrite the existing file at: config/custom_components/apsystems_ecur make sure you restart HA afterwards. (You can also rename the existing file first and then copy the new file into the folder so you can easily restore the old situation)

Not sure if it will help you. What I did was trying to use another method of receiving complete dataframes, added some extra visual log information when timeouts or socket errors occur, added a delay between question and answer, added buffer flushing.

Please let me know if this has any effect on the stability.

harderwykr commented 2 years ago

@HAEdwin Thanks for your info. I'm going to try it and I'll let you know how it goes. If I can provide any other information from, for example, a log or something, just let me know.

HAEdwin commented 2 years ago

@harderwykr That would be great. I'm particularly interested in the log to see how often a timeout occurs and at which step in the entire process (ECU, inverters, signal strength). Besides the maintenance at night, they should not occur as often anymore.

HAEdwin commented 2 years ago

@harderwykr Do you already have some experience with the adjustments I made? I consider this as the maximum that can be obtained, the rest is a shortcoming of the ECU. So all hopes are on this adjustment.

Hasharderwyk commented 2 years ago

Hi! Its now running for the second day still without freezing. Verry nice! I do not have much experience with HA so i am figuring out how to log this integration to get some log data for you. Until now its working nice :)

For now i have got this log info;

_{ "home_assistant": { "installation_type": "Home Assistant OS", "version": "2022.4.5", "dev": false, "hassio": true, "virtualenv": false, "python_version": "3.9.9", "docker": true, "arch": "aarch64", "timezone": "Europe/Amsterdam", "os_name": "Linux", "os_version": "5.10.92-v8", "supervisor": "2022.05.2", "host_os": "Home Assistant OS 7.5", "docker_version": "20.10.9", "chassis": "embedded", "run_as_root": true }, "custom_components": { "sonoff": { "version": "v2.4.7", "requirements": [ "pycryptodome>=3.6.6" ] }, "afvalwijzer": { "version": "2022.04.01", "requirements": [] }, "hacs": { "version": "1.24.5", "requirements": [ "aiogithubapi>=21.11.0" ] }, "apsystems_ecur": { "version": "v1.2.11", "requirements": [] } }, "integration_manifest": { "codeowners": [ "@ksheumaker" ], "config_flow": true, "dependencies": [], "documentation": "https://github.com/ksheumaker/homeassistant-apsystems_ecur", "domain": "apsystems_ecur", "name": "APSystems PV solar ECU", "version": "v1.2.11", "requirements": [], "is_built_in": false }, "data": { "entry": { "ecu_id": "216200025331", "qty_of_inverters": 6, "qty_of_online_inverters": 6, "lifetime_energy": 704.8, "current_power": 1937, "today_energy": 5.44, "firmware": "ECU_R_PRO_2.0.6A", "timezone": "", "lastupdate": "'-- ::", "ecu_raw_data": "b'41505331323031303130303031323136323030303235333331303100001b880000079100000220202205221018120006000631303031364543555f525f50524f5f322e302e364130303080971b030af060c5a87a90e9303000000000000000000000454e440a'", "inverter_raw_data": "b'415053313130303138303030323031454e440a'", "inverter_raw_signal": "b'415053313130303138303033303031454e440a'", "errors": [ "[2022-05-21 17:50:28.009221] ECU returned 0 for lifetime energy, raw data=b'APS120113000121620002533101\x00\x00\x00\x00\x00\x00\x03\x9f\x00\x00\x00\x00 \"\x05!\x17FP\x00\x06\x00\x0610016ECU_R_PRO_2.0.6A012Europe/Paris\x80\x97\x1b\x03\n\xf0`\xc5\xa8z\x90\xe900\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'" ], "inverters": { "704000021915": { "uid": "704000021915", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 27, "signal": 81, "model": "DS3", "channel_qty": 2, "power": [ 19, 20 ], "voltage": [ 239, 239 ] }, "704000021232": { "uid": "704000021232", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 50, "signal": 74, "model": "DS3", "channel_qty": 2, "power": [ 229, 233 ], "voltage": [ 242, 242 ] }, "704000021123": { "uid": "704000021123", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 54, "signal": 75, "model": "DS3", "channel_qty": 2, "power": [ 224, 228 ], "voltage": [ 242, 242 ] }, "704000021045": { "uid": "704000021045", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 27, "signal": 81, "model": "DS3", "channel_qty": 2, "power": [ 24, 102 ], "voltage": [ 239, 239 ] }, "704000019989": { "uid": "704000019989", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 53, "signal": 74, "model": "DS3", "channel_qty": 2, "power": [ 251, 244 ], "voltage": [ 241, 241 ] }, "704000016372": { "uid": "704000016372", "online": true, "unknown": "01", "frequency": 50.0, "temperature": 34, "signal": 74, "model": "DS3", "channel_qty": 2, "power": [ 226, 88 ], "voltage": [ 242, 242 ] } } } } }

HAEdwin commented 2 years ago

Ok, there should be some logging in the HomeAssistant.log file. If you've installed "File Editor" you're able to browse to this file. "File Editor" is a handy Add-on.... [Settings] > [Add-ons] > [Add-on Store] image

image

dewi-ny-je commented 2 years ago

I got it running up to 4 days,but then it fails. I guess I'll have to either switch to the component using scraping of the webpage or reset the ECU every night with a smart power plug.

HAEdwin commented 2 years ago

@dewi-ny-je How many timeouts did you get in the homeassistant.log?

dewi-ny-je commented 2 years ago

In the GUI/logs I can see:

Logger: custom_components.apsystems_ecur
Source: custom_components/apsystems_ecur/__init__.py:102
Integration: APSystems PV solar ECU (documentation)
First occurred: May 22, 2022, 04:39:38 (214 occurrences)
Last logged: May 22, 2022, 22:35:14

Using cached data from last successful communication from ECU. Error: timed out
Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable

Logger: custom_components.apsystems_ecur
Source: helpers/update_coordinator.py:223
Integration: APSystems PV solar ECU (documentation)
First occurred: May 22, 2022, 05:04:53 (1 occurrences)
Last logged: May 22, 2022, 05:04:53

Error fetching apsystems_ecur data: Error using cached data more than 5 times. See log, and try power cycling the ECU.

This error originated from a custom integration.

Logger: custom_components.apsystems_ecur
Source: custom_components/apsystems_ecur/__init__.py:96
Integration: APSystems PV solar ECU (documentation)
First occurred: May 19, 2022, 09:18:31 (10 occurrences)
Last logged: May 19, 2022, 18:48:32

Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\t\x94\x00\x00\x02\xc5 "\x05\x19\x1566\x00\x06\x00\x0510016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x04\xdb\x00\x00\x03\xd7 "\x05\x19\x17\x11C\x00\x06\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x03\\\x00\x00\x00\x00 "\x05\x19\x171D\x00\x00\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00 "\x05\x19\x17AF\x00\x00\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x04\xdd\x00\x00\x00\x00 "\x05\x19\x18FU\x00\x00\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
HAEdwin commented 2 years ago

The returned 0 on lifetime energy is somewhat strange. Have to investigate that. Can you open and share the homeassistant.log?

dewi-ny-je commented 2 years ago

The messages which are repeated were replaced with "...". Also, I have an automation disabling polling between 1h after sunset and 1h before sunrise. I'm running the current integration since May 13th, which may explain why the first issue is on the 19th (4 days).

2022-05-19 09:18:31 WARNING (SyncWorker_6) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x02\xc9\x00\x00\x00N "\x05\x19\t\x16\x00\x00\x06\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
...
2022-05-19 18:48:32 WARNING (SyncWorker_3) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: ECU returned 0 for lifetime energy, raw data=b'APS120117000121620004722301\x00\x00\x00\x00\x00\x00\x04\xdd\x00\x00\x00\x00 "\x05\x19\x18FU\x00\x00\x00\x0610016ECU_R_PRO_2.0.6A016Europe/Amsterdam\x80\x97\x1b\x03\x88P`\xc5\xa8~\xf8\x1400\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00END\n'
2022-05-22 04:39:38 WARNING (SyncWorker_5) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: timed out
2022-05-22 04:44:41 WARNING (SyncWorker_3) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable
...
2022-05-22 05:04:53 WARNING (SyncWorker_8) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable
2022-05-22 05:04:53 ERROR (MainThread) [custom_components.apsystems_ecur] Error fetching apsystems_ecur data: Error using cached data more than 5 times. See log, and try power cycling the ECU.
2022-05-22 05:09:56 WARNING (SyncWorker_8) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable
...
2022-05-22 22:35:14 WARNING (SyncWorker_0) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: [Errno 113] Host is unreachable

With all the respect for the effort you put into the component and the good results achieved so far, it seems that ECU-R-PRO up to firmware 2.0.6A just cannot be read out that way. Why not merging the work from https://github.com/tv3/homeassistant-apsystems_ecur/commits/main and read that unit via HTTP scraping, leaving only the lifetime energy the current way? maybe by reading far fewer parameters (just one) the ECU-R-PRO will be able to hold for few months, at which point it could be acceptable. Even better, offer an option (a switch) to disable reading that property, so that the component with ECU-R-PRO only does http scraping and has complete reliability.

By limiting said behaviour to the firmware <= 2.0.6A, whenever a new firmware is released the current method will be enabled again, maybe it will work with the new firmware.

HAEdwin commented 2 years ago

I agree with you, I've put maximum effort in optimizing/improving and personally I think this is it. So the other options scraping and modbus remain open to achieve more stability.

From what I noticed, I've got far better results with continuous polling, so I disabled the sceduled pause in polling (and set the cache count to 30 so that it jumps over the maintenance). I excluded logging from the code. If the cache is used I'll find that here for reference. For my ECU-R it is stable enough this way (polling once every minute). image

dewi-ny-je commented 2 years ago

@HAEdwin I tried modbus TCP too but it wasn't stable and the data returned was not always correct. After testing it I disabled it again just to be sure not to take more resources than needed. I don't remember which component I used but I think there's only one.

I haven't tried modbus via physical connection but then we are talking about a different level of complexity.

I can try using your fork which is few commits ahead and report back. If neither works, it's probably time to merge the scraping and @tv3 I think was already interested in helping.

After all scraping could also be an option and not automatically selected depending on the ECU model and firmware version! But I feel that both techniques belong to the same integration...

HAEdwin commented 2 years ago

@dewi-ny-je Yes, you should try my fork! because it really decreased the timeouts in my situation also includes a better use of the sockets with some cooldowns. I posted @ksheumaker a message to review/try this also. You only have to replace APSsystemsSocket.py TV3 added the valuable scraping method, for the socket part - I'm done :) this should be it, no more tuning or else use the other method of scraping.

dewi-ny-je commented 2 years ago

Yes, you should try my fork!

I placed your updated file, I set the polling to 60 s, I also switched to the wired IP address instead of the wifi one.

I disabled the automation which stopped polling at night and which rebooted the device after sunset. Let's see.

dewi-ny-je commented 2 years ago

Rebooted yesterday at 17:38, today at 11:27 it stopped reporting with "index out of range". I'm setting the nightly reboot again

HAEdwin commented 2 years ago

@dewi-ny-je Ok, so the ECU still respons but the integration is reporting "index out of range". It would be nice if there's more info on that, I'm not sure where that comes from. Anyway scraping would improve stability for the ECU-R-Pro users.

gratjetost commented 2 years ago

@HAEdwin @dewi-ny-je Any update on this issue? I'm experiencing the same issue with my ECU-C. working for 3-4 days and then it gets unavailable. Its almost 4 days on the dot. Turning the ECU-C off and on again fixes the issue. I tried also HAEdwin's fork but it does the same. I wish I could give more info or input, but I'm a noob in programming.

This is my logging from today when it happened:

`This error originated from a custom integration.

Logger: custom_components.apsystems_ecur
Source: custom_components/apsystems_ecur/__init__.py:102
Integration: APSystems PV solar ECU (documentation)
First occurred: 19:47:04 (1 occurrences)
Last logged: 19:47:04

Using cached data from last successful communication from ECU. Error: index out of range'
'This error originated from a custom integration.

Logger: custom_components.apsystems_ecur
Source: helpers/update_coordinator.py:223
Integration: APSystems PV solar ECU (documentation)
First occurred: 20:12:04 (1 occurrences)
Last logged: 20:12:04

Error fetching apsystems_ecur data: Error using cached data more than 5 times. See log, and try power cycling the ECU.'
'This error originated from a custom integration.

Logger: custom_components.apsystems_ecur
Source: custom_components/apsystems_ecur/__init__.py:96
Integration: APSystems PV solar ECU (documentation)
First occurred: 19:52:04 (8 occurrences)
Last logged: 20:27:04

Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
`
`2022-06-04 19:47:04 WARNING (SyncWorker_4) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: index out of range
2022-06-04 19:52:04 WARNING (SyncWorker_1) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 19:57:04 WARNING (SyncWorker_7) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:02:04 WARNING (SyncWorker_4) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:07:04 WARNING (SyncWorker_0) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:12:04 WARNING (SyncWorker_2) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:12:04 ERROR (MainThread) [custom_components.apsystems_ecur] Error fetching apsystems_ecur data: Error using cached data more than 5 times. See log, and try power cycling the ECU.
2022-06-04 20:17:04 WARNING (SyncWorker_1) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:22:04 WARNING (SyncWorker_2) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''
2022-06-04 20:27:04 WARNING (SyncWorker_4) [custom_components.apsystems_ecur] Using cached data from last successful communication from ECU. Error: could not extract checksum int from 'ECU Query' data=b''

`

dewi-ny-je commented 2 years ago

@gratjetost I simply reboot the ecu-r pro every evening 1h after sunset.

gratjetost commented 2 years ago

@dewi-ny-je do you have a command for that? Because my ECU-C is connected straight to the mains power on 3 fases. And it has a switch / power button on the ecu itself. So for me no option to reboot it easily. Also my ecu connection to hacs can drop in the middle of the day.

tv3 commented 2 years ago

If you have an ecu-r-pro, like me, then you can use my alternative integration. Mine has been running stable for weeks.

dewi-ny-je commented 2 years ago

@gratjetost

@dewi-ny-je do you have a command for that? Because my ECU-C is connected straight to the mains power on 3 fases. And it has a switch / power button on the ecu itself. So for me no option to reboot it easily. Also my ecu connection to hacs can drop in the middle of the day.

I would suggest using tv3's integration, I will switch to it as well when I have time.

Temporarily I do the reboot with a smart plug before the ecu-r pro, so I cut the power for few seconds and then I restore it

gratjetost commented 2 years ago

@dewi-ny-je thats what I thought, for me a smart plug is not an option due to the 3 fase connection. @tv3 I'll give your fork a try. hopefully thats also stable! I will keep you updated.

HAEdwin commented 2 years ago

For ECU models other than the ECU-R starting with ID 2160 and ECU-B, I would indeed recommend the TV3 fork. I think I tried to get the most out of it but the other ECU models just don't seem to be able to handle it.

dewi-ny-je commented 2 years ago

@gratjetost

@dewi-ny-je thats what I thought, for me a smart plug is not an option due to the 3 fase connection.

I don't understand. The ECU-R is a DC device with a single phase AC to DC power Brock, isn't it connected to only one phase? I also have three phases at home and solar panels on two out of three phases, but the ECU still has a wall power brick with the normal single phase plug.

gratjetost commented 2 years ago

@dewi-ny-je the ECU-C doesn't have a power brick. Only straight cable connection to the grid. Screenshot_20220609-082213717 (1) So for me the only option to turn it on or of is using the power button on the ECU-C itself. Screenshot_20220609-083017072 (1)

dewi-ny-je commented 1 year ago

If you have an ecu-r-pro, like me, then you can use my alternative integration. Mine has been running stable for weeks.

@tv3 it is stable indeed, but the current main branch does not return each individual panel power, only the total power and the daily energy. The temperature is there, however.

tv3 commented 1 year ago

I'll look into the missing data (after my holidays:-)). Could you supply some more details of your setup, like inverter/panel type etcetera? Also a dump of the returned json would be appreciated.

dewi-ny-je commented 1 year ago

@tv3 see https://github.com/tv3/homeassistant-apsystems_ecur/issues/7 With JSON you mean the "realtime" webpage, right? I'm not aware of any other JSON.

HAEdwin commented 1 year ago

I am closing my own issue for the sake of clarity. Feel free to open a new issue if you have problems with this integration.