home-assistant / core

🏡 Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ScreenLogic async_gateway_connect raises an exception #96803

Closed jeromeajot closed 1 year ago

jeromeajot commented 1 year ago

The problem

Connectivity has been hit and miss. Until 2023.6.1, it was not acknowledging any changes and became unavailable quite often. Then from 2023.6.1 until 2023.7.1 everything was fine, no issues. 2023.7.2 arrived and it was misbehaving again. I downgraded to 2023.7.1, then 2023.6.2, then 2023.6.1. Nothing made it work again.

I restarted the Pentair device, but that did not fix it.

What version of Home Assistant Core has the issue?

core-2023.6.1

What was the last working version of Home Assistant Core?

core-2023.6.1

What type of installation are you running?

Home Assistant OS

Integration causing the issue

screenlogic

Link to integration documentation on our website

https://www.home-assistant.io/integrations/screenlogic/

Diagnostics information

config_entry-screenlogic-c6df324ab7bd6a64a5eb2b2f9a5812fc.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Log uploaded

Additional information

home-assistant_2023-07-18T02-14-54.584Z.log

home-assistant[bot] commented 1 year ago

Hey there @dieselrabbit, @bdraco, mind taking a look at this issue as it has been labeled with an integration (screenlogic) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `screenlogic` can trigger bot actions by commenting:

  • `@home-assistant close` Closes the issue.
  • `@home-assistant rename Awesome new title` Renames the issue.
  • `@home-assistant reopen` Reopen the issue.
  • `@home-assistant unassign screenlogic` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


screenlogic documentation screenlogic source (message by IssueLinks)

jeromeajot commented 1 year ago

Adding a full debug log from this morning. This one doesn't show the exception, but it does show a failed reconnection at 07:54.

home-assistant_2023-07-18T12-48-57.642Z.log

dieselrabbit commented 1 year ago

Thank you for the log.

It appears that the socket connection between the API and the protocol adapter is being closed outside of the API.

I can't tell if the protocol adapter is closing it or if the host is, but the API is not intentionally doing so. It is interesting that the connection appears to be closed immediately after the API sends a request for data to the protocol adapter. That may be an indication that it is the protocol adapter closing the connection as a rejection of what it received, but that's not something I have seen before outside of the login sequence.
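To illustrate the symptom described above, here is a minimal asyncio sketch of a client that sends a request and then finds the remote end has closed the socket. It is not taken from the integration or from screenlogicpy; `handle_client`, `request_once`, `demo`, and the `b"get-status"` payload are hypothetical names used only for this example, and the local server merely stands in for a peer that closes the connection right after receiving data.

```python
import asyncio


async def handle_client(reader, writer):
    # Hypothetical stand-in for the remote end: read one request,
    # then close the socket without sending a reply.
    await reader.read(64)
    writer.close()
    await writer.wait_closed()


async def request_once(host, port, payload=b"get-status"):
    """Send one request and report whether the peer closed the connection."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(payload)
        await writer.drain()
        # An empty read means the remote end closed the socket (EOF).
        data = await asyncio.wait_for(reader.read(64), timeout=2)
        return "closed by peer" if data == b"" else "got reply"
    finally:
        writer.close()
        await writer.wait_closed()


async def demo():
    # Bind to port 0 so the OS picks a free local port for the test server.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await request_once("127.0.0.1", port)
```

Running `asyncio.run(demo())` exercises the path where the client sees EOF instead of a response. From the client side alone, this looks the same whether the adapter or something on the host closed the socket, which is why the logs cannot settle that question by themselves.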

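Some other possibly relevant points:

  • The integration hasn't been updated recently.
  • There is at least one other report of connection issues after HA 2023.7.0.
  • Both reporters appear to be running in a VM according to the diagnostic data, whereas I'm not seeing issues running embedded.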

@bdraco Are you aware of any other integrations reporting socket connection issues after 2023.7.0?

bdraco commented 1 year ago

@bdraco Are you aware of any other integrations reporting socket connection issues after 2023.7.0?

I'm not aware of anything like that

jeromeajot commented 1 year ago

Some other possibly relevant points:

  • The integration hasn't been updated recently.
  • There is at least one other report of connection issues after HA 2023.7.0.
  • Both reporters appear to be running in a VM according to the diagnostic data, whereas I'm not seeing issues running embedded.

After 2023.7.2 was released and I upgraded HA, it started misbehaving again. I assumed I was using the latest integration at that time. HA is indeed running on a VM.

jeromeajot commented 1 year ago

I upgraded back to 2023.7.2.

And here are the latest debug logs: home-assistant_2023-07-20T01-33-12.833Z.log

At 20:04 the exception was raised. After that, I tried at 21:28 and 21:30 to switch off the pool and the pool light, and it behaved the same: the command is issued, but nothing happens, and it triggers a disconnection and reconnection.

If I go to the Pentair ScreenLogic app, I can issue the command with no problem. The same goes for the screenlogicpy CLI. So I assume the gateway is fine.

Finally, after that, I just rebooted the VM. I turned on the lights in HA and turned them off in the Pentair app, and HA reflected the lights turning off. Doing this again worked. Here are the logs of this experiment: home-assistant_2023-07-20T01-56-18.984Z.log

jeromeajot commented 1 year ago

Additional information: at midnight, a trigger to turn off the pool lights ran and segfaulted inside screenlogic. Since that moment (for the 12 hours after the segfault event), the ScreenLogic displays have not been getting updates, commands are not sent to the gateway, the screenlogic device is unresponsive, and a reload is required (initiated at 12:12). It looks like when a segfault happens, the screenlogic device cannot recover and manual intervention is necessary.

Logs: home-assistant_2023-07-20T16-14-26.440Z.log

dieselrabbit commented 1 year ago

Thank you for the continued logs. There are some weird out-of-order and missing entries in the last one that I need to take a deeper dive on.

dieselrabbit commented 1 year ago

What virtualization software and host OS are you running your Home Assistant VM in?

jeromeajot commented 1 year ago

What virtualization software and host OS are you running your Home Assistant VM in?

Host: Ubuntu 20.04.5 LTS
KVM: 4.2.1

dieselrabbit commented 1 year ago

Ok, so I understand what's happening to cause the exceptions, and while the missing exception handling is exposed by the intermittent connection issues, it can be resolved on its own.
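As a rough illustration of what handling these errors could look like (a sketch under stated assumptions, not the actual fix in the integration), the wrapper below retries a request after a connection error and also catches a failure of the reconnect step itself instead of letting it escape. `GatewayConnectionError`, `call_with_reconnect`, `request`, and `reconnect` are hypothetical names and do not correspond to the screenlogicpy API.

```python
import asyncio


class GatewayConnectionError(Exception):
    """Hypothetical error type standing in for a library's connection failures."""


# Errors treated as "the connection is gone" in this sketch.
CONNECTION_ERRORS = (GatewayConnectionError, ConnectionError, asyncio.TimeoutError)


async def call_with_reconnect(request, reconnect, retries=2, delay=0.0):
    """Run `request`; on a connection error, reconnect and try again.

    `request` and `reconnect` are hypothetical zero-argument coroutine
    functions supplied by the caller.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await request()
        except CONNECTION_ERRORS as err:
            last_error = err
            if attempt == retries:
                break
            await asyncio.sleep(delay)
            try:
                await reconnect()
            except CONNECTION_ERRORS as reconnect_err:
                # A failed reconnect is recorded rather than propagated;
                # the next request attempt will fail and be counted as usual.
                last_error = reconnect_err
    raise GatewayConnectionError(f"gave up after {retries + 1} attempts") from last_error


async def demo():
    state = {"connected": False, "calls": 0}

    async def request():
        state["calls"] += 1
        if not state["connected"]:
            raise ConnectionResetError("socket closed by peer")
        return "lights off"

    async def reconnect():
        state["connected"] = True

    result = await call_with_reconnect(request, reconnect)
    return result, state["calls"]
```

In `demo()`, the first request fails because the simulated connection is down, the wrapper reconnects, and the second request succeeds. The point of the design is that every failure ends in a single, known exception type that the caller can handle, so a dropped connection degrades to "unavailable" rather than leaving the device stuck until a manual reload.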

At this point, I'd like to focus on the raised exceptions here in this issue, and consolidate discussion and troubleshooting on intermittent connections in #96016.

I'll be closing this one out when the exception handling is buttoned up. Thanks for all your help!

issue-triage-workflows[bot] commented 1 year ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.