palazzem / ha-econnect-alarm

Home Assistant integration that provides a full-fledged Alarm Panel to control your Elmo/IESS alarm systems.
BSD 3-Clause "New" or "Revised" License

Data loss (no updates) when the connection is unstable #51

Closed palazzem closed 8 months ago

palazzem commented 9 months ago

Describe the bug

Apparently, when the Internet connection is unstable, the integration gets stuck and HA no longer receives data from the e-Connect cloud.

UPDATE 10/5/2023: this issue occurred widely when the e-Connect servers went down for a while. It looks like the issue is not related to the Internet connection itself, but to a generic "connection instability". Maybe when an integration produces a lot of errors or keeps too many sockets open, HA decides to deregister the component for safety or throttling.

UPDATE 10/6/2023: HA doesn't deregister the component and actually polls every 20 seconds (the timeout value). The issue seems to be in the client.poll() method, which receives from the AlarmDevice a set of IDs that are so far in the future that they will never be updated. The IDs retrieved from AlarmDevice are correct and based on the last call, but unfortunately the server resets the IDs when the connection between the central unit and the cloud is interrupted.

Example:

# When it works
Polling IDs expected in the update: {9: 49, 10: 73}
Polling IDs registered in the device: {9: 49, 10: 73}

# When it doesn't
Polling IDs expected in the update: {9: 49, 10: 73}
Polling IDs registered in the device: {9: 6577, 10: 9865}
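
As a minimal sketch of how the mismatch above could be detected, the check below compares the two ID sets and falls back to the expected ones when the device-side IDs are ahead. The function names `ids_look_reset` and `resync_ids` are hypothetical and are not part of the integration or of the e-Connect client; the rule "a device ID larger than the expected one means the server reset its counters" is an assumption drawn from the two log samples.

```python
from typing import Dict

Ids = Dict[int, int]


def ids_look_reset(expected_ids: Ids, device_ids: Ids) -> bool:
    """Return True when any ID stored in the device is ahead of the expected one.

    Assumption: IDs only grow, so a stored ID larger than the expected ID
    suggests the server restarted its counters.
    """
    return any(
        device_ids.get(key, 0) > expected
        for key, expected in expected_ids.items()
    )


def resync_ids(expected_ids: Ids, device_ids: Ids) -> Ids:
    """Return the IDs to use for the next poll (hypothetical helper)."""
    if ids_look_reset(expected_ids, device_ids):
        # Drop the stale IDs so the next poll can match an update again.
        return dict(expected_ids)
    return device_ids


# Values taken from the two log samples in this report.
working = ids_look_reset({9: 49, 10: 73}, {9: 49, 10: 73})
stuck = ids_look_reset({9: 49, 10: 73}, {9: 6577, 10: 9865})
```

With the values from the logs, the first case is not flagged and the second one is, so the stale IDs would be replaced before the next poll.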

Error message

From logs:

Timeout fetching elmo_iess_alarm data

That said, the integration remains stuck, and from that point on it is no longer updated.

Expected behavior

A timeout error is expected, but the integration should resume working normally once the connection is back.
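
One way to picture this expected behavior is a poll wrapper that treats a timeout as a signal to discard the stored IDs and fetch fresh ones. This is only a sketch under stated assumptions: `poll_with_recovery`, `fake_poll`, and `fake_full_refresh` are hypothetical stand-ins, not the real client API, and the timeout is modelled with Python's built-in `TimeoutError`.

```python
from typing import Callable, Dict

Ids = Dict[int, int]


def poll_with_recovery(
    poll: Callable[[Ids], Ids],
    full_refresh: Callable[[], Ids],
    ids: Ids,
) -> Ids:
    """Poll once; on timeout, discard the stored IDs and fetch fresh ones."""
    try:
        return poll(ids)
    except TimeoutError:
        # The stored IDs may be stale after a server-side reset.
        return full_refresh()


# Hypothetical stand-ins for the cloud, used only to exercise the sketch.
SERVER_IDS: Ids = {9: 49, 10: 73}


def fake_poll(ids: Ids) -> Ids:
    # Simulate a long poll that never completes when the IDs are ahead.
    if any(ids[key] > SERVER_IDS[key] for key in SERVER_IDS):
        raise TimeoutError("no update for these IDs")
    return dict(SERVER_IDS)


def fake_full_refresh() -> Ids:
    return dict(SERVER_IDS)


recovered = poll_with_recovery(fake_poll, fake_full_refresh, {9: 6577, 10: 9865})
```

The design choice here is to recover on the first timeout instead of retrying with the same IDs, since retrying with IDs that the server no longer knows about would time out again.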

Additional context

To fix the problem, it is enough to reload the integration from the UI. A full restart is not required.

To Reproduce

Environment

gervaso commented 9 months ago

I confirm this behavior. It happened this morning as well; I suppose the Internet connection was briefly interrupted during the night.

This morning the alarm system was disabled from the keypad, but the status in the integration was still "Armed Night".

palazzem commented 9 months ago

@gervaso I'm putting this on hold as I'd like to ship a new version in the next few days! We can include this in a hotfix right after, so we have enough time to triage the issue and be sure it will never happen again!

palazzem commented 9 months ago

It looks like any issue with the connection or on the server side exacerbates this problem. At this point I would work on it with high priority as soon as 2.0.0 is released. I plan to:

I'll work on it right after the current pending release.