home-assistant / core


Omada integration API requests failing after controller restart #119901

Closed: phipolis closed this issue 3 months ago

phipolis commented 3 months ago

The problem

It appears the tplink_omada integration fails to renew its login session when the external Omada Controller restarts, which prevents communication. HA and the integration appear oblivious to this degraded state. The main tell is that PoE switch entities (switch.1c_61_b4_xx_xx_xx_port_n_poe) go unavailable until the integration is reloaded. The integration's logs indicate that API data fetch requests continue to fail after the remote endpoint comes back online.

The setup includes an Omada PoE switch and the software Omada Controller, version 5.13.30.8. The integration connects via a dedicated Controller user with a scoped role.

What version of Home Assistant Core has the issue?

core-2024.6.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

tplink_omada

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tplink_omada

Diagnostics information

Normal log line repeating every 5m:

2024-06-18 10:03:55.110 DEBUG (MainThread) [homeassistant.components.tplink_omada.coordinator] Finished fetching Omada API Data - 1C-61-B4-XX-XX-XX Ports data in 0.040 seconds (success: True)

(Omada Controller restarts) Request times out:

2024-06-18 10:09:05.072 ERROR (MainThread) [homeassistant.components.tplink_omada.coordinator] Timeout fetching Omada API Data - 1C-61-B4-XX-XX-XX Ports data
2024-06-18 10:09:05.073 DEBUG (MainThread) [homeassistant.components.tplink_omada.coordinator] Finished fetching Omada API Data - 1C-61-B4-XX-XX-XX Ports data in 10.003 seconds (success: False)

(Omada Controller comes back online) Unsuccessful log line repeating every 5m:

2024-06-18 10:14:05.113 DEBUG (MainThread) [homeassistant.components.tplink_omada.coordinator] Finished fetching Omada API Data - 1C-61-B4-XX-XX-XX Ports data in 0.043 seconds (success: False)

(Integration is manually restarted) Normal log lines resume:

2024-06-18 10:36:42.660 DEBUG (MainThread) [homeassistant.components.tplink_omada.coordinator] Finished fetching Omada API Data - Firmware Updates data in 0.140 seconds (success: True)
2024-06-18 10:36:42.906 DEBUG (MainThread) [homeassistant.components.tplink_omada.coordinator] Finished fetching Omada API Data - 1C-61-B4-XX-XX-XX Ports data in 0.115 seconds (success: True)
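
The pattern in these logs (a slow failure at the timeout, then fast failures once the controller is reachable again) can be picked out of a larger log file mechanically. The following is a minimal sketch, assuming the log lines have exactly the format shown above; the function names `parse_fetches` and `fast_failures` and the 1-second threshold are illustrative choices, not part of Home Assistant.

```python
import re
from datetime import datetime

# Matches the coordinator's "Finished fetching ..." DEBUG lines shown above.
# The pattern is derived from those sample lines only (an assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) DEBUG .*?"
    r"Finished fetching (?P<name>.+?) data in (?P<secs>[\d.]+) seconds "
    r"\(success: (?P<ok>True|False)\)$"
)


def parse_fetches(lines):
    """Yield (timestamp, coordinator name, duration, success) per matching line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield (
                datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
                m["name"],
                float(m["secs"]),
                m["ok"] == "True",
            )


def fast_failures(lines, max_seconds=1.0):
    """Return failed fetches that finished quickly, i.e. were not timeouts."""
    return [f for f in parse_fetches(lines) if not f[3] and f[2] < max_seconds]
```

A fast failure means the controller answered but the request was still rejected, which is the state described in this report. A failure that takes about as long as the timeout points to an unreachable controller instead.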

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 3 months ago

Hey there @markgodwin, mind taking a look at this issue as it has been labeled with an integration (tplink_omada) you are listed as a code owner for? Thanks!


phipolis commented 3 months ago

On further investigation, there's more to this than I have identified so far. I also wasn't aware of the 1 hour short circuit in _check_login() when I gathered the logs above.
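
For readers unfamiliar with the idea, a time-based short circuit in a login check could behave roughly like the sketch below. This is not the integration's or the client library's actual code: the class `SessionClient`, its constructor arguments, and the `RECHECK_INTERVAL` constant are hypothetical names used only for illustration, and the only detail taken from this thread is the 1 hour interval.

```python
import time


class SessionClient:
    """Hypothetical client; names and structure are illustrative only."""

    RECHECK_INTERVAL = 3600  # seconds; the "1 hour" mentioned in this thread

    def __init__(self, login, is_logged_in, clock=time.monotonic):
        self._login = login                # callable that performs a fresh login
        self._is_logged_in = is_logged_in  # callable that asks the server
        self._clock = clock                # injectable clock, eases testing
        self._last_check = None            # time of the last real verification

    def check_login(self):
        """Return True if the remote session was actually verified this call."""
        now = self._clock()
        recently_checked = (
            self._last_check is not None
            and now - self._last_check < self.RECHECK_INTERVAL
        )
        if recently_checked:
            # Short circuit: assume the session is still valid and skip the
            # round trip, even if the server has dropped it in the meantime.
            return False
        if not self._is_logged_in():
            self._login()
        self._last_check = now
        return True
```

Under these assumptions, a session invalidated by a controller restart would not be re-established until the interval elapses, so requests made in between could fail quickly even though the controller is reachable.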

I will continue to investigate. Currently I can only reproduce this when the Docker container updates, so I have to wait for the next update unless I can identify a different cause or switch to a local Docker repository where I control the tagging.

phipolis commented 3 months ago

With Home Assistant 2024.7.1, I'm seeing the integration recover on its own after 40-60 minutes, so I'm closing this issue.