Open GaryBoone opened 3 weeks ago
Hey there @lash-l, mind taking a look at this issue as it has been labeled with an integration (roborock) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
roborock documentation roborock source (message by IssueLinks)
Thanks for the detailed report. Regarding your 3 problems:
(1) The error appears as a timeout of 4 seconds. It looks like the local client sets a 4-second timeout and the cloud client sets a 10-second timeout. I don't have an opinion on whether that is too short in practice, but I'm inclined to leave this as is. Perhaps it could be increased. (If we are willing to wait longer, I would rather set a higher timeout than retry.)
(2) and (3) are the same issue: it's not raising an error that Home Assistant can handle. That said, I don't have any opinion on AppDaemon specifics since it's not part of Home Assistant core.
I'm going to address 2/3 by sending a PR to improve error handling for some exception cases for these entities/services to match the behavior in some of the other entities/services.
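A minimal sketch of what that kind of error handling could look like, not the actual PR. The two exception classes below are local stand-ins so the snippet runs on its own; in the integration they would presumably be the library's own exception type and `homeassistant.exceptions.HomeAssistantError`. The `send_command` helper and its `device_send` parameter are hypothetical names.

```python
import asyncio


class RoborockException(Exception):
    """Stand-in for the Roborock library's error type (assumption)."""


class HomeAssistantError(Exception):
    """Stand-in for homeassistant.exceptions.HomeAssistantError."""


async def send_command(device_send, command, params=None):
    """Await the device call and re-raise library errors in a form callers can handle."""
    try:
        return await device_send(command, params)
    except (RoborockException, asyncio.TimeoutError) as err:
        # Chain the original error so the root cause stays visible in logs.
        raise HomeAssistantError(f"Error while calling {command}: {err}") from err
```

With this shape, a timeout inside the device call surfaces as a single, predictable exception type at the service boundary instead of an unhandled stack trace.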
Hey Allen, thanks for taking a look at this. As context: local requests always go through in under 2 seconds for me, hence the 4s timeout. Typically, the problem I have run into in the past isn't due to the timeout but rather another error along the way. The Python library has some shortcomings here, but if I'm remembering correctly, you can't really know if the request failed, as it doesn't send a response. (I could be misremembering; it has been a while since I've looked at that bit of the code and the requests.)
Also, if one request fails while others are waiting for a response, they will all fail, IIRC.
Both of those things could be wrong, or could be something I've improved and forgotten about, but I just thought I'd give the context.
While I was testing with a local Home Assistant instance, I saw a request from the update coordinator fail after taking longer than 4 seconds.
2024-08-20 03:55:07.528 ERROR (MainThread) [homeassistant.components.roborock.coordinator] Error fetching roborock data: id=25747 Timeout after 4 seconds
2024-08-20 03:55:38.675 INFO (MainThread) [homeassistant.components.roborock.coordinator] Fetching roborock data recovered
However maybe the wifi was flaky...
Yeah, interesting. It's been my intention for a while to keep track of how long calls are taking and return that in diagnostic info, and potentially even to change the timeout dynamically and add retries for API calls. There is even a flag on commands along the lines of retry, IIRC, but I have not fully reverse engineered it.
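As a rough illustration of the timing and retry idea, here is a sketch using only `asyncio` from the standard library. It is not how the library works today; `timed_call`, `make_request`, and `durations` are hypothetical names, and the default of 4 seconds just mirrors the local timeout discussed above.

```python
import asyncio
import time


async def timed_call(make_request, timeout=4.0, attempts=2, durations=None):
    """Await make_request() with a timeout, retrying when it times out.

    The elapsed seconds of every attempt are appended to `durations`,
    which could later be exposed as diagnostic info.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    durations = durations if durations is not None else []
    last_error = None
    for _ in range(attempts):
        start = time.monotonic()
        try:
            return await asyncio.wait_for(make_request(), timeout)
        except asyncio.TimeoutError as err:
            last_error = err
        finally:
            # Runs on success and on timeout, so every attempt is recorded.
            durations.append(time.monotonic() - start)
    raise last_error
```

The recorded durations could also feed a dynamic timeout, for example by widening it when recent calls run close to the limit.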
If we should think about this more like a publish/subscribe system, then maybe longer term the model needs to change a bit: send the command and don't necessarily block on a result, but update state later once the message is received. (That said, I don't really understand the internals here yet.)
The issue of common/frequent timeouts can be tracked in the existing issue #98013, so we'll treat this one as being about error handling.
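To make the publish/subscribe idea concrete, here is a toy sketch. It does not reflect the library's internals; `DeviceState`, `publish`, and the payload shape are all hypothetical.

```python
class DeviceState:
    """Tracks device values and which requests still await a reply."""

    def __init__(self):
        self.values = {}
        self.pending = set()

    def send(self, publish, request_id, command):
        """Publish the command and return immediately, without waiting for a reply."""
        self.pending.add(request_id)
        publish(request_id, command)

    def on_message(self, request_id, payload):
        """Called by the subscriber whenever a message arrives from the device."""
        self.pending.discard(request_id)
        self.values.update(payload)
```

In this model a slow reply does not fail the caller. The request simply stays in `pending` until a message arrives, and a separate policy can decide when a pending request counts as lost.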
The problem
In an AppDaemon app, I have lines like:
Although these commands often work, they also often fail silently, meaning that no exception reaches the try/except above.
Instead, the appdaemon.log shows:
In the homeassistant.log, I see full stacktraces ending in errors like:
[See the complete stacktrace below.]
There are at least 3 bugs here:
1) The Roborock API has a problem with the calls, or their inputs, or takes too long to respond, leading to the stacktrace shown below. If there's a problem with inputs, it fails to validate them.
2) The Roborock API, the HomeAssistant core, or the AppDaemon AddOn doesn't handle timeouts without bombing.
3) The problem appears in the Roborock AddIn code, but doesn't propagate to my AppDaemon code where I can handle it.
What version of Home Assistant Core has the issue?
core-2024.8.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Roborock
Link to integration documentation on our website
https://www.home-assistant.io/integrations/roborock/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
As a workaround, I have code to retry the call_service calls, checking that the updated values are actually set. Until the issues reported above are fixed, others may find this helpful in creating reliable code:
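A minimal sketch of this retry-and-verify pattern, not the original code. It assumes two hypothetical callables: `call`, which performs the service call, and `read_state`, which returns the current value to check.

```python
import time


def call_until_set(call, read_state, expected, attempts=3, delay=1.0):
    """Invoke call() until read_state() returns expected; return True on success."""
    for attempt in range(attempts):
        try:
            call()
        except Exception as err:  # the service call itself may raise
            print(f"attempt {attempt + 1} failed: {err}")
        # Give the new value time to propagate before checking it.
        time.sleep(delay)
        if read_state() == expected:
            return True
    return False
```

In an AppDaemon app, `call` would typically wrap `self.call_service(...)` and `read_state` would wrap `self.get_state(...)` for the entity being changed.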