Closed by pavoni 4 years ago
TL;DR: Fixed here: https://github.com/home-assistant/core/pull/35703
I noticed a few HASS tickets suggesting that the Vera hub occasionally returned consistently inaccurate incremental data, that a threading race condition was encountered, or both.
I worked around this by giving the HASS vera integration its own implementation of SubscriptionRegistry. This implementation pulls all statuses from the hub and uses HASS event loops to collect and deliver the data. Reports from test users confirm this PR fixes their issues.
I debated whether to keep retrieving statuses incrementally or to pull them all. After some local testing (on my 2 hubs), I learned that returning all statuses has no significant performance impact on the hub, so I chose reliability.
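For anyone following along, the "pull everything on the event loop" idea can be sketched roughly as below. This is only an illustration, not the code in the PR: `poll_all_statuses`, `fake_fetch_all`, the `max_polls` parameter and the status values are all hypothetical names made up for the example.

```python
import asyncio


async def poll_all_statuses(fetch_all, deliver, interval=1.0, max_polls=None):
    """Fetch a full status snapshot every `interval` seconds and deliver it.

    `fetch_all` is an async callable returning {device_id: status}.
    `deliver` is called once per device with (device_id, status).
    `max_polls` only exists so the sketch can terminate; None means forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        statuses = await fetch_all()  # full snapshot, not an incremental diff
        for device_id, status in statuses.items():
            deliver(device_id, status)
        polls += 1
        if max_polls is None or polls < max_polls:
            await asyncio.sleep(interval)


async def fake_fetch_all():
    # Hypothetical stand-in for an HTTP request to the hub.
    return {1: "on", 2: "off"}


received = []
asyncio.run(
    poll_all_statuses(
        fake_fetch_all,
        lambda device_id, status: received.append((device_id, status)),
        interval=0.01,
        max_polls=2,
    )
)
```

Because every poll carries the complete state, a missed or inaccurate incremental update cannot leave a device permanently out of sync. The cost is more data per request.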
Thanks.
Am I right that the new code polls vera every second via the HA event loop?
I guess there would be two concerns :-
Vera Hub performance, which it sounds like you've tested (I do recall having some issues with my Vera Edge when I was first trying this - although that was via the get incremental change call, which might be more expensive),
Responsiveness to switch changes - I chose 200ms for SUBSCRIPTION_MIN_WAIT after some experimenting to try to get good responsiveness to switches (I have a Vera switch that triggers switching lights on and off via HA). It was never as fast as I would have liked - but less than 200ms seemed to stress vera without helping. How have you found the responsiveness? (Of course right is always better than fast!)
Yes, the poll interval is set to one second. I tried running parallel requests every 100ms and ran into zero issues.
My instances have always been very responsive even with this recent HASS code change. I suspect there will be some unexpected issues that come up but we at least remove the race conditions.
Regarding switch responsiveness, I suspect it will be the same or better. After all, the VeraSwitch implementation in HA refreshes the status after sending the change.
Great thanks. Will look forward to seeing it merged.
May also swap round some of the unsafe code anyway - even though it won't matter with your new approach.
Just saw the review comments - I guess with a 5s refresh it may be worth trying to combine the methods. 5s certainly wouldn't work for my use case - and perhaps for other users too.
You could still do a 5s refresh - but also trigger your update code in a return from the vera poll/wait call...
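Something along these lines is what I have in mind, as a rough sketch only. `refresh_loop`, `fake_wait_for_change` and the `max_refreshes` parameter are hypothetical names for illustration, and the real Vera poll/wait call is replaced by a stub.

```python
import asyncio


async def refresh_loop(wait_for_change, refresh, fallback_interval=5.0, max_refreshes=3):
    """Refresh when the hub reports a change, or after `fallback_interval` seconds.

    `wait_for_change` is an async callable that returns when the hub's
    long poll returns. `max_refreshes` only exists so the sketch terminates.
    """
    for _ in range(max_refreshes):
        try:
            # Returns early if the long poll reports a change.
            await asyncio.wait_for(wait_for_change(), timeout=fallback_interval)
            reason = "change"
        except asyncio.TimeoutError:
            # Nothing reported in time: fall back to the periodic refresh.
            reason = "timeout"
        refresh(reason)


calls = {"n": 0}


async def fake_wait_for_change():
    # Hypothetical stub: the first call reports a change at once,
    # later calls block until wait_for() cancels them.
    calls["n"] += 1
    if calls["n"] == 1:
        await asyncio.sleep(0)
    else:
        await asyncio.sleep(10)


reasons = []
asyncio.run(
    refresh_loop(fake_wait_for_change, reasons.append, fallback_interval=0.05, max_refreshes=2)
)
```

That way the slow interval is only a safety net, and a switch press is still picked up as soon as the hub reports it.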
I'm confused. The HASS change uses a 1 second refresh interval in a loop. When entities like a switch are activated through HASS, they force a refresh too. I'm not aware of a 5s refresh anywhere.
There is a requested change on your HA PR saying that the minimum poll interval they allow is 5s, so they won't allow 1s.
I somehow missed that feedback. Working on it.
I'm glad to see you can reproduce the issue. I've been unable to. At any rate, I've updated the HASS vera component to use long polling. Hopefully this will be the best fix.
All working perfectly now - closing!
@vangorra I know you've been looking at this for a while.
My recent experience of going from a working set-up to one that sometimes goes wrong when I upgraded HA but left Vera unchanged suggests it's not a Vera problem - but a pyvera / HA issue.
With all the dev work and expertise on HA - I expect async race conditions in HA would get sorted (I suspect the device _2 issue was probably something like this in HA) - which makes me think the most likely issue is something unsafe in pyvera - which is now being exposed by the HA async environment.
As an example I'm suspicious that the following code is unsafe:
https://github.com/pavoni/pyvera/blob/master/pyvera/__init__.py#L791-L792
It initiates a vera action - and then sets the internal state of the pyvera switch.
In the new async world the vera action could set the device - and then an async update could arrive before the next line of code setting the state runs - so the real vera state gets overwritten with the intended state. This has always been doubtful (the subscription code has always been multithreaded) - but a more highly async environment would make errors more frequent.
So I think it would be safer to reverse the two lines of code.
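To make the ordering problem concrete, here is a small self-contained sketch. It is not the pyvera code: `FakeHub`, `Switch`, `send_action` and `on_hub_update` are hypothetical names, and the hub is made to deliver its update before the action call returns so the worst case is reproducible every time.

```python
class FakeHub:
    """Hypothetical stand-in for a Vera hub (not part of pyvera)."""

    def __init__(self, device):
        self.device = device

    def send_action(self, requested_state):
        # Worst case: the device does not actually switch, and the hub's
        # status update is delivered before send_action() returns.
        self.device.on_hub_update(False)


class Switch:
    def __init__(self):
        self.state = False
        self.hub = FakeHub(self)

    def on_hub_update(self, real_state):
        # Called by the subscription code with the state the hub reports.
        self.state = real_state

    def switch_on_unsafe(self):
        self.hub.send_action(True)  # the real state arrives during this call...
        self.state = True           # ...and is overwritten by the intended state

    def switch_on_safer(self):
        self.state = True           # set the intended state first
        self.hub.send_action(True)  # any update from the hub now wins


unsafe = Switch()
unsafe.switch_on_unsafe()

safer = Switch()
safer.switch_on_safer()
```

With the original order, `unsafe.state` ends up `True` even though the hub reported the device as off. With the lines reversed, `safer.state` ends up `False`, matching the hub.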
There are quite a few examples of this (my pattern seems to have been initiate the action - then change the local state!).
There may also be other similar issues.
With all your investigation, do you have a handle on the circumstances where state information is being lost? Is it mainly switches and other active devices - or is it also sensors (I think I've only seen it on switches so far)?