home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Powerview Polling causing timeout errors #73900

Closed kingy444 closed 6 months ago

kingy444 commented 2 years ago

I added some logging to the new polling implemented in https://github.com/home-assistant/core/pull/73659. That PR resolved https://github.com/home-assistant/core/issues/70043 but has unfortunately created an issue where shades will time out from time to time.

While the hub can process the requests, the shades cannot always return a result in a timely manner. I have 5 hardwired shades but only 2 enabled for polling, and I still see these timeouts roughly every 6 hours.

These shades are all TDBU, which the current code polls twice. That is unrelated, however, as testing gives the same result under the PR that resolves that issue: https://github.com/home-assistant/core/pull/73899

image

Originally posted by @kingy444 in https://github.com/home-assistant/core/pull/73659#issuecomment-1162798204

bdraco commented 8 months ago

Using the undocumented home automation API and a semaphore to prevent concurrent requests would be doable.
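A minimal sketch of the semaphore idea, assuming an asyncio-based client. `fetch_shade_state` is a hypothetical stand-in for the real per-shade request, and the `in_flight` counter exists only to show that requests never overlap; neither is taken from the integration's code.

```python
import asyncio


# Hypothetical stand-in for one shade refresh; a real client would make
# an HTTP request to the hub here.
async def fetch_shade_state(shade_id: int, in_flight: list) -> dict:
    in_flight[0] += 1                               # requests currently running
    in_flight[1] = max(in_flight[1], in_flight[0])  # highest overlap seen so far
    await asyncio.sleep(0.01)                       # simulate hub/shade latency
    in_flight[0] -= 1
    return {"id": shade_id}


async def poll_shades(shade_ids, max_concurrent: int = 1):
    """Poll every shade, allowing at most max_concurrent requests at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = [0, 0]  # [current, peak]

    async def guarded(shade_id: int) -> dict:
        # Each request waits here until a semaphore slot is free.
        async with semaphore:
            return await fetch_shade_state(shade_id, in_flight)

    results = await asyncio.gather(*(guarded(s) for s in shade_ids))
    return results, in_flight[1]
```

With `max_concurrent=1` the requests are fully serialized; a larger value would allow limited overlap if the hub turns out to tolerate it.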

However, improvements in this area are blocked until v3 support is finished, as significant changes are being made to the integration.

The downside to using the home automation API is that it won't work for everybody (there is quite a variety of SSL configurations and compatibility issues). We would still have to support both, with automatic fallback to the current method if it doesn't work, and detecting that it doesn't work and orchestrating a fallback is not trivial.

Alternatively, if we want to get it out faster, we could drop support for the polling method completely and only support the home automation API. That would be a breaking change for some people and make the integration unusable for them (likely anyone who has SSL configured), but it would probably fix this problem for everyone else. Maybe that's OK since this problem seems to cause a lot of headaches, and maybe it's best not to support runtime polling at all given how flaky the hub devices are. I think we would get some angry issue reports if we did that, though, because there is always someone with an SSL setup that won't work.
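A rough sketch of the simplest part of such a fallback: try to set up push updates once, and fall back to polling if setup fails or times out. `register_push` and `PushUnavailable` are hypothetical names used for illustration, not part of any real hub API. Detecting a callback that silently stops arriving later is the harder problem and is not covered here.

```python
import asyncio


class PushUnavailable(Exception):
    """Hypothetical error raised when the hub rejects push registration."""


async def choose_update_mode(register_push, timeout: float = 5.0) -> str:
    """Return "push" if registration succeeds, otherwise "polling"."""
    try:
        # register_push is a caller-supplied coroutine function (assumed).
        await asyncio.wait_for(register_push(), timeout)
    except (PushUnavailable, asyncio.TimeoutError, OSError):
        # Rejected, timed out, or a network/SSL-level failure:
        # keep using the existing polling method.
        return "polling"
    return "push"
```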

trullock commented 8 months ago

I appreciate all the hard work that's going into the v3 refactor, well done guys.

You're right, we should see how it performs on V2.

I'm glad we have the v3 work coming and Wez's fix for the crashing, but it's bad for the community to have to rely on non-core integrations because the core one crashes devices, so IMO we do need to adopt Wez's method one way or another.

kingy444 commented 8 months ago

Doesn't Wez's method require some further configuration beyond 'add integration'?

Given that core will work natively with both gen2 and gen3, and the timeout is not affecting ALL users, I would look at Wez's method as an 'advanced' installation. (Not saying it couldn't be brought into core, just that it can't be a direct replacement.)

trullock commented 8 months ago

~It doesn't auto-detect it, you just have to give it the hub's IP and your MQTT details.~

Edit: See below, I'm out of date

He only made it as an add-on for his own convenience, for ease of development.

wez commented 8 months ago

I want to clarify that my addon does auto-detect the hub IP, and if you're running HAOS or supervised, then mqtt details are automatically passed to it by hass, so beyond installing the mqtt and pview addons, it is automatic.

The technique used in the addon to resolve the polling issue could be done directly in the python integration.

I chose to build this as an addon because it is much easier for me to iterate on it in a single code base in my preferred language than it would be otherwise.

bdraco commented 8 months ago

I couldn't get the postBackUrl to accept an SSL URL, but I think we can work around the SSL issue by starting another webserver on another port. I haven't tried that yet. It looks like we still have to support polling regardless, since the v1 hubs don't support the home automation API, so we would have the complexity of supporting both.

rossc719g commented 7 months ago

I have a gen2 hub with 18 silhouette duolite shades, and experience these lock-ups pretty constantly. It used to be once or twice a week. A power-cycle of the HD hub would fix it, but so would just waiting 24h or so. (Do the hubs reboot themselves at night or something?)

But, now it is happening multiple times a day. When I reboot the hub I only get a few hours before it locks up again. Sometimes I also need to reload the integration.

A while ago I made a simple Python script to adjust the blinds using the API. When the hub locks up, the script stops working, and so does the PowerView app. So it seems like it is just flat-out crashing the hub.

I'm curious: do we think the gen3 rework will include fixes for this too? I'm happy to be used as a test subject to get logs or whatever, if needed.

kingy444 commented 7 months ago

It might, but as I do not experience these issues it is hard to test. There were adjustments to how calls are made but I don't suspect these were the sole cause.

I have seen people with similar lockups describe moving multiple shades via an HA automation. Hunter Douglas's advice was simply that this sort of thing should be done via PowerView Scenes. Multiple successive API calls seem to be the cause of the lockups.

As mentioned, I don't have any of this set up personally; I primarily use scenes along with HA automations and have no issues. That being said, I have also tried to force these issues and have been unsuccessful there too (which leads me to think there are some dodgy hubs out there).
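If successive calls are indeed the trigger, one possible mitigation is to serialize commands and leave a gap between them. The sketch below is only an illustration of that idea: `CommandPacer` and its `min_interval` are hypothetical names, the right gap for a real hub is unknown, and nothing here is confirmed to prevent lockups.

```python
import asyncio
import time


class CommandPacer:
    """Run commands one at a time with a minimum gap between them."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval  # seconds; value is an assumption
        self._lock = asyncio.Lock()
        self._last_done = None             # monotonic time the last command ended

    async def run(self, send):
        # The lock guarantees only one command is talking to the hub at a time.
        async with self._lock:
            if self._last_done is not None:
                wait = self._min_interval - (time.monotonic() - self._last_done)
                if wait > 0:
                    await asyncio.sleep(wait)
            try:
                # send is a caller-supplied coroutine function (hypothetical).
                return await send()
            finally:
                self._last_done = time.monotonic()
```

An automation that moves several shades would route each call through the same pacer, for example `await pacer.run(lambda: move_shade(shade_id))`, where `move_shade` is whatever coroutine actually sends the command.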

I have a mixture of shades, 11 in total, 5 Top Down/Bottom Up, 6 Bottom Up

kingy444 commented 7 months ago

Most of these reports came in when polling was added for hardwired shades too. I have 6 of the shades hardwired and don't experience any issues

As for overnight, the hubs typically run a maintenance cycle (no clue what's involved) at around 1am each night.

wimjanse commented 6 months ago

I have 21 shades (including 3 TDBU shades), all hardwired, on a PowerView Gen-2 hub. I use the PowerView scenes, controlled by push-buttons and/or automations in HA, and have individual controls for all 21(+3) shade positions.

I've been using HA for 6 months now and have never had any lock-ups of the hub (except once, when I put a second (development) HA system, also with a PowerView integration, online).

jpearl commented 6 months ago

Thanks for all the hard work on this painful issue. For quite a while now, I haven't been able to issue more than 1 command to a single shade without a lockup. Figured I'd pass along a few repro observations since I saw mentions of not being sure how to repro:

  1. Signal strength seems to play a huge role. If a shade is on the outskirts of the signal range, and/or is far away and reached via several repeaters, this seems to increase the reproducibility. Putting a second hub in the same room as that shade improved things.
  2. Template cover entities wrapping real shade entities. Removing these had a HUGE impact on reducing lockouts. I had "proxy" template cover entities (to work around the previous bug that incorrectly exposed tilt on shades that didn't support tilt). They looked as follows:
    - platform: template
      covers:
        master_bedroom_wrapper:
          device_class: shade
          friendly_name: "Master Bedroom"
          value_template: "{{ states('cover.master_bedroom') }}"
          position_template: "{{ state_attr('cover.master_bedroom', 'current_position') }}"
          set_cover_position:
            service: cover.set_cover_position
            target:
              entity_id: cover.master_bedroom
            data:
              position: "{{ position }}"

Between the master bedroom shade being far from the hub (with several repeaters in the signal path), and trying to control it via this template cover "wrapper", I hit pretty much a 100% lockout.

Hope this helps and looking forward to testing

trullock commented 6 months ago

@jpearl good intel. I too use templated covers to abstract my blinds. I do this because the windows they cover are openable, and if you shut a blind with the window open, the wind blows in and destroys the blind. So my abstraction prevents the blinds from shutting if the window is open. It may be contributing, like you say.

Has anyone tested this fix yet and confirmed it works? I'm still on @wez's addon