zigpy / zigpy-xbee

A library which communicates with XBee radios for zigpy
GNU General Public License v3.0
22 stars 18 forks source link

Limit request concurrency #173

Open puddly opened 6 months ago

puddly commented 6 months ago

Zigpy provides a way for radio libraries to limit request concurrency (by default 8). It seems like zigpy-xbee did not use it. You can increase/decrease the default via YAML:

zha:
  zigpy_config:
    max_concurrent_requests: 8  

@Shulyaka this may actually be the only change required for zigpy-xbee to work properly with the watchdog. If you can find a value that works well for your XBee, this may supersede #172.

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (8314dcd) 100.00% compared to head (2b1b21f) 100.00%. Report is 1 commits behind head on dev.

:exclamation: Current head 2b1b21f differs from pull request most recent head 6059c1e. Consider uploading reports for the commit 6059c1e to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #173 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 7 7 Lines 759 760 +1 ========================================= + Hits 759 760 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Shulyaka commented 6 months ago

I've tested this change. Unfortunately it does not solve the problem, because only tx_explicit API frames are limited, but others, such as commands and remote commands, are not. I think we are better off without watchdog at all.

puddly commented 6 months ago

I've reimplemented this PR on top of the unreleased zigpy priority lock/queue:

pip install git+https://github.com/puddly/zigpy@zigpy/priority-lock

This should deprioritize transmit commands so that any watchdog polling task would have to wait only for a single command slot to open up, as opposed to the queue being completely vacated.

Shulyaka commented 6 months ago

I think we need to increase the AT_COMMAND_TIMEOUT anyway.

Consider the follow scenario:

  1. A quirk calls a remote AT command for a device that is powered off
  2. Because the queue is currently empty, the command is sent to the device and a timeout of REMOTE_AT_COMMAND_TIMEOUT started
  3. Watchdog sends the VR command
  4. Although the priority is higher, another command is already waiting, so the command is not actually sent to the device
  5. The AT_COMMAND_TIMEOUT expires earlier, so the VR times out and the connection is reset.