Closed mdeweerd closed 2 years ago
If an outgoing command is cancelled, it shouldn't be retried. Preventing cancellation doesn't seem like a good idea.
It happened during a script run where I do 101 iterations and this exception stopped the loop.
I do not have an apparent reason why this occurred.
After adding Cancelled to the accepted exceptions list, it passed.
For the record, this is an exception trace (HA2021.12.10):
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1368, in _send_request_raw
response = await asyncio.shield(
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/core.py", line 1511, in catch_exceptions
await coro_or_task
File "/usr/src/homeassistant/homeassistant/core.py", line 1530, in _execute_service
await handler.job.target(service_call)
File "/config/custom_components/zha_toolkit/__init__.py", line 415, in toolkit_service
raise handler_exception
File "/config/custom_components/zha_toolkit/__init__.py", line 379, in toolkit_service
await handler(
File "/config/custom_components/zha_toolkit/zcl_attr.py", line 255, in attr_write
result_read = await cluster.read_attributes(
File "/usr/local/lib/python3.9/site-packages/zigpy/zcl/__init__.py", line 297, in read_attributes
result = await self.read_attributes_raw(to_read, manufacturer=manufacturer)
File "/usr/local/lib/python3.9/site-packages/zigpy/device.py", line 265, in request
result, msg = await self._application.request(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 453, in request
return await self._send_request(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1483, in _send_request
response = await self._send_request_raw(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1386, in _send_request_raw
raise InvalidCommandResponse(
File "/usr/local/lib/python3.9/site-packages/async_timeout/__init__.py", line 128, in __aexit__
self._do_exit(exc_type)
File "/usr/local/lib/python3.9/site-packages/async_timeout/__init__.py", line 203, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
You can close this for me - I am handling it locally anyway.
What function call is generating CancelledError
? If you're not explicitly cancelling it then this may be a bug in zigpy-znp.
await cluster._write_attributes(attrs, manufacturer=manufacturer)
with manufacturer None, writing 11 to endpoint 1, cluster 513, attribute 8, attr_type 32.
That succeeded in 10 loop cycles before that (with other attribute values), and 101 times after accepting CancelledError in the retryable_request exceptions (which does not mean that CancelledError occurred).
As far as I know I did nothing what could have cancelled this.
Can you provide debug logs of zigpy-znp==0.7.0
sending these requests?
Ok, my setups are HAOS based - 2022.2.3 has this version so I'll upgrade my setup.
What do you want me to acdtivate for debugging? An HA logger setup like this is ok (you know, but just to make sure you notice it: the # lines are not activated).
default: warning
logs:
#homeassistant.core: debug
#homeassistant.components.zha: debug
#bellows.zigbee.application: debug
#bellows.ezsp: debug
custom_components.zha: debug
zigpy: debug
zigpy_znp: debug
#zigpy_deconz.zigbee.application: debug
#zigpy_deconz.api: debug
##zigpy_xbee.zigbee.application: debug
##zigpy_xbee.api: debug
#zigpy_zigate: debug
#zhaquirks: debug
## Extra, not sure this works:
#zigpy.zdo: debug
#zigpy.device: debug
#zigpy_znp.api: debug
#zigpy_znp.zigbee.zdo_converters: debug
#zigpy_znp.zigbee.application: debug
#zigpy_znp.commands.application: debug
#zigpy_znp.commands.zdo: debug
#zigpy_znp.commands.znp: debug
#zigpy_znp.logger: debug
#zigpy.util: debug
A TimeoutError following an attribute read seems to be converted in a Cancelled Error zigpy 0.43.0 zigpy-znp 0.7.0:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1286, in _send_request_raw
response = await asyncio.shield(
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 372, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 575, in _async_call_service_step
await service_task
File "/usr/src/homeassistant/homeassistant/core.py", line 1630, in async_call
task.result()
File "/usr/src/homeassistant/homeassistant/core.py", line 1667, in _execute_service
await cast(Callable[[ServiceCall], Awaitable[None]], handler.job.target)(
File "/config/custom_components/zha_toolkit/__init__.py", line 561, in toolkit_service
raise handler_exception
File "/config/custom_components/zha_toolkit/__init__.py", line 525, in toolkit_service
await handler(
File "/config/custom_components/zha_toolkit/__init__.py", line 608, in command_handler_default
await default.default(
File "/config/custom_components/zha_toolkit/default.py", line 33, in default
await handler(app, listener, ieee, cmd, data, service, params, event_data)
File "/config/custom_components/zha_toolkit/zcl_attr.py", line 392, in attr_write
result_read = await cluster_read_attributes(
File "/config/custom_components/zha_toolkit/zcl_attr.py", line 252, in cluster_read_attributes
return await cluster.read_attributes(attrs, manufacturer=manufacturer)
File "/usr/local/lib/python3.9/site-packages/zigpy/zcl/__init__.py", line 297, in read_attributes
result = await self.read_attributes_raw(to_read, manufacturer=manufacturer)
File "/usr/local/lib/python3.9/site-packages/zigpy/device.py", line 265, in request
result, msg = await self._application.request(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 439, in request
return await self._send_request(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1401, in _send_request
response = await self._send_request_raw(
File "/usr/local/lib/python3.9/site-packages/zigpy_znp/zigbee/application.py", line 1304, in _send_request_raw
raise InvalidCommandResponse(
File "/usr/local/lib/python3.9/site-packages/async_timeout/__init__.py", line 129, in __aexit__
self._do_exit(exc_type)
File "/usr/local/lib/python3.9/site-packages/async_timeout/__init__.py", line 212, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
I've looked a bit closer at this problem and I think the extra traceback context is causing confusion. asyncio.CancelledError
is never raised, only asyncio.TimeoutError
:
import asyncio
import async_timeout
async def inner():
async with async_timeout.timeout(0.1):
await asyncio.sleep(1)
async def main():
try:
await asyncio.shield(inner())
except Exception as e:
print(f'!!! Caught exception: {e!r}')
raise
if __name__ == "__main__":
asyncio.run(main())
Output:
!!! Caught exception: TimeoutError()
Traceback (most recent call last):
File "/tmp/test.py", line 7, in inner
await asyncio.sleep(1)
File "/usr/lib/python3.10/asyncio/tasks.py", line 605, in sleep
return await future
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/test.py", line 18, in <module>
asyncio.run(main())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/tmp/test.py", line 12, in main
await asyncio.shield(inner())
File "/tmp/test.py", line 6, in inner
async with async_timeout.timeout(0.1):
File "/tmp/venv/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
self._do_exit(exc_type)
File "/tmp/venv/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
In short, catching asyncio.CancelledError
within any asyncio code without a very good reason will not solve this problem and would introduce other more serious bugs.
Hi @puddly
Baseed on your last message, I found in python3.9 code (asyncio/tasks.py):
async def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay <= 0:
await __sleep0()
return result
if loop is None:
loop = events.get_running_loop()
else:
warnings.warn("The loop argument is deprecated since Python 3.8, "
"and scheduled for removal in Python 3.10.",
DeprecationWarning, stacklevel=2)
future = loop.create_future()
h = loop.call_later(delay,
futures._set_result_unless_cancelled,
future, result)
try:
return await future
finally:
h.cancel()
and in futures.py:
def cancel(self):
"""Cancel the future and schedule callbacks.
If the future is already done or cancelled, return False. Otherwise,
change the future's state to cancelled, schedule the callbacks and
return True.
"""
self.__log_traceback = False
if self._state != _PENDING:
return False
self._state = _CANCELLED
self.__schedule_callbacks()
return True
[...]
def result(self):
"""Return the result this future represents.
If the future has been cancelled, raises CancelledError. If the
future's result isn't yet available, raises InvalidStateError. If
the future is done and has an exception set, this exception is raised.
"""
if self._state == _CANCELLED:
raise exceptions.CancelledError
While I haven't checked in detail if those items are part of the execution path in this case, it looks like a reasonnable assumption that a timeout on a loop results in a CancelledError exception.
The cancellation error does occur but it never leaves the timeout library, as expected: https://github.com/aio-libs/async-timeout/blob/12624c224268a0d24437739b995c601d4f47887f/async_timeout/__init__.py#L210
I'm unable to reproduce this issue myself. The only way I can see catching the cancellation error helping things is if it's preventing a higher level call from cancelling the request, which is unexpected behavior.
I got a "asyncio.CanceledError" on an outgoing command, and that reminded me that this error is not catched in the retryable_request :
https://github.com/zigpy/zigpy/blob/fc8e923e2f2482810b9643c0b2bde149a56a1930/zigpy/util.py#L145
There are quite a few methods that are not defined as retryable - I do not know if you want to update thos. I am implementing the retryable functionnality in zha-toolkit when I need it and when it's missing in zigpy.