zigpy / zigpy-znp

TI CC2531, CC13x2, CC26x2 radio support for Zigpy and ZHA
GNU General Public License v3.0
146 stars 40 forks source link

Better Error Reporting #232

Open harvindhillon opened 11 months ago

harvindhillon commented 11 months ago

Possibility to increase detail in ERRORs to include NWK and IEEE of device. Example below for Device Didnt Respond ERROR

2023-11-06 11:32:37.941 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 314, in request
    return await req.result
           ^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/util.py", line 138, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 313, in request
    async with asyncio_timeout(timeout):
  File "/usr/local/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1578, in async_run
    return await asyncio.shield(run.async_run())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 420, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 470, in _async_step
    self._handle_exception(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 493, in _handle_exception
    raise exception
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 468, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 704, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 666, in _async_run_long_action
    return long_task.result()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2035, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2072, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 235, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 876, in entity_service_call
    response_data = await _handle_entity_call(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 600, in async_handle_toggle_service
    await async_handle_light_on_service(light, call)
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 336, in async_turn_on
    result = await self._level_cluster_handler.move_to_level_with_on_off(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 66, in wrap_zigpy_exceptions
    raise HomeAssistantError(
homeassistant.exceptions.HomeAssistantError: Failed to send request: device did not respond

I do not have any information which device this is?

There are other examples where just NWK is present

2023-11-06 07:26:31.219 WARNING (MainThread) [zigpy.application] Received relays from an unknown device: 0x6186
2023-11-06 07:26:31.572 WARNING (MainThread) [zigpy.application] Unknown device AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x6186)

Without IEEE since this is an unknown relay, the NWK is unregistered, I havent a clue which device is causing this error to be thrown

MattWestb commented 11 months ago

I agree but i dont knowing the coding things.

The unknown relay shall also being good to getting the IEEE so can see which device is it that ghosting around. Normally is it one sleeper that have being deleted in the system but was not getting the leave and not rejoining command so its have the network key and can communicating with the network or have migrating / importing the network its one device the system never have committing with. If have suspected device re power it so its doing one device accouterments and you is getting the IEEE and if the system is liking it its initiating it OK but can needing the network open for joining.

harvindhillon commented 11 months ago

I suspect it was due to me moving my channel, might be some of the relays didnt migrate properly. I dont know which it is, I have around 65 routers and 150 devices overall, so its a pain to pin point which ones are the culprit. Everything seems to be working though, so it makes me more curious.

MattWestb commented 11 months ago

I think best is installing the https://github.com/mdeweerd/zha-network-card then you can see the lat seen and sorting and you can see if some devices is offline and can looking at them.

harvindhillon commented 11 months ago

Actually prefer ZHA Toolkit, we can actually dump it in a csv file from service command :) service: zha_toolkit.zha_devices data: csvlabel: ieee csvout: /config/csv/devices.csv

MattWestb commented 10 months ago

I is getting it now.

Logger: zigpy.application
Source: runner.py:188
First occurred: 15:58:53 (1 occurrences)
Last logged: 15:58:53

Unknown device AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x6FB6)

And i have debug activated and is getting :

2023-11-26 22:12:02.652 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received trustCenterJoinHandler: [0x6fb6, 00:17:88:01:04:a6:13:37, <EmberDeviceUpdate.STANDARD_SECURITY_UNSECURED_JOIN: 1>, <EmberJoinDecision.USE_PRECONFIGURED_KEY: 0>, 0xa983]
2023-11-26 22:12:02.653 DEBUG (MainThread) [bellows.zigbee.application] Received trustCenterJoinHandler frame with [0x6fb6, 00:17:88:01:04:a6:13:37, <EmberDeviceUpdate.STANDARD_SECURITY_UNSECURED_JOIN: 1>, <EmberJoinDecision.USE_PRECONFIGURED_KEY: 0>, 0xa983]
2023-11-26 22:12:02.653 INFO (MainThread) [zigpy.application] New device 0x6fb6 (00:17:88:01:04:a6:13:37) joined the network

And the IEEE is one old fried (Philip HUE light from one of my neighbors) that is trying joining late in the night after it was joining the network one time and i have trying deleting it so its loosing the network key. I was getting it working for nearly one year but was changing batteries in some remotes for 3 days ago and then it was joining the network agen so now i must sitting up after 22:00 for sending one leave without rejoining to it for getting it mot being in my production network