home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ZHA delays with Tuya mmWave sensors #107200

Open a-d-r-i-a-n-d opened 6 months ago

a-d-r-i-a-n-d commented 6 months ago

The problem

Since I upgraded to 2024.1.0, my ZigBee devices (lights, switches, fans, etc.) respond to commands with a delay of a few seconds to minutes. I have tried restoring previous versions, but the issue persists no matter which version I restore to. I have also tried restarting in safe mode, but that didn't help. Any help is much appreciated!

What version of Home Assistant Core has the issue?

core-2024.1.0

What was the last working version of Home Assistant Core?

core-2023.12.3

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

config_entry-zha-1122b1901b65684e9e9a05c89bd2a54d.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Logger: homeassistant.components.zha.core.cluster_handlers
Source: components/zha/core/cluster_handlers/__init__.py:537
Integration: Zigbee Home Automation (documentation, issues)
First occurred: 23:36:04 (7 occurrences)
Last logged: 23:40:47

[0xEA69:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0xEA69:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x85AF:1:0x0702]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x85AF:1:0x0006]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]
[0x85AF:1:0x0b04]: async_initialize: all attempts have failed: [DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>'), DeliveryError('Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>')]

Additional information

System Information

version | core-2024.1.0
-- | --
installation_type | Home Assistant OS
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.11.6
os_name | Linux
os_version | 6.1.63-haos
arch | x86_64
timezone | Europe/London
config_dir | /config

Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4955
Installed Version | 1.33.0
Stage | running
Available Repositories | 1371
Downloaded Repositories | 7
HACS Data | ok

Home Assistant Cloud

logged_in | true
-- | --
subscription_expiration | 22 January 2024 at 00:00
relayer_connected | true
relayer_region | eu-central-1
remote_enabled | true
remote_connected | true
alexa_enabled | true
google_enabled | false
remote_server | eu-central-1-9.ui.nabu.casa
certificate_status | ready
instance_id | 8bf80b83acdc431081381385ed2692e2
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 11.2
-- | --
update_channel | stable
supervisor_version | supervisor-2023.12.1
agent_version | 1.6.0
docker_version | 24.0.7
disk_total | 58.0 GB
disk_used | 12.2 GB
healthy | true
supported | true
board | generic-x86-64
supervisor_api | ok
version_api | ok
installed_addons | EMQX (0.3.0), Samba share (12.2.0), Studio Code Server (5.14.2), openWakeWord (1.8.2), Advanced SSH & Web Terminal (17.0.1)

Dashboards

dashboards | 2
-- | --
resources | 1
views | 12
mode | storage

MercedesME 2020

websocket_connection_state | connected
-- | --
api_endpoint_reachable | ok

Recorder

oldest_recorder_run | 21 December 2023 at 17:55
-- | --
current_recorder_run | 4 January 2024 at 23:22
estimated_db_size | 308.77 MiB
database_engine | sqlite
database_version | 3.41.2
puddly commented 5 months ago

@frank777777 Can you post debug logs from 2023.12 as well as 2024.1.4?

nerdyninny commented 5 months ago

IMG_4755

Popping in here just to say that I've been experiencing the same lags/delays when triggering a Zigbee device (e.g., a thirdreality outlet acting as a router). Seems to have also started in 2024.1.x. On/off delays of up to one minute.

Other sensors (door, temperature, etc.) appear to be working normally. It seems to affect Zigbee routers specifically.

I also noticed that (as shown in the attached screenshot) the reporting history for the battery-only sensors (not routers) won't render like it normally does. It otherwise works fine, with no delays afaik.

frank777777 commented 5 months ago

@puddly Very good morning! Could you please share a link with step-by-step instructions on how to generate the ZHA debug logs? (I presume it's the bug-report icon on the ZHA integration, and then download... am I right?)

frank777777 commented 5 months ago

> IMG_4755
>
> Popping in here just to say that I've been experiencing the same lags/delays when triggering a Zigbee device (e.g., a thirdreality outlet acting as a router). Seems to have also started in 2024.1.x. On/off delays of up to one minute.
>
> Other sensors (door, temperature, etc.) appear to be working normally. It seems to affect Zigbee routers specifically.
>
> I also noticed that (as shown in the attached screenshot) the reporting history for the routers won't render like it normally does.

Wondering if the new 2024.1.5 fixes this.

thegreatdane6 commented 5 months ago

2024.1.5 does not fix this. I am also having the described issue; so far, only 2023.12.3 "fixes" it.

thegreatdane6 commented 5 months ago

Same here again after upgrading from 2023.12.3 to 2024.1.1 and .2. All Zigbee devices respond slowly.

logging: IMG_8798

The async error does not seem to be related to the issue, since it also happens on 2023.12.3 on my side. 2023.12.3 works like a charm. I will provide my 2024.1.5 debug logs later on. I have to do this when my girlfriend is not around, since it affects basically every part of our smart home and nothing works flawlessly anymore :(

frank777777 commented 5 months ago

@thegreatdane6 Totally understand, same here. My wife jokes at me when things stop working, so I can't risk upgrading into errors right now. Thanks, greatdane, for your effort.

nerdyninny commented 5 months ago

Another symptom: trying to add another Zigbee sensor fails, with a message I don't typically see. IMG_4757

I reverted to 2023.12.4. Works again, with snappy responses.

holgrrr commented 5 months ago

Hi all,

I am also affected by the issue. I am currently on 2024.1.5, to which I upgraded last night from 2024.1.2, without seeing any improvement.

My current state ("the mess"): Zigbee light bulbs and plugs barely respond to commands anymore, and sensors report their values with delays.

Until early this year the overall state was reliable, with one or the other sensor dropping from time to time, but apart from that solid (including the mmWave sensors currently having issues). "The mess" did not occur right after the upgrade (from 2023.12.4 to 2024.1.1) but rather gradually (or at least "overnight"), and is something I have been dealing with for around 1.5 weeks now, rendering my Zigbee-based lights and plugs basically unusable.

Following the discussion here, I rolled back to 2023.12.1 for a short period for testing purposes. This improved the situation: no severe delays when switching lights and no unusual behaviour anymore. However (at least for now) I can't go back to 2023.12.x permanently, as I noticed that I had made some configuration changes since upgrading to 2024.1.x (dashboards, a few additional sensors (just 3 door/window sensors + 2 remotes, the same models already active prior to "the mess"), related automations, and another integration), stuff I would need to back up/export somehow first...

I collected the following log files:

a) home-assistant_zha_2024-01-22T19-01-30.973Z.log: log since upgrading to 2024.1.5, containing several remote button presses and at least one (failing) attempt to scan for a new Zigbee device.

b) home-assistant_zha_2024-01-22T19-40-13.664Z.log: log covering a short testing period of a couple of minutes on the old 2023.12.1, right after reverting.

Hope we can get this sorted out.

Kind regards, Holger

PS: the worst of the "delaying request (X enqueued). Previously delayed request is now running, delayed by Ys" values I found in some older log files created on the 2024.1.2 release was X=109909 (enqueued requests) and Y=16763 (seconds of delay). I still have these files in case they're needed.
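A minimal sketch (not part of ZHA; the log file name is a placeholder) of how such worst-case values can be pulled out of a debug log, with the two message formats copied from the zigpy lines quoted in this thread:

```python
# Sketch only: extract the worst queue depth (X) and delay (Y) from a
# ZHA debug log. The message formats are copied from the zigpy log
# lines quoted in this thread; the file name is a placeholder.
import re

ENQUEUED = re.compile(r"delaying request \((\d+) enqueued\)")
DELAYED = re.compile(r"delayed by ([\d.]+)s")

max_queue, max_delay = 0, 0.0
with open("home-assistant_zha.log", encoding="utf-8") as log:
    for line in log:
        if m := ENQUEUED.search(line):
            max_queue = max(max_queue, int(m.group(1)))
        if m := DELAYED.search(line):
            max_delay = max(max_delay, float(m.group(1)))

print(f"worst queue depth: X={max_queue}, worst delay: Y={max_delay:.0f}s")
```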

frank777777 commented 5 months ago

Exactly the same

dmulcahey commented 5 months ago

For anyone with Tuya mmWave sensors: please try removing them and unplugging them. Give it some time and see if anything gets better. We are seeing some models of these absolutely hammer meshes and cause significant issues.

holgrrr commented 5 months ago

Will do so over night and report back.

I have read reports of them flooding the network too. However, I find it a bit hard to believe that the two mmWave sensors on my network should suddenly cause such severe problems basically overnight. I integrated them in August/September and they had been working stably since then.

Out of curiosity: what's the best way to see how much traffic they are actually causing, especially as they also function as routers? I performed a naive grep -i for the NWKs (0x7D29 and 0xcaff; my inner nerd loves this last one, btw :D) of my two sensors in all the log files I collected recently. That shows they account for around 5% of the total log entries. It might not be the most meaningful approach, though.
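A slightly more systematic version of that grep, as a sketch: the NWK addresses are the two from this comment, and the log file name is a placeholder.

```python
# Sketch of the "naive grep" above: count what share of ZHA debug log
# lines mention each suspect NWK address. Addresses are the two quoted
# in this comment; the file name is a placeholder.
from collections import Counter

NWKS = ("0x7d29", "0xcaff")
hits, total = Counter(), 0

with open("home-assistant_zha.log", encoding="utf-8") as log:
    for line in log:
        total += 1
        lowered = line.lower()
        for nwk in NWKS:
            if nwk in lowered:
                hits[nwk] += 1

for nwk in NWKS:
    share = 100 * hits[nwk] / max(total, 1)
    print(f"{nwk}: {hits[nwk]} lines ({share:.1f}% of log)")
```

Log-line share is only a rough proxy, though: frames routed between other devices never reach the coordinator's log, so an on-air sniffer capture would be needed to see the sensors' true share of traffic.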

holgrrr commented 5 months ago

As promised, the 12+ hours later update:

  • Tuya mmWave sensors have been unplugged for about 13h now
  • no change to "the mess": light bulbs and plugs not responding (gave up monitoring them after several minutes), sensors reporting current values with a delay of some seconds.

Will now perform a reboot of the HAOS system, as the mmWave sensors were not shown as unavailable, and check a couple of hours later whether this had any impact.

dmulcahey commented 5 months ago

> As promised, the 12+ hours later update:
>
> • Tuya mmWave sensors have been unplugged for about 13h now
> • no change to "the mess": light bulbs and plugs not responding (gave up monitoring them after several minutes), sensors reporting current values with a delay of some seconds.
>
> Will now perform a reboot of the HAOS system, as the mmWave sensors were not shown as unavailable, and check a couple of hours later whether this had any impact.

Please enable debug logging after the reboot and post the log after your testing.

holgrrr commented 5 months ago

Performed yet another reboot to enable debug logging early on.

For now (5 minutes after the reboot) everything is working so far. Tested turning on a bunch of bulbs via remote and the HA web interface. home-assistant_zha_2024-01-23T12-58-40.276Z.log

Will test again in a couple of hours and see if it changed.

frank777777 commented 5 months ago

Thanks for the work! I presume we found the bad guys... wonder why... they worked before...

holgrrr commented 5 months ago

You're welcome :) But let's wait and see if this really did the trick.

1.5h after the reboot it's still working stably, with "Previously delayed request is now running, delayed by" values between 0.0Xs and 5s, the latter mostly occurring when turning on the lights in a room with 5+ Zigbee bulbs.

However, I personally cannot work out why the mmWave sensors did not cause any problems on the 2023.12.X releases as opposed to 2024.1.X, if they are indeed the cause. Is there a more sensible procedure for determining their load on the network than the one I described above?

dmulcahey commented 5 months ago

> You're welcome :) But let's wait and see if this really did the trick.
>
> 1.5h after the reboot it's still working stably, with "Previously delayed request is now running, delayed by" values between 0.0Xs and 5s, the latter mostly occurring when turning on the lights in a room with 5+ Zigbee bulbs.
>
> However, I personally cannot work out why the mmWave sensors did not cause any problems on the 2023.12.X releases as opposed to 2024.1.X, if they are indeed the cause. Is there a more sensible procedure for determining their load on the network than the one I described above?

We had other bugs that impacted devices that behave correctly: they caused messages to not be handled properly. We fixed those issues, and this issue became visible.

marc4s commented 5 months ago

I do not use any mmWave devices, so from my point of view it is not related to them.

I use PIR sensors, temp/humidity sensors, plugs, bulbs, and switches.

holgrrr commented 5 months ago

The afternoon and early evening tests with the mmWave sensors off were rather reliable. However, in the evening I saw an increase of up to ~550 enqueued requests, with delays of up to 90s.

This then returned to a rather instant behaviour, but peaked again after some time. home-assistant_zha_2024-01-23T20-14-16.889Z.zip

So I sort of have the feeling that the mmWave sensors might have amplified another, still-existing problem.

puddly commented 5 months ago

@marc4s Please upload debug logs for the ZHA integration in addition to diagnostics JSON.

tjerkw commented 5 months ago

Similar issues here. The network is stable initially but starts degrading over time. Not sure if it's the same issue; I've seen multiple ZHA issues opened here.

Multiprotocol is not enabled. No mmWave sensors, AFAIK.

I think this is the issue:

2024-01-25 21:48:58.965 DEBUG (MainThread) [zigpy.application] Max concurrency (8) reached, delaying request (13761 enqueued)

Funny thing, only 2 minutes later:

2024-01-25 21:50:56.115 DEBUG (MainThread) [zigpy.application] Max concurrency (8) reached, delaying request (13916 enqueued)
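For scale, a back-of-the-envelope reading of those two lines (a sketch using the timestamps and queue depths quoted above) suggests the queue is growing faster than the radio can drain it:

```python
# Rough arithmetic on the two log lines above: how fast is the zigpy
# request queue growing beyond what the radio drains?
from datetime import datetime

t1 = datetime.fromisoformat("2024-01-25 21:48:58.965")
t2 = datetime.fromisoformat("2024-01-25 21:50:56.115")
growth = 13916 - 13761  # requests added to the queue between the lines

print(f"~{growth / (t2 - t1).total_seconds():.2f} requests/s net growth")
# ~1.32 requests/s: the queue never drains, so every new command waits
# behind thousands of earlier ones.
```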

config_entry-zha


{
  "home_assistant": {
    "installation_type": "Home Assistant OS",
    "version": "2024.1.5",
    "dev": false,
    "hassio": true,
    "virtualenv": false,
    "python_version": "3.11.6",
    "docker": true,
    "arch": "aarch64",
    "timezone": "Europe/Amsterdam",
    "os_name": "Linux",
    "os_version": "6.1.63-haos-raspi",
    "supervisor": "2023.12.1",
    "host_os": "Home Assistant OS 11.4",
    "docker_version": "24.0.7",
    "chassis": "embedded",
    "run_as_root": true
  },
  "custom_components": {},
  "integration_manifest": {
    "domain": "zha",
    "name": "Zigbee Home Automation",
    "after_dependencies": [
      "onboarding",
      "usb"
    ],
    "codeowners": [
      "@dmulcahey",
      "@adminiuga",
      "@puddly",
      "@TheJulianJES"
    ],
    "config_flow": true,
    "dependencies": [
      "file_upload"
    ],
    "documentation": "https://www.home-assistant.io/integrations/zha",
    "iot_class": "local_polling",
    "loggers": [
      "aiosqlite",
      "bellows",
      "crccheck",
      "pure_pcapy3",
      "zhaquirks",
      "zigpy",
      "zigpy_deconz",
      "zigpy_xbee",
      "zigpy_zigate",
      "zigpy_znp",
      "universal_silabs_flasher"
    ],
    "requirements": [
      "bellows==0.37.6",
      "pyserial==3.5",
      "pyserial-asyncio==0.6",
      "zha-quirks==0.0.109",
      "zigpy-deconz==0.22.4",
      "zigpy==0.60.6",
      "zigpy-xbee==0.20.1",
      "zigpy-zigate==0.12.0",
      "zigpy-znp==0.12.1",
      "universal-silabs-flasher==0.0.15",
      "pyserial-asyncio-fast==0.11"
    ],
    "usb": [
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*2652*",
        "known_devices": [
          "slae.sh cc2652rb stick"
        ]
      },
      {
        "vid": "1A86",
        "pid": "55D4",
        "description": "*sonoff*plus*",
        "known_devices": [
          "sonoff zigbee dongle plus v2"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*sonoff*plus*",
        "known_devices": [
          "sonoff zigbee dongle plus"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*tubeszb*",
        "known_devices": [
          "TubesZB Coordinator"
        ]
      },
      {
        "vid": "1A86",
        "pid": "7523",
        "description": "*tubeszb*",
        "known_devices": [
          "TubesZB Coordinator"
        ]
      },
      {
        "vid": "1A86",
        "pid": "7523",
        "description": "*zigstar*",
        "known_devices": [
          "ZigStar Coordinators"
        ]
      },
      {
        "vid": "1CF1",
        "pid": "0030",
        "description": "*conbee*",
        "known_devices": [
          "Conbee II"
        ]
      },
      {
        "vid": "0403",
        "pid": "6015",
        "description": "*conbee*",
        "known_devices": [
          "Conbee III"
        ]
      },
      {
        "vid": "10C4",
        "pid": "8A2A",
        "description": "*zigbee*",
        "known_devices": [
          "Nortek HUSBZB-1"
        ]
      },
      {
        "vid": "0403",
        "pid": "6015",
        "description": "*zigate*",
        "known_devices": [
          "ZiGate+"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*zigate*",
        "known_devices": [
          "ZiGate"
        ]
      },
      {
        "vid": "10C4",
        "pid": "8B34",
        "description": "*bv 2010/10*",
        "known_devices": [
          "Bitron Video AV2010/10"
        ]
      }
    ],
    "zeroconf": [
      {
        "type": "_esphomelib._tcp.local.",
        "name": "tube*"
      },
      {
        "type": "_zigate-zigbee-gateway._tcp.local.",
        "name": "*zigate*"
      },
      {
        "type": "_zigstar_gw._tcp.local.",
        "name": "*zigstar*"
      },
      {
        "type": "_uzg-01._tcp.local.",
        "name": "uzg-01*"
      },
      {
        "type": "_slzb-06._tcp.local.",
        "name": "slzb-06*"
      }
    ],
    "is_built_in": true
  },
  "data": {
    "config": {},
    "config_entry": {
      "entry_id": "cbe1f5d79a7cd3aa470251ee6ea6a0ce",
      "version": 4,
      "minor_version": 1,
      "domain": "zha",
      "title": "/dev/ttyAMA1",
      "data": {
        "device": {
          "path": "/dev/ttyAMA1",
          "baudrate": 115200,
          "flow_control": "software"
        },
        "radio_type": "ezsp"
      },
      "options": {
        "custom_configuration": {
          "zha_options": {
            "enhanced_light_transition": true,
            "enable_identify_on_join": false,
            "default_light_transition": 0,
            "light_transitioning_flag": true,
            "always_prefer_xy_color_mode": true,
            "group_members_assume_state": true,
            "consider_unavailable_mains": 7200,
            "consider_unavailable_battery": 21600
          }
        }
      },
      "pref_disable_new_entities": false,
      "pref_disable_polling": false,
      "source": "user",
      "unique_id": null,
      "disabled_by": null
    },
    "application_state": {
      "node_info": {
        "nwk": 0,
        "ieee": "**REDACTED**",
        "logical_type": 0,
        "model": "Yellow v1.2",
        "manufacturer": "Nabu Casa",
        "version": "6.10.3.0 build 297"
      },
      "network_info": {
        "extended_pan_id": "**REDACTED**",
        "pan_id": 12339,
        "nwk_update_id": 0,
        "nwk_manager_id": 0,
        "channel": 15,
        "channel_mask": 134215680,
        "security_level": 5,
        "network_key": "**REDACTED**",
        "tc_link_key": {
          "key": [
            90,
            105,
            103,
            66,
            101,
            101,
            65,
            108,
            108,
            105,
            97,
            110,
            99,
            101,
            48,
            57
          ],
          "tx_counter": 0,
          "rx_counter": 0,
          "seq": 0,
          "partner_ieee": "**REDACTED**"
        },
        "key_table": [],
        "children": [],
        "nwk_addresses": {},
        "stack_specific": {
          "ezsp": {
            "hashed_tclk": "c79dd5f9adf3f747f4ac7960d3613354"
          }
        },
        "metadata": {
          "ezsp": {
            "stack_version": 8,
            "can_burn_userdata_custom_eui64": true,
            "can_rewrite_custom_eui64": false
          }
        },
        "source": "bellows@0.37.6"
      },
      "counters": {
        "ezsp_counters": {
          "MAC_RX_BROADCAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_RX_BROADCAST', _raw_value=65535, reset_count=35, _last_reset_value=2228190)"
          },
          "MAC_TX_BROADCAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_TX_BROADCAST', _raw_value=17462, reset_count=35, _last_reset_value=1382531)"
          },
          "MAC_RX_UNICAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_RX_UNICAST', _raw_value=19681, reset_count=35, _last_reset_value=1503648)"
          },
          "MAC_TX_UNICAST_SUCCESS": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_TX_UNICAST_SUCCESS', _raw_value=8694, reset_count=35, _last_reset_value=584458)"
          },
          "MAC_TX_UNICAST_RETRY": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_TX_UNICAST_RETRY', _raw_value=4413, reset_count=35, _last_reset_value=352196)"
          },
          "MAC_TX_UNICAST_FAILED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='MAC_TX_UNICAST_FAILED', _raw_value=923, reset_count=35, _last_reset_value=77868)"
          },
          "APS_DATA_RX_BROADCAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_RX_BROADCAST', _raw_value=444, reset_count=35, _last_reset_value=36917)"
          },
          "APS_DATA_TX_BROADCAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_TX_BROADCAST', _raw_value=389, reset_count=35, _last_reset_value=33434)"
          },
          "APS_DATA_RX_UNICAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_RX_UNICAST', _raw_value=6361, reset_count=35, _last_reset_value=478426)"
          },
          "APS_DATA_TX_UNICAST_SUCCESS": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_TX_UNICAST_SUCCESS', _raw_value=3155, reset_count=35, _last_reset_value=276114)"
          },
          "APS_DATA_TX_UNICAST_RETRY": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_TX_UNICAST_RETRY', _raw_value=2307, reset_count=35, _last_reset_value=159654)"
          },
          "APS_DATA_TX_UNICAST_FAILED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DATA_TX_UNICAST_FAILED', _raw_value=762, reset_count=35, _last_reset_value=29894)"
          },
          "ROUTE_DISCOVERY_INITIATED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ROUTE_DISCOVERY_INITIATED', _raw_value=600, reset_count=35, _last_reset_value=41659)"
          },
          "NEIGHBOR_ADDED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='NEIGHBOR_ADDED', _raw_value=0, reset_count=35, _last_reset_value=27)"
          },
          "NEIGHBOR_REMOVED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='NEIGHBOR_REMOVED', _raw_value=0, reset_count=35, _last_reset_value=2)"
          },
          "NEIGHBOR_STALE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='NEIGHBOR_STALE', _raw_value=176, reset_count=35, _last_reset_value=11916)"
          },
          "JOIN_INDICATION": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='JOIN_INDICATION', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "CHILD_REMOVED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='CHILD_REMOVED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "ASH_OVERFLOW_ERROR": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ASH_OVERFLOW_ERROR', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "ASH_FRAMING_ERROR": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ASH_FRAMING_ERROR', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "ASH_OVERRUN_ERROR": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ASH_OVERRUN_ERROR', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "NWK_FRAME_COUNTER_FAILURE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='NWK_FRAME_COUNTER_FAILURE', _raw_value=240, reset_count=35, _last_reset_value=21038)"
          },
          "APS_FRAME_COUNTER_FAILURE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_FRAME_COUNTER_FAILURE', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "UTILITY": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='UTILITY', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "APS_LINK_KEY_NOT_AUTHORIZED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_LINK_KEY_NOT_AUTHORIZED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "NWK_DECRYPTION_FAILURE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='NWK_DECRYPTION_FAILURE', _raw_value=0, reset_count=35, _last_reset_value=9)"
          },
          "APS_DECRYPTION_FAILURE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='APS_DECRYPTION_FAILURE', _raw_value=8, reset_count=35, _last_reset_value=286)"
          },
          "ALLOCATE_PACKET_BUFFER_FAILURE": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ALLOCATE_PACKET_BUFFER_FAILURE', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "RELAYED_UNICAST": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='RELAYED_UNICAST', _raw_value=1, reset_count=35, _last_reset_value=168)"
          },
          "PHY_TO_MAC_QUEUE_LIMIT_REACHED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PHY_TO_MAC_QUEUE_LIMIT_REACHED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PACKET_VALIDATE_LIBRARY_DROPPED_COUNT": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PACKET_VALIDATE_LIBRARY_DROPPED_COUNT', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "TYPE_NWK_RETRY_OVERFLOW": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='TYPE_NWK_RETRY_OVERFLOW', _raw_value=2015, reset_count=35, _last_reset_value=199696)"
          },
          "PHY_CCA_FAIL_COUNT": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PHY_CCA_FAIL_COUNT', _raw_value=5955, reset_count=35, _last_reset_value=384090)"
          },
          "BROADCAST_TABLE_FULL": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='BROADCAST_TABLE_FULL', _raw_value=65535, reset_count=35, _last_reset_value=2228190)"
          },
          "PTA_LO_PRI_REQUESTED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_LO_PRI_REQUESTED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PTA_HI_PRI_REQUESTED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_HI_PRI_REQUESTED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PTA_LO_PRI_DENIED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_LO_PRI_DENIED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PTA_HI_PRI_DENIED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_HI_PRI_DENIED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PTA_LO_PRI_TX_ABORTED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_LO_PRI_TX_ABORTED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "PTA_HI_PRI_TX_ABORTED": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='PTA_HI_PRI_TX_ABORTED', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "ADDRESS_CONFLICT_SENT": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='ADDRESS_CONFLICT_SENT', _raw_value=0, reset_count=35, _last_reset_value=0)"
          },
          "EZSP_FREE_BUFFERS": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='EZSP_FREE_BUFFERS', _raw_value=98, reset_count=35, _last_reset_value=0)"
          }
        },
        "controller_app_counters": {
          "watchdog_reset_requests": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='watchdog_reset_requests', _raw_value=6194, reset_count=1, _last_reset_value=1)"
          },
          "unicast_rx": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_rx', _raw_value=485040, reset_count=0, _last_reset_value=0)"
          },
          "broadcast_rx": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='broadcast_rx', _raw_value=37334, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_success": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_success', _raw_value=279276, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_failure": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_failure', _raw_value=30656, reset_count=0, _last_reset_value=0)"
          },
          "broadcast_tx_failure_unexpected": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='broadcast_tx_failure_unexpected', _raw_value=294, reset_count=0, _last_reset_value=0)"
          },
          "broadcast_tx_success_unexpected": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='broadcast_tx_success_unexpected', _raw_value=33541, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_success_duplicate": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_success_duplicate', _raw_value=31, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_success_unexpected": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_success_unexpected', _raw_value=52, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_failure_unexpected": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_failure_unexpected', _raw_value=14, reset_count=0, _last_reset_value=0)"
          },
          "nwk_conflicts": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='nwk_conflicts', _raw_value=2, reset_count=0, _last_reset_value=0)"
          },
          "unicast_tx_failure_duplicate": {
            "__type": "<class 'zigpy.state.Counter'>",
            "repr": "Counter(name='unicast_tx_failure_duplicate', _raw_value=9, reset_count=0, _last_reset_value=0)"
          }
        }
      },
      "broadcast_counters": {},
      "device_counters": {},
      "group_counters": {}
    },
    "energy_scan": {
      "11": 80.38447947821754,
      "12": 88.70042934643088,
      "13": 89.93931580241996,
      "14": 73.50699819621309,
      "15": 97.39286236923465,
      "16": 97.97769123383605,
      "17": 97.39286236923465,
      "18": 98.62178092672917,
      "19": 65.26028270288712,
      "20": 23.33483723001185,
      "21": 31.01324838787301,
      "22": 28.30261646762903,
      "23": 13.711043742539033,
      "24": 15.32285793082191,
      "25": 21.09014924761344,
      "26": 97.97769123383605
    },
    "versions": {
      "bellows": "0.37.6",
      "zigpy": "0.60.6",
      "zigpy_deconz": "0.22.4",
      "zigpy_xbee": "0.20.1",
      "zigpy_znp": "0.12.1",
      "zigpy_zigate": "0.12.0",
      "zhaquirks": "0.0.109"
    }
  }
}
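One detail worth pulling out of the diagnostics above: the energy_scan block reports per-channel utilisation, and this network sits on channel 15, which the scan shows at roughly 97% busy, while channels 23-25 are far quieter. A minimal sketch (file name is a placeholder) for ranking the channels:

```python
# Sketch: rank Zigbee channels by congestion using the "energy_scan"
# section of the ZHA diagnostics JSON above. File name is a placeholder.
import json

with open("config_entry-zha.json", encoding="utf-8") as f:
    diag = json.load(f)

scan = diag["data"]["energy_scan"]  # channel -> % utilisation
current = diag["data"]["application_state"]["network_info"]["channel"]

for channel, busy in sorted(scan.items(), key=lambda kv: kv[1]):
    marker = "  <- current channel" if int(channel) == current else ""
    print(f"channel {channel:>2}: {busy:5.1f}% busy{marker}")
```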
puddly commented 5 months ago

@tjerkw Can you include a startup debug log? Leave it running for about half an hour.

nerdyninny commented 5 months ago

Update: as I run HA in a VM, I reverted to a working snapshot (OS 11.2, Core 2023.12.4). All Zigbee devices ran like normal, no delays. I let it run a few days and confirmed things were stable, took another snapshot for good measure, and then upgraded to OS 11.4 and Core 2024.1.5. Notably different is that I did not incrementally update from 2023.12.4 through 2024.1.1-4 as I did before, but went straight from 2023.12.4 to 2024.1.5.

Interestingly, I am not encountering any delays this time. It’s been a day since I did this. This isn’t the first time this technique “fixed” a Zigbee problem for me.

Hope this helps someone else.

marc4s commented 5 months ago

I currently have the same behaviour as nerdyninny. 2024.1.5 seems to have fixed it when updated directly from 2023.12.4 (though my previous tests were also directly to the latest stable; my last test was from 2023.12.4 to 2024.1.3, which did not resolve the issue).

Directly after start there was still a little delay of around 3-4 seconds, but after around 5 minutes there was no delay anymore.

frank777777 commented 5 months ago

Thanks to @marc4s and @nerdyninny for reporting this; I'll do my upgrade now to test too... If it works, it would be bizarre, as the changelog lists no ZHA or Zigbee-related changes. Let's see.

marc4s commented 5 months ago

RAM usage has increased from around 2 GB to around 4.5 GB since then (debug already disabled), but this is the only disadvantage so far.

marc4s commented 5 months ago

Rebooted. RAM usage is back at 2 GB, and the delay is also back. A debug log will follow soon.

marc4s commented 5 months ago

https://we.tl/t-CyNLbDFJUe

200 MB was too big for GitHub, so I used WeTransfer.

marc4s commented 5 months ago

2024-01-26 09:06:35.538 DEBUG (MainThread) [zigpy.application] Max concurrency (16) reached, delaying request (1978 enqueued)
2024-01-26 09:25:40.793 DEBUG (MainThread) [zigpy.application] Max concurrency (16) reached, delaying request (7796 enqueued)

This is increasing all the time.

ferdyvi commented 5 months ago

Hi all. I too noticed lag and delays in the last HA versions (I don't remember which one was the first). I have Zigbee radiator valves; they report their status regularly, but I am not able to change the target temperature anymore. I re-paired them but the issue persists. So I tried again to pair them directly to the coordinator, bypassing the router in between, and at the moment the valves seem to work properly as before. I will continue to check over the next days, but in the meantime perhaps someone else could test the same change, bypassing the router.

dmulcahey commented 5 months ago

> Hi all. I too noticed lag and delays in the last HA versions (I don't remember which one was the first). I have Zigbee radiator valves; they report their status regularly, but I am not able to change the target temperature anymore. I re-paired them but the issue persists. So I tried again to pair them directly to the coordinator, bypassing the router in between, and at the moment the valves seem to work properly as before. I will continue to check over the next days, but in the meantime perhaps someone else could test the same change, bypassing the router.

What was it paired to previously? Manufacturer and model, please.

ferdyvi commented 5 months ago

Radiator valves: Danfoss eTRV0103, paired with an Xbee S2C PRO configured as a router. Coordinator: Conbee II USB stick.

nerdyninny commented 5 months ago

Reporting in. ZHA still working since upgrade from 2023.12.4 to 2024.1.5.

marc4s commented 5 months ago

@nerdyninny - which device are you using? I am using a Sonoff USB Dongle P with the pre-installed firmware. I am thinking about updating the firmware.

frank777777 commented 5 months ago

@nerdyninny Hi, did you reboot? @marc4s had the same results, but after rebooting he got the same delay again.

holgrrr commented 5 months ago

Some more updates from my side; I am quite confused about what's going on...

a) The setup was more or less stable last week, with the delay times sometimes growing throughout the day.

b) I re-paired 2-3 sensors that had become unresponsive throughout the week AND added a single new Sonoff SNZB-06P yesterday (30.01.2024, around 2pm) for testing purposes, as a replacement for the Tuya mmWave sensors. However, I had stability issues with the sensor (it quickly became unavailable) but left it plugged in anyway. Overnight (around/after 3am) the whole setup went fubar again, leaving me unable to turn the Zigbee bulbs on/off.

-> this is covered in home-assistant_zha_2024-01-31T10-30-40.442Z.zip

Fearing this might be related to the new Sonoff mmWave sensor, I unplugged it (and left it that way) and rebooted (without improvement).

Seeing there was an update to 2024.1.6, I tried to upgrade to that version, as the changelog promised a bump in the zigpy dependency. However, the update did not seem to work out, getting stuck at starting ZHA and then automatically rolling back to the 2024.1.5 backup. (That's at least the impression it made on me; I could only follow it during some very short breaks.)

At the moment I am again on 2024.1.5, have rebooted twice, and am still experiencing very high, constantly increasing delays, according to a simple grep -i "Previously delayed request is now running, delayed by" home-assistant_zha_2024-01-31T12-41-13.192Z.log. Attachment: home-assistant_zha_2024-01-31T12-41-13.192Z.zip

netsoft-ruidias commented 5 months ago

Hello,

I'm experiencing the same problem: a very unresponsive Zigbee network for about two weeks. But only in one direction, that is, when I give instructions, e.g. to turn the lights on/off. Status updates continue to work instantly.

i.e.: if I turn on a light at the physical switch, I immediately see the status in HA. Motion sensors also report status instantly. But if I want to turn on a light via HA, it takes a few minutes to execute.

I have about 57 zigbee devices, 12 of them are mmWave.

I have a couple of these in my LOG:

2024-01-31 14:21:59.867 DEBUG (MainThread) [zigpy.application] Max concurrency (16) reached, delaying request (419 enqueued)
...
2024-01-31 14:21:59.815 DEBUG (MainThread) [zigpy.application] Previously delayed request is now running, delayed by 15.73s
...
2024-01-31 14:21:59.926 DEBUG (MainThread) [zigpy_znp.zigbee.application] Request failed (Unsuccessful request status code: <Status.MAC_CHANNEL_ACCESS_FAILURE: 225>), retry attempt 4 of 5 (RouteDiscovery|LastGoodRoute)

After some additional research, I realized this has happened in the past: https://github.com/home-assistant/core/issues/78163 (I hope it helps to solve the problem)

puddly commented 5 months ago

MAC_CHANNEL_ACCESS_FAILURE means that the radio is refusing to send requests because you're experiencing extreme amounts of interference. Make sure your coordinator is on a USB extension cable and positioned away from interference sources such as USB 3.0 ports, WiFi APs, routers, SSDs, etc.

> I have about 57 zigbee devices, 12 of them are mmWave.

Did they work in the past? If so, what version? Can you downgrade HA Core and create a debug log with that version as well?

frank777777 commented 5 months ago

We've already tested the extension cable... nothing changed...

puddly commented 5 months ago

@frank777777 You and @netsoft-ruidias are different people 😄. MAC_CHANNEL_ACCESS_FAILURE is a hard failure reported by the radio firmware. No way around it.

As mentioned above, upload debug logs from both versions! Until there's enough information to track this issue down, nothing will be fixed. I can't buy a dozen Tuya mmWave devices just to test this one issue.

netsoft-ruidias commented 5 months ago

@puddly

> MAC_CHANNEL_ACCESS_FAILURE means that the radio is refusing to send requests because you're experiencing extreme amounts of interference. Make sure your coordinator is on a USB extension cable and positioned away from interference sources such as USB 3.0 ports, WiFi APs, routers, SSDs, etc.

I can do this, but be aware that nothing has changed: the coordinator has been in the same place for a year (since day one), and nothing has been added, nothing has been removed.

> Did they work in the past? If so, what version?

Yes, of course. It has always worked until now. I update to the latest version every month; we can say that it worked perfectly until version 2023.12 and stopped working in version 2024.1.

> Can you downgrade HA Core and create a debug log with that version as well?

It will be more complicated, but I'll try to do it tonight. I'll keep you posted.

BTW: do you know anything about the issue from 2022 that I referenced here: https://github.com/home-assistant/core/issues/78163?

Thanks

ferdyvi commented 5 months ago

> Hello,
>
> I'm experiencing the same problem: a very unresponsive Zigbee network for about two weeks. But only in one direction, that is, when I give instructions, e.g. to turn the lights on/off. Status updates continue to work instantly.
>
> i.e.: if I turn on a light at the physical switch, I immediately see the status in HA. Motion sensors also report status instantly. But if I want to turn on a light via HA, it takes a few minutes to execute.
>
> I have about 57 zigbee devices, 12 of them are mmWave.
>
> I have a couple of these in my LOG:
>
> 2024-01-31 14:21:59.867 DEBUG (MainThread) [zigpy.application] Max concurrency (16) reached, delaying request (419 enqueued)
> ...
> 2024-01-31 14:21:59.815 DEBUG (MainThread) [zigpy.application] Previously delayed request is now running, delayed by 15.73s
> ...
> 2024-01-31 14:21:59.926 DEBUG (MainThread) [zigpy_znp.zigbee.application] Request failed (Unsuccessful request status code: <Status.MAC_CHANNEL_ACCESS_FAILURE: 225>), retry attempt 4 of 5 (RouteDiscovery|LastGoodRoute)
>
> After some additional research, I realized this has happened in the past: #78163 (I hope it helps to solve the problem)

I confirm that the issue is only in one direction... Did you test whether bypassing a router brings any improvement?

tjerkw commented 5 months ago

My ZHA is in a failed state again (all Zigbee-related devices disabled). I found the initial log lines causing it. Only a restart helps:

2024-01-31 19:43:57.130 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2024-01-31 19:43:57.251 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.252 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.252 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.252 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.253 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.253 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.253 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.254 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.254 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.255 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.255 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.255 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.255 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.256 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.256 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.256 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:43:57.257 WARNING (MainThread) [homeassistant.helpers.dispatcher] Unable to remove unknown dispatcher <bound method GroupProbe._reprobe_group of <homeassistant.components.zha.core.discovery.GroupProbe object at 0x7f92abb510>>
2024-01-31 19:44:01.035 ERROR (MainThread) [zigpy.zcl] [0xCB6A:1:0x0b04] Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/zcl/__init__.py", line 411, in reply
    return await self._endpoint.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/endpoint.py", line 278, in reply
    return await self.device.reply(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 484, in reply
    return await self.request(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/device.py", line 318, in request
    await send_request()
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 833, in request
    await self.send_packet(
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 868, in send_packet
    status, _ = await self._ezsp.sendUnicast(
                      ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'sendUnicast'

And then the 'NoneType' object has no attribute 'sendUnicast' error repeats forever.

What is "NCP entered failed state."?

I'm using a Home Assistant Yellow, with multiprotocol disabled. HA Core 2024.1.5.

disruptivepatternmaterial commented 5 months ago

> @frank777777 You and @netsoft-ruidias are different people 😄. MAC_CHANNEL_ACCESS_FAILURE is a hard failure reported by the radio firmware. No way around it.
>
> As mentioned above, upload debug logs from both versions! Until there's enough information to track this issue down, nothing will be fixed. I can't buy a dozen Tuya mmWave devices just to test this one issue.

Hey, I have uploaded logs on this a few times (I think), so can you send along a link or something that tells me exactly what you might need? I think I reported this around a month ago. I have mmWave devices, but the issue was there whether they were plugged in or not. I have the SkyConnect dongle, USB extension cable, etc., and things work PERFECTLY on versions from back in December. I'll send you a couple of mmWave devices if that helps too :P hahaha. But seriously, the HA logs are so absurdly verbose... what do I do to help hone in on this ZHA issue for us?

pgroene commented 5 months ago

I see the same type of issue. I updated at the beginning of the month and retried with the current latest version and a new machine. In both cases it seems to work fine at first, then the delays start, and then the controller goes offline. Reloading the stick fixes the issue for some minutes, and then it starts all over again. I have some logs if they would help.

home-assistant_zha_2024-02-01T00-53-42.272Z.log

Downgrading to 2023.12.3 resolves the issue.

ferdyvi commented 5 months ago

> Hi all. I too noticed lag and delays in the last HA versions (I don't remember which one was the first). I have Zigbee radiator valves; they report their status regularly, but I am not able to change the target temperature anymore. I re-paired them but the issue persists. So I tried again to pair them directly to the coordinator, bypassing the router in between, and at the moment the valves seem to work properly as before. I will continue to check over the next days, but in the meantime perhaps someone else could test the same change, bypassing the router.

Hi all. In my case, I confirm the issue is present only for end devices served by routers rather than directly by the coordinator. Communication with them works only in one direction: from the end devices. The "uplink" direction doesn't work, failing with a timeout error: "Failed to send request: device did not respond". I also confirm that this problem was not present in December. Something seems to have changed in the internal data handling.

netsoft-ruidias commented 5 months ago

@puddly

Yesterday, to debug the interference problem, I ran two experiments: I moved the coordinator to a different spot, and I changed the Zigbee channel from 15 (auto) to 25.

After a few minutes the network began to "heal" itself.

The network worked fine for a couple of hours before it gradually became congested and unresponsive again. This morning it is as it was yesterday, not responding to commands (status updates continue to work, apparently).

This is my log from today: home-assistant_zha_2024-02-01T11-39-46.127Z.log

Please note the number of "Max concurrency (16) reached, delaying request (xxxx enqueued)" entries... Where can we see which requests are enqueued? Is there a way to reset this?

Any help would be greatly appreciated, thank you

netsoft-ruidias commented 5 months ago

BTW: today I also downgraded HA to v2023.12.3. For now everything seems to be OK and responsive; however, we need to wait a few hours to be sure (the system degrades over time). As soon as I have more information, I will let you know.

netsoft-ruidias commented 5 months ago

> BTW: today I also downgraded HA to v2023.12.3. For now everything seems to be OK and responsive; however, we need to wait a few hours to be sure (the system degrades over time). As soon as I have more information, I will let you know.

Status update: it's been seven hours since I went back to version 2023.12 and everything seems to work spectacularly well, without any service degradation.

I believe once again it has been proven that the problem is in the 2024.1 line.

cc @dmulcahey, @Adminiuga, @puddly, @TheJulianJES: mind taking a look at this issue, as you are listed as code owners? Thanks!