home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.53k stars 29.91k forks source link

TP-Link CS100 Door-Window Sensor Fails to Register IasZone Properly #76962

Closed ErikSGross closed 1 year ago

ErikSGross commented 2 years ago

The problem

With a ConBee II Zigbee stick and firmware 0x264a0700, my TP-Link CS100 Door/Window sensors have operated properly for over a year.

As of Home Assistant 2022.08.x, that ConBee II firmware generates a ZHA error of Writing network frame counter is not supported with this firmware, and none of my Zigbee devices respond to controls or send sensor updates. Thus a firmware update seems necessary.

I updated my ConBee II to firmware 0x26780700 (and had to rebuild my Zigbee network), and now all my devices EXCEPT the CS100 devices work properly. The CS100 devices should have IasZone and Battery entities. The entities are present after adoption, but the IasZone sensor never updates its value.

I've tried multiple CS100 devices as well as adding/removing them multiple times. I've tried on an Intel NUC running HAOS, and I've tried on a Home Assistant Blue with the same results.

After some further debugging to narrow down the reproduction steps, I've found: on a fresh build of HA 2022.08.5, I've found that ConBee II firmware up to (and including) 0x26580700 allows the CS100 to work properly (IasZone updates as sensor opened/closed). Using firmware 0x26650700 or later results in the IasZone not working.

The reason I'm filing a ZHA issue: once a Home Assistant instance has been used with a firmware of 0x26650700 or later, changing the ConBee II firmware back to an earlier version does not allow the CS100's IasZone sensors to work properly. The only way I've found to get the sensors to work again is to reimage the Home Assistant host or restore from a Home Assistant backup that has never used a ConBee II with a firmware of 0x26650700 or later.

What version of Home Assistant Core has the issue?

core-2022.8.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zha

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha

Diagnostics information

{
  "home_assistant": {
    "installation_type": "Home Assistant OS",
    "version": "2022.8.5",
    "dev": false,
    "hassio": true,
    "virtualenv": false,
    "python_version": "3.10.5",
    "docker": true,
    "arch": "x86_64",
    "timezone": "America/Los_Angeles",
    "os_name": "Linux",
    "os_version": "5.15.55",
    "supervisor": "2022.08.3",
    "host_os": "Home Assistant OS 8.4",
    "docker_version": "20.10.14",
    "chassis": "embedded",
    "run_as_root": true
  },
  "custom_components": {},
  "integration_manifest": {
    "domain": "zha",
    "name": "Zigbee Home Automation",
    "config_flow": true,
    "documentation": "https://www.home-assistant.io/integrations/zha",
    "requirements": [
      "bellows==0.32.0",
      "pyserial==3.5",
      "pyserial-asyncio==0.6",
      "zha-quirks==0.0.78",
      "zigpy-deconz==0.18.0",
      "zigpy==0.49.1",
      "zigpy-xbee==0.15.0",
      "zigpy-zigate==0.9.1",
      "zigpy-znp==0.8.1"
    ],
    "usb": [
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*2652*",
        "known_devices": [
          "slae.sh cc2652rb stick"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*sonoff*plus*",
        "known_devices": [
          "sonoff zigbee dongle plus"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*tubeszb*",
        "known_devices": [
          "TubesZB Coordinator"
        ]
      },
      {
        "vid": "1A86",
        "pid": "7523",
        "description": "*tubeszb*",
        "known_devices": [
          "TubesZB Coordinator"
        ]
      },
      {
        "vid": "1A86",
        "pid": "7523",
        "description": "*zigstar*",
        "known_devices": [
          "ZigStar Coordinators"
        ]
      },
      {
        "vid": "1CF1",
        "pid": "0030",
        "description": "*conbee*",
        "known_devices": [
          "Conbee II"
        ]
      },
      {
        "vid": "10C4",
        "pid": "8A2A",
        "description": "*zigbee*",
        "known_devices": [
          "Nortek HUSBZB-1"
        ]
      },
      {
        "vid": "0403",
        "pid": "6015",
        "description": "*zigate*",
        "known_devices": [
          "ZiGate+"
        ]
      },
      {
        "vid": "10C4",
        "pid": "EA60",
        "description": "*zigate*",
        "known_devices": [
          "ZiGate"
        ]
      },
      {
        "vid": "10C4",
        "pid": "8B34",
        "description": "*bv 2010/10*",
        "known_devices": [
          "Bitron Video AV2010/10"
        ]
      }
    ],
    "codeowners": [
      "@dmulcahey",
      "@adminiuga",
      "@puddly"
    ],
    "zeroconf": [
      {
        "type": "_esphomelib._tcp.local.",
        "name": "tube*"
      },
      {
        "type": "_zigate-zigbee-gateway._tcp.local.",
        "name": "*zigate*"
      }
    ],
    "after_dependencies": [
      "onboarding",
      "usb",
      "zeroconf"
    ],
    "iot_class": "local_polling",
    "loggers": [
      "aiosqlite",
      "bellows",
      "crccheck",
      "pure_pcapy3",
      "zhaquirks",
      "zigpy",
      "zigpy_deconz",
      "zigpy_xbee",
      "zigpy_zigate",
      "zigpy_znp"
    ],
    "is_built_in": true
  },
  "data": {
    "config": {},
    "config_entry": {
      "entry_id": "3f9c70374ed85700f78f4e321500b09f",
      "version": 3,
      "domain": "zha",
      "title": "ConBee II - /dev/serial/by-id/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_**REDACTED**-if00, s/n: **REDACTED**- dresden elektronik ingenieurtechnik GmbH - 1CF1:0030",
      "data": {
        "radio_type": "deconz",
        "device": {
          "path": "/dev/serial/by-id/usb-dresden_elektronik_ingenieurtechnik_GmbH_ConBee_II_**REDACTED**-if00"
        }
      },
      "options": {},
      "pref_disable_new_entities": false,
      "pref_disable_polling": false,
      "source": "usb",
      "unique_id": "**REDACTED**",
      "disabled_by": null
    },
    "application_state": {
      "node_info": {
        "nwk": 0,
        "ieee": "**REDACTED**",
        "logical_type": 0
      },
      "network_info": {
        "extended_pan_id": "**REDACTED**",
        "pan_id": 16393,
        "nwk_update_id": 1,
        "nwk_manager_id": 0,
        "channel": 25,
        "channel_mask": 33554432,
        "security_level": 5,
        "network_key": "**REDACTED**",
        "tc_link_key": {
          "key": [
            90,
            105,
            103,
            66,
            101,
            101,
            65,
            108,
            108,
            105,
            97,
            110,
            99,
            101,
            48,
            57
          ],
          "tx_counter": 0,
          "rx_counter": 0,
          "seq": 0,
          "partner_ieee": "**REDACTED**"
        },
        "key_table": [],
        "children": [],
        "nwk_addresses": {},
        "stack_specific": {},
        "metadata": {
          "deconz": {
            "version": 643303168
          }
        },
        "source": "zigpy-deconz@0.18.0"
      },
      "counters": {},
      "broadcast_counters": {},
      "device_counters": {},
      "group_counters": {}
    },
    "versions": {
      "bellows": "0.32.0",
      "zigpy": "0.49.1",
      "zigpy_deconz": "0.18.0",
      "zigpy_xbee": "0.15.0",
      "zigpy_znp": "0.8.1",
      "zigpy_zigate": "0.9.1",
      "zhaquirks": "0.0.78"
    }
  }
}

Example YAML snippet

No configuration YAML updates from the default beyond enabling logging per the ZHA integration recommendations:

logger:
  default: info
  logs:
    homeassistant.core: debug
    homeassistant.components.zha: debug
    bellows.zigbee.application: debug
    bellows.ezsp: debug
    zigpy: debug
    zigpy_deconz.zigbee.application: debug
    zigpy_deconz.api: debug
    zigpy_xbee.zigbee.application: debug
    zigpy_xbee.api: debug
    zigpy_zigate: debug
    zigpy_znp: debug
    zhaquirks: debug

### Anything in the logs that might be useful for us?

```txt
When comparing logs from the passing and failing cases, it seems the ZCL frame of `IasZone:enroll` is missing in the failing case. The failing case also has an extra `PowerConfiguration` frame at the point in the logs where the passing case has the `IasZone:enroll` frame.  The logs for the failing case never show an entry for `IasZone:enroll`.

FAILing case:

[0xBEEF:1:0x0001] Received ZCL frame: b'\x08\x01\n!\x00 \xa6'
[0xBEEF:1:0x0001] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=1, command_id=10, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[0xBEEF:1:0x0001] Decoded ZCL frame: PowerConfiguration:Report_Attributes(attribute_reports=[Attribute(attrid=0x0021, value=TypeValue(type=uint8_t, value=166))])

PASSing case:

[0xBEEF:1:0x0500] Received ZCL frame: b'\t\x01\x01\x15\x00\x88\x11'
[0xBEEF:1:0x0500] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=True), tsn=1, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[0xBEEF:1:0x0500] Decoded ZCL frame: IasZone:enroll(zone_type=<ZoneType.Contact_Switch: 21>, manufacturer_code=4488)


### Additional information

_No response_
probot-home-assistant[bot] commented 2 years ago

Hey there @dmulcahey, @adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)


zha documentation zha source (message by IssueLinks)

puddly commented 2 years ago

As of Home Assistant 2022.08.x, that ConBee II firmware generates a ZHA error of Writing network frame counter is not supported with this firmware, and none of my Zigbee devices respond to controls or send sensor updates. Thus a firmware update seems necessary.

This warning is only generated when a Conbee has no network settings and ZHA has to form a new network to start up, either by restoring from a backup (2022.8.2 or newer), or randomly generating settings (any earlier version). Updating the firmware on a Conbee does not touch its network settings.

Are you saying you have seen this warning multiple times? Can you enable https://www.home-assistant.io/integrations/zha/#debug-logging and post a log of it happening?

As for the missing IasZone:enroll command, can you upload the diagnostic info for your sensor as well?

ErikSGross commented 2 years ago

Thanks for the super-quick response!

As for the Writing network frame counter is not supported with this firmware warning, I think that's a red herring for the CS100 issue. I included that as an answer to the question, "why did you update the ConBee II firmware if everything was working?"

Essentially, I updated from 2022.07 to 2022.08.3 and all my Zigbee devices were unresponsive after multiple reboot/restarts. I saw the network frame counter warning in the logs and figured my really-old ConBee II firmware might be the cause. Unfortuately, I didn't save the logs, and I haven't seen the error since. If I see it again, I'll enable the debug output and post logs.

Regarding the CS100 sensor - I didn't catch that it wasn't included in the ZHA integration diagnostics. I'll attach them here.

zha-3f9c70374ed85700f78f4e321500b09f-TP-Link CS100-c74fbf1d349c4222b3179a0fbf83ca9c.json.txt

puddly commented 2 years ago

I think that's a red herring for the CS100 issue. I included that as an answer to the question

It probably is. However, you shouldn't be seeing it in the first place when using ZHA: your network should never be reset unless you wipe the stick externally, which is only possible with the Conbee if you use deCONZ and "leave" the current network, or factory reset from Phoscon. If you do manage to get this to happen again, I'd really appreciate a debug log, as it would indicate that somehow ZHA is either inappropriately reading the network settings from your Conbee and detecting a "no network" state where there is one, or is somehow erasing the network settings (very unlikely).

Regarding your door/window sensor: I don't have the exact model but I have a similar standards-compliant motion sensor. I've disabled ZHA's proactive IasZone:enroll_response so that ZHA waits for a device to contact it.

I'm able to join it to a Conbee running firmware 0x26780700 and ZHA correctly handles the device sending it a IasZone:enroll request:

2022-08-17 23:21:24.560 DEBUG (MainThread) [zigpy.zcl] [0x7EBB:5:0x0500] Received command 0x01 (TSN 0): enroll(zone_type=<ZoneType.Motion_Sensor: 13>, manufacturer_code=4403)
2022-08-17 23:21:24.561 DEBUG (MainThread) [homeassistant.components.zha.core.channels.base] [0x7EBB:5:0x0500]: Enroll requested
2022-08-17 23:21:24.561 DEBUG (MainThread) [zigpy.zcl] [0x7EBB:5:0x0500] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=False), tsn=78, command_id=0, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
2022-08-17 23:21:24.561 DEBUG (MainThread) [zigpy.zcl] [0x7EBB:5:0x0500] Sending request: enroll_response(enroll_response_code=<EnrollResponse.Success: 0>, zone_id=0)

Similarly with firmware 0x26580700:

2022-08-17 23:36:42.939 DEBUG (MainThread) [zigpy.zcl] [0x81D3:5:0x0500] Received command 0x01 (TSN 0): enroll(zone_type=<ZoneType.Motion_Sensor: 13>, manufacturer_code=4403)
2022-08-17 23:36:42.941 DEBUG (MainThread) [homeassistant.components.zha.core.channels.base] [0x81D3:5:0x0500]: Enroll requested
2022-08-17 23:36:42.941 DEBUG (MainThread) [zigpy.zcl] [0x81D3:5:0x0500] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=False), tsn=76, command_id=0, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
2022-08-17 23:36:42.942 DEBUG (MainThread) [zigpy.zcl] [0x81D3:5:0x0500] Sending request: enroll_response(enroll_response_code=<EnrollResponse.Success: 0>, zone_id=0)

Everything seems to be working as intended with both Conbee firmware versions. Both times I removed the ZHA integration, formed a brand new network on the Conbee, and then joined the device.

Could you upload both the working and not working ZHA/zigpy debug logs you captured earlier?

ErikSGross commented 2 years ago

Regarding the network reset: I wasn't doing anything special when that happened, just updating HA from 2022.07.7 to 2022.08.3. I have about 80 devices in the Zigbee network, with about 25 of them being mains-powered lights (repeaters). I didn't touch the ConBee II or mess with ZHA settings at all when upgrading HA. If I can reproduce the failure, I'll definitely grab some logs.

Regarding the CS100 sensor:

As another datapoint, I do have other door/window sensors in my prod system, and they all work fine with the latest (0x26780700) ConBee II firmware. Those sensors are from Samsung, CentraLite, and Linkind; all of which have functional IasZone entities.

The TP-Link CS100 sensors seem to be special in this regard. Maybe it's not fully compliant with the Zigbee spec? I find it really odd, though, that using an HA instance with ConBee II firmware > 0X26580700 seems to make HA not recognize the IasZone properly, even when reverting the Conbee II firmware back to a version that worked previously.

I'm attaching a passing (0x26580700) log and a failing (0x26650700) log from my dev HA instance that just has the ConBee II and a single CS100 sensor. The logs cover the scenario of starting HA with just the ConBee II, adding the CS100 sensor, opening/closing the sensor a few times, and then removing the sensor.

1 home-assistant - 2658 - pass.log 2 home-assistant - 2665 - fail.log

ErikSGross commented 2 years ago

See anything interesting in the logs that would indicate why the TP-Link sensors stopped working?

puddly commented 2 years ago

If you filter by zigpy.zcl, 0x4037|0x91F7, and Sending request|Sending reply|Decoded ZCL frame, you can see a high level view of what's being sent/received.

In both cases:

  1. The device's basic information is properly read.
  2. The coordinator's IEEE address is written to the cie_addr attribute of the IasZone cluster:

    2022-08-17 16:16:17.032 DEBUG (MainThread) [zigpy.zcl] [0x91F7:1:0x0500] Sending request: Write_Attributes(attributes=[Attribute(attrid=0x0010, value=TypeValue(type=EUI64, value=00:21:2e:ff:ff:05:ce:14))])
    2022-08-17 16:16:17.264 DEBUG (MainThread) [zigpy.zcl] [0x91F7:1:0x0500] Decoded ZCL frame: IasZone:Write_Attributes_rsp(status_records=[WriteAttributesStatusRecord(status=<Status.SUCCESS: 0>)])
  3. However, as you mentioned earlier, the device never sends an IasZone:enroll request in the failing case, ZHA just ends up sending its proactive IasZone:enroll response.

Perhaps the device is confused by the proactive enroll response. Can you try disabling it and re-join the device? https://github.com/home-assistant/core/blob/0caf998547e84e229792a6fe29840dce31b30c06/homeassistant/components/zha/core/channels/security.py#L383-L384

ZHA should instead reply when the device sends an enroll request.

ErikSGross commented 2 years ago

I attempted to modify the HA code to skip the proacticve enroll response, and I never saw that part of the code get executed... so I don't think I actually tested the scenario you suggested.

Since my dev box was on 2022.08.3, I grabbed the 2022.08.3 version of the zha component, added it to the custom_components directory, commented out line 384 as suggested above, and then changed the debug message on 383 so that I'd know it acutally got executed. I never saw the debug message.

However, I did find something else interesting. After trying to pair the CS100 device a bunch of times with firmware 0x26650700, eventually it worked correctly. Or at least I get a reasonable report of battery level and the IASZone seems to work properly. I tried removing and re-pairing the same device a few more times, and sometimes it worked and sometimes it didn't. The single AA battery was at 83% (1.39V vs nominal 1.50V), and I also tried with a brand new (1.61V) battery with the same results. I also tried a few other CS100 sensors and saw the same behavior, both when adopting via the Zigbee coordinator and when adopting via a router out on the mesh. So apparently, it's not that the device won't pair at all with the new HA and new FW, it's that something is making it pair very unreliably.

In the logs where the device did not pair successfully, I saw something like:

[zigpy.application] Received frame on uninitialized device <Device model=None manuf=None nwk=0x5F6B ieee=00:12:4b:00:07:fc:cf:a5 is_initialized=False> from ep 1 to ep 1, cluster 1280: b'\t\x01\x00!\x00\x00\x00\x00\x00'
[zigpy.device] [0x5f6b] Received ZCL while uninitialized on endpoint id 1, cluster 0x0500 id, hdr: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=True), tsn=1, command_id=0, *direction=<Direction.Client_to_Server: 1>, *is_reply=True), payload: b'!\x00\x00\x00\x00\x00'

and this was not present in the case where the device paired successfully.

Then, a little further down in the log, I see:

[zigpy.zcl] [0x5F6B:1:0x0001] Received ZCL frame: b'\x08\x02\n!\x00 \xa6'
[zigpy.zcl] [0x5F6B:1:0x0001] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=2, command_id=10, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0x5F6B:1:0x0001] Decoded ZCL frame: PowerConfiguration:Report_Attributes(attribute_reports=[Attribute(attrid=0x0021, value=TypeValue(type=uint8_t, value=166))])
[zigpy.zcl] [0x5F6B:1:0x0001] Received command 0x0A (TSN 2): Report_Attributes(attribute_reports=[Attribute(attrid=0x0021, value=TypeValue(type=uint8_t, value=166))])
[zigpy.zcl] [0x5F6B:1:0x0001] Attribute report received: battery_percentage_remaining=166
[zigpy.zcl] [0x5F6B:1:0x0001] Sending reply header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=2, command_id=<GeneralCommand.Default_Response: 11>, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0x5F6B:1:0x0001] Sending reply: Default_Response(command_id=10, status=<Status.SUCCESS: 0>)

and again, this is not present in the passing case. The lines indicating the PowerConfiguration:Report_Attributes appear in the same general location in the log as the enroll entries in the passing case, which look like this:

[zigpy.zcl] [0x03C7:1:0x0500] Received ZCL frame: b'\t\x01\x01\x15\x00\x88\x11'
[zigpy.zcl] [0x03C7:1:0x0500] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=True), tsn=1, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0x03C7:1:0x0500] Decoded ZCL frame: IasZone:enroll(zone_type=<ZoneType.Contact_Switch: 21>, manufacturer_code=4488)
[zigpy.zcl] [0x03C7:1:0x0500] Received command 0x01 (TSN 1): enroll(zone_type=<ZoneType.Contact_Switch: 21>, manufacturer_code=4488)

At this point, after [re]trying to pair the devices many times, I eventually got all 5 of them to pair, and they've been operational for about 24h.

I'm not sure how many folks are using these sensors with HA, as I believe they've been discontinued by TP-Link for a while now. From an intellectual perspective, I'd love to know what's going on in the failing case and get them to pair reliably. However, I also know there are lots of bugs/features that need attention as well. If few others chime in that they are having issues and mine are still working properly after a week or so, there might be low ROI on pursuing this one... unless you think the root cause could affect more than just these sensors.

ErikSGross commented 2 years ago

Also, what is the second number (after the NetworkID) in the log messages?

e.g. 0x0500 in [0x03C7:1:0x0500]

ErikSGross commented 2 years ago

After staring at the logs a little more (still a Zigbee n00b), the passing case shows the first ZCL lines as:

[zigpy.zcl] [0xNWKID:1:0x0000] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=False), tsn=XX, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
[zigpy.zcl] [0xNWKID:1:0x0000] Sending request: Read_Attributes(attribute_ids=[4, 5])

[zigpy.zcl] [0xNWKID:1:0x0500] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False, *is_reply=True), tsn=XX, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0xNWKID:1:0x0500] Decoded ZCL frame: IasZone:enroll(zone_type=<ZoneType.Contact_Switch: 21>, manufacturer_code=4488)

[zigpy.zcl] [0xNWKID:1:0x0000] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=XX, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0xNWKID:1:0x0000] Decoded ZCL frame: Basic:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0004, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='TP-Link')), ReadAttributeRecord(attrid=0x0005, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='CS100'))])

Am correct that the coordinator is sending a request to read attributes 4 and 5 from the device and then the device is sending a IasZone:enroll response back along with a Basic:Read_Attributes_rsp that has the data for both attribute IDs? It looks like Atrribute 4 is the manufacturer and Attribute 5 is the model, so that seems to make sense. I'm not sure if the enroll response timing is related to the request to read attributes, but they seem to appear together in the logs.

Then in the failing case, we see:

[zigpy.zcl] [0xNWKID:1:0x0000] Sending request header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Server_to_Client: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=False), tsn=XX, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Server_to_Client: 0>, *is_reply=False)
[zigpy.zcl] [0xNWKID:1:0x0000] Sending request: Read_Attributes(attribute_ids=[4, 5])

[zigpy.zcl] [0xNWKID:1:0x0001] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=XX, command_id=10, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0xNWKID:1:0x0001] Decoded ZCL frame: PowerConfiguration:Report_Attributes(attribute_reports=[Attribute(attrid=0x0021, value=TypeValue(type=uint8_t, value=166))])

[zigpy.zcl] [0xNWKID:1:0x0001] Sending reply header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=XX, command_id=<GeneralCommand.Default_Response: 11>, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0xNWKID:1:0x0001] Sending reply: Default_Response(command_id=10, status=<Status.SUCCESS: 0>)

[zigpy.zcl] [0xNWKID:1:0x0000] Decoded ZCL frame header: ZCLHeader(frame_control=FrameControl(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=0, direction=<Direction.Client_to_Server: 1>, disable_default_response=1, reserved=0, *is_cluster=False, *is_general=True, *is_reply=True), tsn=XX, command_id=1, *direction=<Direction.Client_to_Server: 1>, *is_reply=True)
[zigpy.zcl] [0xNWKID:1:0x0000] Decoded ZCL frame: Basic:Read_Attributes_rsp(status_records=[ReadAttributeRecord(attrid=0x0004, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='TP-Link')), ReadAttributeRecord(attrid=0x0005, status=<Status.SUCCESS: 0>, value=TypeValue(type=CharacterString, value='CS100'))])

Note that the first 2 and last 2 lines of the passing/failing cases are identical.

The lines between the GeneralCommand.Read_Attributes request at the top and the Basic:Read_Attributes_rsp lines at the bottom are interpreted as PowerConfiguration:Report_Attributes instead of IasZone:enroll. I'm not sure if it's a coincidence, but the 0x21 value that seems to indicate a Contact Switch zone type in the passing case is also the attrib_id value associated with the PowerConfiguration:Report_Attributes response in the failing case. Further, the value associated with the battery percentage returned by the PowerConfiguration:Report_Attributes response in the failing case is reported as 106% by Home Assistant, so that seems suspicious as well.

I haven't dug into the specs enough to manually decode Zigbee frames, but this seems like either ZHA is incorrectly decoding the enroll response frame or the device is sending a corrupted frame to ZHA, which can't interpret it. Is there a good manual frame decoder tool that could shed some light on this?

The raw frame lines above the responses are:

[zigpy_deconz.uart] Frame received: 0x17970025001e002202fdff0102c703010401000507000901011500881100afff00000000d6
[zigpy_deconz.api] APS data indication response: [30, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 34>, <DeconzAddress address_mode=ADDRESS_MODE.NWK address=0xfffd>, 1, <DeconzAddress address_mode=ADDRESS_MODE.NWK address=0x03c7>, 1, 260, 1280, b'\t\x01\x01\x15\x00\x88\x11', 0, 175, 255, 0, 0, 0, 0, -42]

in the passing case and

[zigpy_deconz.uart] Frame received: 0x17490025001e002202000001026b5f0104010100070008020a210020a600afff00000000c3
[zigpy_deconz.api] APS data indication response: [30, <DeviceState.APSDE_DATA_REQUEST_SLOTS_AVAILABLE|2: 34>, <DeconzAddress address_mode=ADDRESS_MODE.NWK address=0x0000>, 1, <DeconzAddress address_mode=ADDRESS_MODE.NWK address=0x5f6b>, 1, 260, 1, b'\x08\x02\n!\x00 \xa6', 0, 175, 255, 0, 0, 0, 0, -61]

in the failing case if that's helpful.

puddly commented 2 years ago

Also, what is the second number (after the NetworkID) in the log messages?

It's formatted as [device_nwk:endpoint_id:cluster_id].

Am correct that the coordinator is sending a request to read attributes 4 and 5 from the device and then the device is sending a IasZone:enroll response back along with a Basic:Read_Attributes_rsp that has the data for both attribute IDs?

Correct. The two being interleaved is probably coincidence though.

Further, the value associated with the battery percentage returned by the PowerConfiguration:Report_Attributes response in the failing case is reported as 106% by Home Assistant, so that seems suspicious as well.

The raw battery readings should be between 0-200 (they're reported as half-integer percentages). The raw value of 166 should be displayed as 83%. For HA to show 106%, the device would need to have reported 212, which isn't right.

this seems like either ZHA is incorrectly decoding the enroll response frame or the device is sending a corrupted frame to ZHA, which can't interpret it.

The ZCL decoding code within zigpy is solid and well-tested, I wouldn't expect it to have problems like this.

If you're really curious to see what's going on, I suggest you buy a $10 nRF52840 stick or a Silicon Labs EFR32 Zigbee coordinator, both of which can sniff Zigbee traffic via Wireshark. Wireshark has its own independent ZCL dissector.

ErikSGross commented 2 years ago

It's been almost 3 weeks since I was able to get my sensors back into my ZHA mesh by repeatedly trying to add them until they succeed. All of those sensors are still working - i.e. IASZone sensor working, battery level reported

I've updated to 2022.09.4, and I'm still seeing the (already added) sensors operating properly.

Given that nobody else has chimed in that their CS100 sensors are having this issue and I've been able to get mine functional again with a workaround (multiple pairing attempts until success), I'm good with closing this issue as won't be fixed.

If the problem increases in severity or many others start reporting the problem, we can revisit it.

github-actions[bot] commented 1 year ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.