Closed stevenwfoley closed 1 year ago
Thank you for the detailed debugging! Would it be possible for you to ZIP up and upload the packet captures here (with the network key) so that I can take a look at the raw traffic as well?
The end device polling its parent too frequently is likely the cause.
Attached. My zigbee network has over 100 devices, so there's a lot of data. I hope that my applied filter is not too limiting. It's showing all packets with a source and destination of my wireless dimmer device. Everything else just looked like irrelevant chatter. legrand coordinator.zip zha coordinator.zip
When a real Zigbee 3.0 end device pairs, it requests an end device timeout from its parent after it has received the network key. If the parent does not support this (like TI coordinators), it normally does not reply, and the device falls back to the Zigbee HA 1.2 way of polling its parent, which means bad battery performance or the device leaving the network (as seen with TI in Z2M). What device is the parent 0x09554 in ZHA, and what is 0x04dcf on the native hub? Both are routing devices, but neither is the coordinator, which should have address 0x0000.
Disabling the Poll Control cluster does not help; it only makes the battery drain worse, because the device then has to poll its parent more often.
I will look through the sniffs a bit and see if I can spot something strange.
PS: Z2M has big problems with these switches leaving the network with TI coordinators, because the coordinator does not respond to the request, but they work OK with other routers as parent.
Edit: A complete pairing with both gateways would be interesting, so that everything being set up can be seen in the sniffs and not just some of the commands.
My zha coordinator is a sonoff dongle-E, so SI not TI. The parent 0x09554 in zha is a zigbee 3.0 light (zemismart) that is controlled by the legrand dimmer, so it never loses mains power. Sorry, I do not know if 0x04dcf is the native legrand hub.
Attached is a full capture of the join on both coordinators. Please let me know if there's anything else I can test or try.
ZHA looks OK, but the device is acting a little strange. Interesting frames: End Device Timeout Request, 8 min:
ZigBee Network Layer Command, Dst: 0xdda4, Src: 0x3f83
Frame Control Field: 0x1a09, Frame Type: Command, Discover Route: Suppress, Security, Destination, Extended Source Command
Destination: 0xdda4
Source: 0x3f83
Radius: 1
Sequence Number: 133
Destination: Jennic_00:02:5c:9d:59 (00:15:8d:00:02:5c:9d:59)
Extended Source: IEEERegi_ff:fe:61:7a:ba (64:62:66:ff:fe:61:7a:ba)
ZigBee Security Header
Command Frame: End Device Timeout Request
Command Identifier: End Device Timeout Request (0x0b)
Requested Timeout Enumeration: 8 min (3)
End Device Configuration: 0x00
Response:
ZigBee Network Layer Command, Dst: 0x3f83, Src: 0xdda4
Frame Control Field: 0x1a09, Frame Type: Command, Discover Route: Suppress, Security, Destination, Extended Source Command
Destination: 0x3f83
Source: 0xdda4
Radius: 1
Sequence Number: 198
Destination: IEEERegi_ff:fe:61:7a:ba (64:62:66:ff:fe:61:7a:ba)
Extended Source: Jennic_00:02:5c:9d:59 (00:15:8d:00:02:5c:9d:59)
ZigBee Security Header
Command Frame: End Device Timeout Response, Success
Command Identifier: End Device Timeout Response (0x0c)
Status: Success (0)
Parent Information: 0x07, MAC Data Poll Keepalive, End Device Timeout Request Keepalive, Power Negotiation Supported
.... ...1 = MAC Data Poll Keepalive: True
.... ..1. = End Device Timeout Request Keepalive: True
.... .1.. = Power Negotiation Supported: True
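As a rough cross-check of the two frames above, the sketch below decodes the Requested Timeout Enumeration and the Parent Information byte by hand. The helper names are made up for illustration and are not part of zigpy or ZHA. The enumeration mapping (0 means 10 seconds, n means 2^n minutes for 1 to 14) is an assumption based on my reading of the Zigbee PRO specification; it at least agrees with Wireshark's "8 min (3)" here. The bit names are copied from the Wireshark dissection.

```python
# Hypothetical helpers for illustration; not part of zigpy or ZHA.


def end_device_timeout_seconds(enum_value: int) -> int:
    """Convert a Requested Timeout Enumeration to seconds.

    Assumed mapping: 0 -> 10 s, n -> 2**n minutes for 1 <= n <= 14.
    """
    if not 0 <= enum_value <= 14:
        raise ValueError(f"timeout enumeration out of range: {enum_value}")
    if enum_value == 0:
        return 10
    return (2 ** enum_value) * 60


# Bit positions as shown in the Wireshark dissection of the response.
PARENT_INFO_BITS = {
    0x01: "MAC Data Poll Keepalive",
    0x02: "End Device Timeout Request Keepalive",
    0x04: "Power Negotiation Supported",
}


def decode_parent_information(value: int) -> list[str]:
    """Return the names of the flags set in the Parent Information byte."""
    return [name for mask, name in PARENT_INFO_BITS.items() if value & mask]


timeout_s = end_device_timeout_seconds(3)  # enumeration 3 from the request
parent_flags = decode_parent_information(0x07)  # 0x07 from the response
print(timeout_s // 60, "min;", ", ".join(parent_flags))
```

With the values from this capture, the request works out to 480 seconds and all three parent flags are set.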
The parent supports all possible features. The switch's node descriptor:
ZigBee Application Support Layer Data, Dst Endpt: 0, Src Endpt: 0
Frame Control Field: Data (0x40)
Destination Endpoint: 0
Node Descriptor Response (Cluster ID: 0x8002)
Profile: ZigBee Device Profile (0x0000)
Source Endpoint: 0
Counter: 4
ZigBee Device Profile, Node Descriptor Response, Rev: 22, Nwk Addr: 0x3f83, Status: Success
Sequence Number: 31
Status: Success (0)
Nwk Addr of Interest: 0x3f83
Node Descriptor
.... .... .... .010 = Type: 2 (End Device)
.... .... .... 0... = Complex Descriptor: False
.... .... ...1 .... = User Descriptor: True
.... 0... .... .... = 868MHz BPSK Band: False
..0. .... .... .... = 902MHz BPSK Band: False
.1.. .... .... .... = 2.4GHz OQPSK Band: True
0... .... .... .... = EU Sub-GHz FSK Band: False
Capability Information: 0x80
.... ...0 = Alternate Coordinator: False
.... ..0. = Full-Function Device: False
.... .0.. = AC Power: False
.... 0... = Rx On When Idle: False
.0.. .... = Security Capability: False
1... .... = Allocate Short Address: True
Manufacturer Code: 0x1021
Max Buffer Size: 89
Max Incoming Transfer Size: 63
Server Flags: 0x2c00
.... .... .... ...0 = Primary Trust Center: False
.... .... .... ..0. = Backup Trust Center: False
.... .... .... .0.. = Primary Binding Table Cache: False
.... .... .... 0... = Backup Binding Table Cache: False
.... .... ...0 .... = Primary Discovery Cache: False
.... .... ..0. .... = Backup Discovery Cache: False
.... .... .0.. .... = Network Manager: False
0010 110. .... .... = Stack Compliance Revision: 22
Max Outgoing Transfer Size: 63
Descriptor Capability Field: 0x00
.... ...0 = Extended Active Endpoint List Available: False
.... ..0. = Extended Simple Descriptor List Available: False
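To make the relevant bits of the node descriptor above easier to read, here is a minimal sketch that decodes the Capability Information byte (0x80) and pulls the stack compliance revision out of the Server Flags (0x2c00). The function names are hypothetical, and the bit layout is simply copied from the Wireshark dissection rather than from any library.

```python
# Hypothetical decoder for illustration; bit layout copied from the dump above.

CAPABILITY_BITS = {
    0x01: "Alternate Coordinator",
    0x02: "Full-Function Device",
    0x04: "AC Power",
    0x08: "Rx On When Idle",
    0x40: "Security Capability",
    0x80: "Allocate Short Address",
}


def decode_capability(value: int) -> dict[str, bool]:
    """Map each capability flag name to whether its bit is set."""
    return {name: bool(value & mask) for mask, name in CAPABILITY_BITS.items()}


def stack_compliance_revision(server_flags: int) -> int:
    """Extract the upper 7 bits of the 16-bit Server Flags field."""
    return (server_flags >> 9) & 0x7F


caps = decode_capability(0x80)
# Battery powered and receiver off when idle: a sleepy end device.
is_sleepy_end_device = not caps["Rx On When Idle"] and not caps["AC Power"]
revision = stack_compliance_revision(0x2C00)
print(is_sleepy_end_device, revision)
```

For this switch only "Allocate Short Address" is set, so the device reports itself as battery powered with its receiver off when idle.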
It is an end device with its radio off when idle, so a real sleeper. ZHA configures a long poll interval of 6 seconds:
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
Frame Control Field: Data (0x00)
Destination Endpoint: 1
Cluster: Poll Control (0x0020)
Profile: Home Automation (0x0104)
Source Endpoint: 1
Counter: 210
ZigBee Cluster Library Frame
Frame Control Field: Cluster-specific (0x01)
Sequence Number: 85
Command: Set Long Poll Interval (0x02)
New Long Poll Interval: 24
24 * 1/4 sec = 6 sec
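The same arithmetic as a small sketch, in case someone wants to convert other Poll Control values seen in a capture. The function names are invented for this example; the only fact used is that the interval is expressed in quarter seconds, as in the line above.

```python
# Hypothetical helpers; the Poll Control interval is given in quarter seconds.


def long_poll_interval_seconds(quarter_seconds: int) -> float:
    """Convert a Long Poll Interval value from the frame to seconds."""
    return quarter_seconds / 4


def quarter_seconds_for(seconds: float) -> int:
    """Convert a desired interval in seconds to the value sent in the frame."""
    return round(seconds * 4)


interval_s = long_poll_interval_seconds(24)  # value from the ZHA capture
print(interval_s, quarter_seconds_for(interval_s))
```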
ZHA ends fast polling after the check-in with the Fast Poll Stop command, so the device should go to the long poll interval:
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
Frame Control Field: Data (0x00)
Destination Endpoint: 1
Cluster: Poll Control (0x0020)
Profile: Home Automation (0x0104)
Source Endpoint: 1
Counter: 212
ZigBee Cluster Library Frame
Frame Control Field: Cluster-specific (0x01)
Sequence Number: 89
Command: Fast Poll Stop (0x01)
And how often is the device polling its parent?
No. Time PAN Protocol IEEE Src IEEE Dst Zigbee Src Zigbee Dst ZBN Dst Group Nr ZBN Seq ZBA Seq ZDP Seq Nwk Seq Src EP Dst EP Info
2894 15:12:58,999096 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 94 Data Request
2895 15:12:59,021161 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 94 Ack
2943 15:13:00,082205 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 95 Data Request
2944 15:13:00,082532 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 95 Ack
2992 15:13:00,559708 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 96 Data Request
2993 15:13:00,560481 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 96 Ack
3007 15:13:01,096412 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 97 Data Request
3008 15:13:01,097597 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 97 Ack
3018 15:13:01,638609 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 98 Data Request
3019 15:13:01,639430 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 98 Ack
3026 15:13:02,184887 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 99 Data Request
3027 15:13:02,185962 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 99 Ack
3037 15:13:02,741087 0x74f1 IEEE 802.15.4 0x3f83 0xdda4 0x3f83 0xdda4 100 Data Request
3038 15:13:02,742125 0x74f1 IEEE 802.15.4 0xdda4 0x3f83 N/A N/A 100 Ack
2 times per second!!! I think you would need a battery of several Ah to keep the device from dying within a few days ;-))
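The poll rate can be checked from the timestamps in the listing above. This is only a sketch: the timestamps are typed in by hand from the seven Data Request rows (acks left out), and the function name is made up.

```python
from datetime import datetime
from statistics import median

# Timestamps of the seven Data Request frames listed above (acks omitted).
DATA_REQUEST_TIMES = [
    "15:12:58,999096",
    "15:13:00,082205",
    "15:13:00,559708",
    "15:13:01,096412",
    "15:13:01,638609",
    "15:13:02,184887",
    "15:13:02,741087",
]


def poll_gaps_seconds(timestamps: list[str]) -> list[float]:
    """Return the time in seconds between consecutive polls."""
    parsed = [datetime.strptime(t, "%H:%M:%S,%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


gaps = poll_gaps_seconds(DATA_REQUEST_TIMES)
typical_gap = median(gaps)
print(f"median gap {typical_gap:.3f} s, about {1 / typical_gap:.1f} polls per second")
```

Apart from one gap of a little over a second at the start, the polls are a bit more than half a second apart, so roughly two per second, compared with the 6 second long poll interval that was configured.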
Can you do a new sniff while pairing with the Legrand GW, but a little longer, so I can see whether it does the polling with or without the check-in and poll control functions being configured?
PS: You can use a filter in Wireshark to see only the frames to and from one device, like this:
(wpan.src16 == 0x3f83) || (wpan.dst16 == 0x3f83)
It filters on the 802.15.4 layer, so you also see the 15.4 data requests and acks.
Edit: It is 2 requests per second, and each one also gets a reply / ack from the parent.
Thanks for looking at the captures! And yeah, some massive batteries would solve the problem for sure :) Attached is another capture of a join on the legrand coordinator, but for a much longer span of time.
The switch reads a very custom attribute on the coordinator and the coordinator does the same on the switch, but the coordinator does not set up any strange things; it only binds the relevant clusters.
The coordinator has a slightly unusual working mode, since its short address is 0x4dcf. When the device sends a match descriptor request for OTA (asking whether anyone has the OTA server function), the answer is 0x0000. The switch then asks for the long address of 0x0000, and the coordinator sends an IEEE address that it does not have itself but still listens on when commands are sent to 0x0000. I think it is a combined ZLL working mode with distributed security and a coordinator.
When all of that is done, it polls its parent 2 times per second for a long time, even after it has received a Fast Poll Stop command. Then it is silent for 35 minutes, which is not OK: it should poll its parent at least every 8 minutes, as it said in the End Device Timeout Request when joining (best practice is less than half that time, in case one packet is missed). Then it rejoins the network, gets updated, and looks like it is working "normally".
No. Time PAN Protocol IEEE Src IEEE Dst Zigbee Src Zigbee Dst ZBN Dst Group Nr ZBN Seq ZBA Seq ZDP Seq Nwk Seq Src EP Dst EP Info
864 23:04:06,575987 0xf6d8 IEEE 802.15.4 0x3e51 0x4dcf 0x3e51 0x4dcf 147 Data Request
865 23:04:06,576876 0xf6d8 IEEE 802.15.4 0x4dcf 0x3e51 N/A N/A 147 Ack
866 23:04:07,118681 0xf6d8 IEEE 802.15.4 0x3e51 0x4dcf 0x3e51 0x4dcf 148 Data Request
867 23:04:07,121547 0xf6d8 IEEE 802.15.4 0x4dcf 0x3e51 N/A N/A 148 Ack
868 23:04:08,202838 0xf6d8 IEEE 802.15.4 0x3e51 0x4dcf 0x3e51 0x4dcf 149 Data Request
869 23:04:08,204130 0xf6d8 IEEE 802.15.4 0x4dcf 0x3e51 N/A N/A 149 Ack
870 23:04:08,761875 0xf6d8 IEEE 802.15.4 0x3e51 0x4dcf 0x3e51 0x4dcf 150 Data Request
871 23:04:08,763097 0xf6d8 IEEE 802.15.4 0x4dcf 0x3e51 N/A N/A 150 Ack
873 23:04:09,310448 0xf6d8 IEEE 802.15.4 0x3e51 0x4dcf 0x3e51 0x4dcf 151 Data Request
874 23:04:09,312088 0xf6d8 IEEE 802.15.4 0x4dcf 0x3e51 N/A N/A 151 Ack
2354 23:40:02,144922 0xf6d8 ZigBee 0x3e51 0x4dcf 0x3e51 0x4dcf 0x4dcf 227 66 Rejoin Request, Device: 0x3e51
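A small sketch of the check behind the "silent for 35 minutes" remark, using the last Data Request and the Rejoin Request from the listing above. The helper name is hypothetical, and the 8 minute timeout is assumed to be the same value the switch requested in the earlier capture.

```python
from datetime import datetime

# Assumption: the switch requested the same 8 minute end device timeout here.
END_DEVICE_TIMEOUT_S = 8 * 60


def seconds_between(earlier: str, later: str) -> float:
    """Seconds between two capture timestamps taken on the same day."""
    fmt = "%H:%M:%S,%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


# Last Data Request (frame 873) and the Rejoin Request (frame 2354).
silence_s = seconds_between("23:04:09,310448", "23:40:02,144922")
exceeds_timeout = silence_s > END_DEVICE_TIMEOUT_S
exceeds_half_timeout = silence_s > END_DEVICE_TIMEOUT_S / 2
print(f"silent for {silence_s / 60:.1f} min, timeout exceeded: {exceeds_timeout}")
```

The gap is several times longer than the requested timeout, so by that rule the parent would be entitled to drop the child before it comes back with a rejoin.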
You don't need to post the sniff, but I am interested to know how the switch behaves over a longer time with the Legrand GW. Does it poll its parent 2 times per second all the time, or does it take pauses? If so, how long are the pauses? And does it rejoin the network repeatedly over a long period? (Normally it should never rejoin unless the battery runs out and is replaced, and I think it should not happen within 24 hours.)
If it does not work stably with the Legrand GW, we probably cannot get it working OK in ZHA. If it is stable, we must look more closely at the attributes it reads from the GW and the GW reads from the switch, but that is useless if it is not stable with the Legrand GW.
I think the firmware has serious bugs in the poll control part, since it does not respect the long poll interval that was set, either after Fast Poll Stop or after the check-in timeout, but I need to see how it behaves in the long run.
I have a dimmer that's been on the legrand GW for over a year, it works perfectly, and it's still at 100% battery. It appears that the dimmer goes into deep sleep and never communicates with the GW unless you push a button on the dimmer. I ran a capture for over 24 hours and I did not see a single packet until I pushed a button on the dimmer. I would expect this from an SED though, especially one that never has the need to receive data from the GW. The relationship between the GW and an SED is very one directional.
I agree with you for a "real" SED, but it depends on how it is configured. If the "24 hour switch" has the same config as the sniffed one, it is not doing it right. The switch requested an end device timeout of 8 min, so it should send a 15.4 data request to its parent at least every 8 min, and preferably in under 4 in case a frame is missed. Otherwise its parent flags it as offline and deletes it from its child table, and the network cannot find it, so it is offline. (To sniff the 15.4 polls and acks you must have the sniffer near the device, because they are not relayed through other routers the way commands between the coordinator and the device are.)
I am testing an old firmware for the IKEA on/off switch. It is ZLL, so it does not do check-ins, because that was not implemented in the first ZLL standard, but it does poll its parent so as not to be flagged offline. With the latest ZB3 firmware it checks in with the coordinator about every 55 min and does nothing else, because it does not need to. If the system needs to send something to it, that is put in a queue and sent to the device when it does its next check-in with the coordinator. ZHA has a record in the log about every 50 min with the check-in from the device, which is a very nice way to see that it is working OK.
Back to the long Legrand GW sniff: for the "24 hour switch" we do not know how it was configured, so it is not easy to get ZHA to apply the same config so that it sleeps well (it may have received a different config, since the firmware in the device and the GW may have been different when it was paired). That is why it was interesting to see whether the "fast polling switch" went into deep sleep after the configuration we saw in the sniff (it did not).
So we still need to sniff a switch joining the Legrand GW and being set up, confirm that the switch goes into deep sleep, and then try to get ZHA to configure it the same way, so that it goes into deep sleep instead of polling its parent 2 times per second and rejoining the network all the time.
Also, I know how ZB3 should set up a device the right way, but older Zigbee HA 1.x and ZLL 1.x are a little trickier, since they were not as strict as ZB3, which is also better documented. For those you can see how other systems do it by sniffing them.
How shall we go forward? I am curious how the Legrand GW sets up the switch to get it into deep sleep and working OK, so that we can get it working OK with ZHA.
Thank you so much for your continued help! Sorry for the delayed response, we are moving to a new house. According to legrand, all of their modern devices are running ZB3 firmware. I'm not sure where to go from here either, but I'm willing to try anything I can. Do you have any suggestions for what I can try to narrow down the specifics of the issue?
One universal way on EZSP to fix battery-draining controllers is to block the coordinator from having end devices as children. The usual problem is that when an end device tries to communicate while the coordinator is offline or the host system is having problems, it jumps to a new router if it can (Xiaomi devices cannot; they leave the network). Why this helps: IKEA devices, and it seems many others such as Hue, had bugs that were triggered by the end device jumping parents and that killed the battery.
I was thinking you could try the same once you have time and have things in place in your new home; it may also help with end device problems in your system.
More sniffing to try to catch what is going wrong would also help, but there is no hurry from my side, and you need your time for more important things at the moment.
I know I've read about and seen the config entries necessary for limiting the children on the coordinator, but I'm not able to find it again. Can you please remind me of the config entries?
Part of my config:
zha:
  custom_quirks_path: /config/custom_zha_quirks/
  # handle_unknown_devices: yes
  zigpy_config:
    network:
      # key: [11,11,14,14,13,13,12,11,10,10,11,12,13,14,11,11] ## 16 bytes of network key
      channel: 15
      channels: [11, 15, 20, 25]
      pan_id: 0xA11A
      # extended_pan_id: "DD:DD:DD:DD:DD:DD:DD:DD"
    source_routing: true
    ezsp_config:
      CONFIG_MAX_END_DEVICE_CHILDREN: 0
Slightly scrambled so as not to make my neighbors too happy. If you are setting up a new network, you must keep CONFIG_MAX_END_DEVICE_CHILDREN non-zero until you have added the first router and ZHA has accepted it; after that you can set it to zero or whatever you like. The change only takes effect after restarting (Z)HA.
Thanks. I added this entry and restarted HA. Do I need to re-pair the dimmers to ensure they are not connecting to the coordinator, or will the coordinator reject them and force them to find a router?
If it had the coordinator as parent, it should be kicked and join another router it can find. You can see how it is connected in the network map / visualization (click on update, wait a little, and go back to the map so the network data is current).
If you like, you can re-pair it, or just use it a little and see if it works; it should then have a connection, but not directly to the coordinator.
I only know that Aqara / Xiaomi sensors do not jump; they silently leave if the parent is offline. All good devices should jump when needed.
Well the "CONFIG_MAX_END_DEVICE_CHILDREN: 0" setting did not improve battery life.
FWIW, these devices seem to work as expected with zigbee2mqtt. I switched to z2m last week and the two dimmers I'm testing have had zero battery drain so far. I would ultimately like to stay on ZHA. Can z2m be used to try and figure out how ZHA can be altered to support these legrand wireless devices?
Sounds great! Z2M sets up the coordinator part a little differently, but the working mode is the same from the network point of view. The interesting thing is which parent the dimmer is using. Can you look at the network map to see how it is connected in the network?
The parent for both dimmers is the coordinator. I actually paired them first, before I had any other routers paired, so the coordinator was the only option.
So Z2M has got the EZSP settings / parameters for the working mode right!! Normally there are problems with them, like TI coordinators not being able to have direct Zigbee 3 children unless the latest beta firmware is used on the coordinator.
The problem
The batteries in my Legrand Wireless dimmers are lasting less than a week. Legrand says the batteries will last 7-10 years, but only when using their zigbee coordinator/gateway device.
What version of Home Assistant Core has the issue?
2023.3.1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
zha
Link to integration documentation on our website
https://www.home-assistant.io/integrations/zha/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
I've read about all the ikea (and other manufacturer) related battery drain issues, but none of those resolutions have helped with these devices. I tried adding the Legrand (4129) manufacturer to _IGNORED_MANUFACTURER_ID. I also tried removing the Polling cluster entity from the zha quirk. I've taken some wireshark captures of the switch joined to a Legrand/Netatmo coordinator, and joined to my HA/ZHA coordinator. I have noticed some discrepancies, but I have no idea how to effect any change to debug. The capture of the Legrand coordinator shows one set of "End Device Timeout Request" and "End Device Timeout Response" packets, but with ZHA they come every few minutes. The difference I notice is in the "End Device Timeout Response, Success" packets.
The Legrand coordinator packet has this data:
The ZHA coordinator packet has this data:
Legrand coordinator capture:
ZHA coordinator captures:
Thanks for any help!