zigpy / bellows

A Python 3 project to implement EZSP for EmberZNet devices
GNU General Public License v3.0
179 stars 87 forks source link

Ignore delivery failures for broadcasts #499

Closed puddly closed 1 year ago

puddly commented 1 year ago

Broadcasts seem to periodically fail with delivery failure statuses.

https://github.com/home-assistant/core/issues/79832

codecov-commenter commented 1 year ago

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage :thumbsup:

Coverage data is based on head (058919c) compared to base (e941fbd). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #499 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 47 47 Lines 3289 3289 ========================================= Hits 3289 3289 ``` | [Impacted Files](https://codecov.io/gh/zigpy/bellows/pull/499?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy) | Coverage Δ | | |---|---|---| | [bellows/zigbee/application.py](https://codecov.io/gh/zigpy/bellows/pull/499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy96aWdiZWUvYXBwbGljYXRpb24ucHk=) | `100.00% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

MattWestb commented 1 year ago

This is interesting!! Unicast is / can / shall being ACKed in 154, Zigbee Net and Zigbee APP layers. Broadcast is using "passive ACK" that is little tricky and using the neighbor table in the router.

If one router is having 5 neighbors and one child and the child is sending one broadcast request to it parent is being done with unicast and being ACKed in 154 layer and the router is making one broadcast. After have sending the frame is the router listening if all the 5 neighbors is "relying" the broadcast and if only 4 of 5 is doing it the Zigbee stack is raising one delivery error.

With Sonoff ZBB is more then likely is happening then WiFi is also active then the broadcast is being done. Then the question is shall it being logged of the coordinator that is having problem with broadcast or only ignoring it ?

Also Sonoff have not implanting Zigbee / BT / WiFi coexistence that is recommended by Silabs and is making the different radios is not active at the same time (hard and software request of sending slots).

puddly commented 1 year ago

After have sending the frame is the router listening if all the 5 neighbors is "relying" the broadcast and if only 4 of 5 is doing it the Zigbee stack is raising one delivery error.

That is very interesting, I wasn't aware broadcasts had an ACK-like feature. Does this happen even after an EmberZNet reset? Or is the stack keeping track of routers internally?

MattWestb commented 1 year ago

Was only finding one good description from Atmel how BitCloud is doing it but it shall being the same for all other implanted Zigbee stacks. https://ww1.microchip.com/downloads/en/Appnotes/Atmel-42687-ZigBee-Broadcast-Message-Handling-and-Implementation-in-BitCloud-SDK_AT14615_ApplicationNote.pdf

I think (=not knowing) all neighbor and router table is being lost then the chip is rebooted and the stack i building them son as possible for getting the network working OK.

MattWestb commented 1 year ago

Then you have reading how the mechanism is working and consider different Zigbee stack is doing things little different we have one old problem that some is fixing in the firmware and other not. If one broadcast frame is not being passive acked OK its being resent by the sender is repeating it with the same TSN = OK.

Then one coordinator is one router with TC its forwarding the broadcast to the host. EZSP is not doing any filtering and we is getting more identical frames with the same TSN to the host and i think TI is filtering it in the NCP firmware and its only getting one frame with the TSN.

Some very exentric devices (IKEA Styrbar and Symponisk with old firmware) is sending commands with broadcast and also unicast that is identical (same TSN) and is not so easy for the user filtering in HA and debunse in HA is making all slow and cant processing commands fast like multi singe clicks).

My suggestion is implanting frame filtering with the same TSN and command type in the radio lib for radios that need it or in Zigpy for all and we is getting many strange problems fixed also some that is coming in the future.

MattWestb commented 1 year ago

One more thing: I think some old not Zigbee PRO stack (ZLL and HA 1.X) is not using passive ack and is sending all broadcast frames 3 times to its neighbors (and is considered OK sent) and if its one ZB3 / PRO stack in the neighbor its only reseeding one copy of the 3 frame received if its being OK passive acked so its working OK with "mixed" system but is generating more broadcast frames in the mesh.