Repair incorrect TCLK partner IEEE address on startup

puddly commented 11 months ago

This PR is a little experimental and relies on functionality present only in the latest build of EmberZNet (7.3.1.0). I haven't yet figured out which firmwares are affected or what conditions are needed to trigger this problem.

There seem to be circumstances where the Trust Center Link Key's partner IEEE address does not match the coordinator's. This causes new Zigbee 3.0 devices which request APS link keys to leave the network, since the coordinator will never respond.

It looks like the device's current EUI64 is not being correctly read by the firmware when we do not pass an EUI64 when creating the initial security state:

2023-08-25 14:12:55.930 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getEui64: [79:50:12:76:46:fb:10:f4]
2023-08-25 14:12:55.974 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getCurrentSecurityState: [<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.GLOBAL_LINK_KEY|HAVE_TRUST_CENTER_LINK_KEY|TRUST_CENTER_USES_HASHED_LINK_KEY|96: 244>, trustCenterLongAddress=00:12:4b:00:1c:a1:b8:46)]
2023-08-25 14:12:55.966 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received getKey: [<EmberStatus.SUCCESS: 0>, EmberKeyStruct(bitmask=<EmberKeyStructBitmask.KEY_HAS_OUTGOING_FRAME_COUNTER|KEY_HAS_PARTNER_EUI64|KEY_IS_AUTHORIZED: 26>, type=<EmberKeyType.TRUST_CENTER_LINK_KEY: 1>, key=b5:97:1c:d9:a8:e5:4c:8d:96:28:41:9d:83:1b:f7:6b, outgoingFrameCounter=0, incomingFrameCounter=0, sequenceNumber=0, partnerEUI64=00:12:4b:00:1c:a1:b8:46)]
2023-08-25 14:12:56.047 DEBUG (MainThread) [bellows.ezsp] NV3 restored EUI64: NV3KeyId.NVM3KEY_STACK_RESTORED_EUI64=79:50:12:76:46:fb:10:f4

The device's IEEE address is 79:50:12:76:46:fb:10:f4 but there seems to be a key table entry for 00:12:4b:00:1c:a1:b8:46, its original IEEE address (burned into USERDATA). Upon applying this fix, the device now joins as expected.
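
For illustration only, the mismatch in the log above comes down to comparing two EUI64 values. The following is a minimal, self-contained sketch of that comparison. `parse_eui64`, `TclkState` and `tclk_partner_mismatch` are hypothetical names made up for this example; they are not part of bellows or EmberZNet, and the sketch does not show how the PR itself performs the repair.

```python
from dataclasses import dataclass


def parse_eui64(text: str) -> bytes:
    """Parse a colon-separated EUI64 string such as '00:12:4b:00:1c:a1:b8:46'."""
    parts = text.split(":")
    if len(parts) != 8:
        raise ValueError(f"expected 8 octets, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


@dataclass(frozen=True)
class TclkState:
    """Hypothetical container for the three values seen in the log (not a bellows type)."""

    node_eui64: bytes          # value reported by getEui64
    tc_long_address: bytes     # trustCenterLongAddress from getCurrentSecurityState
    tclk_partner_eui64: bytes  # partnerEUI64 of the TRUST_CENTER_LINK_KEY entry


def tclk_partner_mismatch(state: TclkState) -> bool:
    """Return True when the TCLK partner address differs from the node's own EUI64."""
    return state.tclk_partner_eui64 != state.node_eui64


# Values copied from the debug log above.
state = TclkState(
    node_eui64=parse_eui64("79:50:12:76:46:fb:10:f4"),
    tc_long_address=parse_eui64("00:12:4b:00:1c:a1:b8:46"),
    tclk_partner_eui64=parse_eui64("00:12:4b:00:1c:a1:b8:46"),
)
print(tclk_partner_mismatch(state))  # prints True for the logged values
```

With the logged values the check reports a mismatch, which is the condition this PR is meant to repair on startup.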

MattWestb commented 11 months ago

Great catch and fix!

(Man, this man has learned to use his brain!) Silabs supports an alternative / secondary TC (yes, it is part of the current Zigbee standard, although everyone says it is not possible), but it is not used in many systems. The active TC IEEE (or is it the original one, with the active one only listening on it?) is key to getting that working, or to messing up the network, as is happening in this case, if it does not work well.

I think this was also one of the first problems we had with RCP / Zigbeed: Zigbee 3 end devices tried to update the TC link key and then left the network. It could also be another bug / undocumented feature, though.

By the way, Silabs has fixed the problem where a rejoining device requests its old NWK address, the network does not accept it, and the device has to request (and gets) a new one, which shows up in our logs. (I think the problem is not in the coordinator but in the Zigbee stack of the routers, which does not like the old NWK.)

MattWestb commented 11 months ago

Just for info, from my armbian64 RCP test system 3.4.1.0 with 20 IKEA controllers and a few more devices:

Logger: bellows.zigbee.application
Source: runner.py:179
First occurred: 05:05:49 (2 occurrences)
Last logged: 05:05:49

NWK conflict is reported for 0xd855
Found 58:8e:81:ff:fe:f5:c4:a3 device for 0xd855 NWK conflict: _TZ3000_riwp3k79 TS0505A

and the device card shows that the device has updated its NWK:

Device info
TS0505A
by _TZ3000_riwp3k79
Connected via Billy RCP 4.3.1 RK3318
Zigbee info
IEEE: 58:8e:81:ff:fe:f5:c4:a3
Nwk: 0x051d
Device Type: Router
LQI: 208
RSSI: -70
Last Seen: 2023-08-27T07:54:28
Power Source: Mains

It is a LIDL LED RGBWW light strip with an MG21 module, but the firmware is a few years old, so it does not have the rejoin fix. I think I was wrong: it is the Zigbee stack in the device that decides there is a conflict after broadcasting its device announcement and getting the reply from the network, and it then informs the network that it is changing its NWK (or some other device is sending a false reply that makes the rejoining device change its NWK).

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02% :tada:

Comparison is base (d5444cf) 99.77% compared to head (93e2510) 99.79%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev     #577      +/-   ##
==========================================
+ Coverage   99.77%   99.79%   +0.02%
==========================================
  Files          67       68       +1
  Lines        4855     4856       +1
==========================================
+ Hits         4844     4846       +2
+ Misses         11       10       -1
```

| [Files Changed](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy) | Coverage Δ | |
|---|---|---|
| [bellows/ezsp/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL19faW5pdF9fLnB5) | `99.39% <100.00%> (+0.08%)` | :arrow_up: |
| [bellows/ezsp/protocol.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3Byb3RvY29sLnB5) | `100.00% <100.00%> (+2.43%)` | :arrow_up: |
| [bellows/ezsp/v10/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3YxMC9fX2luaXRfXy5weQ==) | `100.00% <100.00%> (ø)` | |
| [bellows/ezsp/v4/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3Y0L19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [bellows/ezsp/v7/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3Y3L19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [bellows/ezsp/v8/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3Y4L19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [bellows/ezsp/v9/\_\_init\_\_.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy9lenNwL3Y5L19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [bellows/types/struct.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy90eXBlcy9zdHJ1Y3QucHk=) | `100.00% <100.00%> (ø)` | |
| [bellows/zigbee/application.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy96aWdiZWUvYXBwbGljYXRpb24ucHk=) | `99.61% <100.00%> (-0.39%)` | :arrow_down: |
| [bellows/zigbee/repairs.py](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy#diff-YmVsbG93cy96aWdiZWUvcmVwYWlycy5weQ==) | `100.00% <100.00%> (ø)` | |
| ... and [1 more](https://app.codecov.io/gh/zigpy/bellows/pull/577?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zigpy) | | |


puddly commented 11 months ago

For now, this code requires the latest version of EmberZNet (with NV3 token support), as EmberZNet doesn't provide a way to write link key frame counters and I don't feel comfortable automatically restoring EmberZNet backups for older firmwares.

MattWestb commented 11 months ago

Does it also work with EZSP 6.10.X with NVM3 (I use it instead of the normal NCP firmware; it adds the maximum of 127 TC link keys and extra GP functionality, and it also works well on MG1X chips), or is it treated like the normal NCP firmware?

puddly commented 11 months ago

@MattWestb I wasn't aware of any 6.10.x firmwares that enabled the token interface, so I haven't tested, but as long as NV3 support is enabled along with the token interface, it should work.

MattWestb commented 11 months ago

It is not very well documented; the best source is Simplicity Studio, but I found a little more in a GP paper: https://www.silabs.com/documents/public/user-guides/ug392-using-sl-green-power-with-ezp.pdf

My Billy uses these settings, if you are interested: https://github.com/MattWestb/EFR32-FW/tree/main/Billy_EZSP#billy-ezsp The only problem is finding space for NVM3 on the flash without running out of RAM, so you have to be a little restrictive with resources on MG1B and P chips.

And from the build file:

NCP UART Application with multi-rail library enabled for application-specific Green Power GPDF transmission scheduling.

This network coprocessor (NCP) application is an extension to the standard ncp-uart-hw sample application with the following changes.

Application configuration:

multi-rail library enabled instead of single rail (the purpose is: one handle is used by the Zigbee stack, whereas the other is used by the application)
multirail-demo plugin enabled (this initialises the additional RAIL handle)
GP library with sink and proxy table set to non-zero.

This application implements a simple gp tx queue, the size of this queue can be configured by defining EMBER_APPL_GP_BIDIRECTIONAL_TX_QUEUE_SIZE.

The working of the queue is very simple: the host initialises and submits the outgoing GPDF packets against a GPD address, and the queue holds them. When a GPDF is received (checked in emberPacketHandoffIncoming for MAC data type frames) from a GPD with the rxAfterTx bit set (in its GP NWK Ext FC), the queue is read and a transmission is scheduled using the additional RAIL handle for the rx offset time in the application, GP_RX_OFFSET_USEC (i.e. 20000 microseconds).

This application implements the following custom EZSP commands as the queue interface:

EMBER_CUSTOM_EZSP_COMMAND_INIT_APP_GP_TX_QUEUE : Initialises and clears the application-specific GP outgoing tx queue.
EMBER_CUSTOM_EZSP_COMMAND_SET_APP_GP_TX_QUEUE : Sets (adds or overwrites) a GPDF frame in the queue for a given GPD.
EMBER_CUSTOM_EZSP_COMMAND_GET_APP_GP_TX_QUEUE : Gets (reads back) the content from the queue for a GPD.

A test API for sending a raw command out using the additional RAIL handle:
EMBER_CUSTOM_EZSP_COMMAND_SEND_APP_GP_RAW : Sends a raw GP packet on a specific channel and time.
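
The queue behaviour described in the build notes above can be modelled roughly as follows. This is a toy Python model for illustration, not the firmware's implementation: the class `GpTxQueue` and its method names are hypothetical, and only the 20000 microsecond offset and the existence of a configurable queue size come from the text. Whether an entry is removed once it has been scheduled is not stated, so the model simply reads it and leaves it in place.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Offset quoted in the build notes (GP_RX_OFFSET_USEC, 20000 microseconds).
GP_RX_OFFSET_USEC = 20_000


@dataclass
class GpTxQueue:
    """Hypothetical host-side model of the application GP tx queue."""

    # Stands in for EMBER_APPL_GP_BIDIRECTIONAL_TX_QUEUE_SIZE.
    max_size: int
    _entries: Dict[int, bytes] = field(default_factory=dict)

    def init(self) -> None:
        """Model of INIT_APP_GP_TX_QUEUE: clear every queued frame."""
        self._entries.clear()

    def set(self, gpd_addr: int, gpdf: bytes) -> bool:
        """Model of SET_APP_GP_TX_QUEUE: add or overwrite the frame for one GPD."""
        if gpd_addr not in self._entries and len(self._entries) >= self.max_size:
            return False  # queue full and this GPD has no existing slot
        self._entries[gpd_addr] = gpdf
        return True

    def get(self, gpd_addr: int) -> Optional[bytes]:
        """Model of GET_APP_GP_TX_QUEUE: read back the frame queued for one GPD."""
        return self._entries.get(gpd_addr)

    def on_gpdf_received(
        self, gpd_addr: int, rx_after_tx: bool, rx_time_usec: int
    ) -> Optional[Tuple[int, bytes]]:
        """Return (tx_time_usec, frame) if a reply should be scheduled, else None."""
        if not rx_after_tx:
            return None  # the GPD does not open a receive window
        gpdf = self.get(gpd_addr)
        if gpdf is None:
            return None  # nothing queued for this GPD
        return rx_time_usec + GP_RX_OFFSET_USEC, gpdf


queue = GpTxQueue(max_size=2)
queue.set(0x01020304, b"\xf3\x00")
print(queue.on_gpdf_received(0x01020304, rx_after_tx=True, rx_time_usec=1_000))
```

In this model a frame is only scheduled when the incoming GPDF has rxAfterTx set and a frame is queued for that GPD address; everything else returns `None`.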

I will test it later; I am a little busy for the next few days, but I will do it after that.

PS: Some Tuya / LIDL ZBGW users have started testing a version of EZSP 6.10.7.0, and it looks like it is working OK, but I don't know how many are using it.

Edit: Oops, I forgot: great work!

MattWestb commented 10 months ago

I have sent you an email with ZHA logs and the Zigbee DB, in case you would like to take a look after the update to HA 2023.09. I don't have any problems on my production system, which is running NCP EZSP 6.10.7 with NVM3 token storage.

zigpy / bellows

Repair incorrect TCLK partner IEEE address on startup #577
