zigpy / open-coordinator-backup

Open Zigbee coordinator backup format
MIT License

Allow the `nwk_address` field to be `null` #8

Closed: puddly closed this issue 2 years ago

puddly commented 2 years ago

EZSP coordinators don't really expose the NWK addresses of devices to the application so their backups are not guaranteed to have this information.

Surprisingly, it seems like we do not actually need to store them either. I randomized all of the NWK addresses in a Z-Stack coordinator backup and after restoring it to my device, my network behavior did not change at all. I've been taking hourly coordinator backups for a while and none of the NWK addresses in the backups have changed in over 24 hours, even after a few coordinator resets. It appears that these NWK addresses are written only when a device joins the network and are not actually used at runtime.
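For anyone who wants to repeat the randomization experiment, here is a minimal sketch of the edit step only. It assumes the backup is loaded as a dict with a `devices` list whose entries carry `ieee_address` and a 4-hex-digit `nwk_address` string; those names and the string encoding are assumptions for illustration, and `randomize_nwk_addresses` is a hypothetical helper, not part of zigpy-znp.

```python
import copy
import random


def randomize_nwk_addresses(backup, seed=None):
    """Return a copy of `backup` with every device's NWK address replaced.

    Assumed layout: backup["devices"] is a list of dicts with a
    "nwk_address" key holding a 4-hex-digit string (or None).
    """
    rng = random.Random(seed)
    result = copy.deepcopy(backup)  # leave the caller's backup untouched
    devices = result.get("devices", [])

    # Draw unique addresses from the unicast range 0x0001-0xFFF7:
    # 0x0000 is the coordinator and 0xFFF8-0xFFFF are reserved or broadcast.
    new_addresses = rng.sample(range(0x0001, 0xFFF8), len(devices))

    for device, nwk in zip(devices, new_addresses):
        device["nwk_address"] = f"{nwk:04x}"

    return result
```

The IEEE addresses are left alone, so a restore from the edited backup differs from the original only in the NWK column.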

We still need to provide dummy values for devices without known NWK addresses during restore, but since Z-Stack is the only firmware that actually allows you to set them (neither EZSP nor the Conbee firmware does), I suspect we may be able to omit these keys completely (preserving backup(state) == backup(restore(backup(state)))).
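One way to keep that round-trip property while still writing dummy values is to pick a single placeholder and map it back to "unknown" on backup. The sketch below assumes the placeholder is 0xFFFE and that device entries use `ieee_address` and a hex-string `nwk_address`; both function names are hypothetical and the firmware table is modeled as a plain list of `(ieee, nwk)` tuples.

```python
UNKNOWN_NWK = 0xFFFE  # assumed placeholder for "NWK address not known"


def devices_to_firmware(devices):
    """Restore step: turn backup entries into (ieee, nwk) pairs.

    A missing or null "nwk_address" becomes the placeholder value.
    """
    table = []
    for device in devices:
        raw = device.get("nwk_address")
        nwk = UNKNOWN_NWK if raw is None else int(raw, 16)
        table.append((device["ieee_address"], nwk))
    return table


def devices_from_firmware(table):
    """Backup step: turn (ieee, nwk) pairs back into backup entries.

    The placeholder is written out as None (JSON null), never as "fffe".
    """
    return [
        {
            "ieee_address": ieee,
            "nwk_address": None if nwk == UNKNOWN_NWK else f"{nwk:04x}",
        }
        for ieee, nwk in table
    ]
```

With this mapping, backing up, restoring, and backing up again yields the same device list whether or not an address was known.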

Thoughts?

castorw commented 2 years ago

Well I am using these addresses to recover address manager entries (https://github.com/Koenkk/zigbee-herdsman/blob/aaa769fd878ce9b4b979ac66953342d28fafbc0c/src/adapter/z-stack/adapter/adapter-backup.ts#L252).

I think we may be able to make this field optional (why remove it when it may be available for some adapters?) but I would rather do so only after I am sure it does not negatively affect the network. If we can confirm that randomising the keys in Z-Stack wouldn't lead to any issues related to connectivity or APS, we can do so.

puddly commented 2 years ago

> So when you tried to randomise these entries it had no effect on the network at all?

Correct. I'm currently testing replacing all NWK addresses with 0xFFFE, the invalid node address, but the results are the same.

> Did you try wiping the adapter NV before restoring the modified NWK addresses?

Yes, I performed a backup, edited it, erased all NVRAM entries and unplugged the coordinator, and then restored the backup.

> What does the sniff say (what addresses are present in frames)?

I wasn't able to see any background unicast traffic leaving the coordinator, just periodic many-to-one route broadcasts to 0xFFFC (coordinator and all routers) and empty link status broadcasts.

The application (i.e. zigpy or herdsman) manages NWK addresses independently so the coordinator will send the data to the provided address and similarly pass through received packets.

The only way I could get the network to "break" is by enabling the APS encryption flag for all packets and then changing the NWK addresses directly in the NVRAM dump (to eliminate any effects from zigpy-znp's backup/restore code). Z-Stack will refuse to send a single packet to the affected devices, even if IEEE addressing is used (???), and immediately respond with APS_NOT_AUTHENTICATED. The same behavior is seen if an unmodified Z-Stack is started with the routers powered off.

The moment the routers are power cycled and broadcast their NWK address, their entries in NVRAM are updated and the network begins working again. This is an artificial scenario: the security flag is not currently used because of incompatibility with some end devices.

Adminiuga commented 2 years ago

> The only way I could get the network to "break" is by enabling the APS encryption flag for all packets and then changing the NWK addresses directly in the NVRAM dump

If you don't change the nwk and enable aps encryption, do you get this error?

puddly commented 2 years ago

> If you don't change the nwk and enable aps encryption, do you get this error?

Only if the routers were not powered on when the coordinator started up. Otherwise, APS encryption is used and I can't decrypt traffic without importing the hashed keys into Wireshark. When routers are powered off at runtime, the expected APS/MAC NO_ACK errors are generated and packets are sent.

I believe we need to read the stack revision or something when devices join in order to selectively enable this flag (if it's even useful). I tried forming a network long ago and Aqara along with one other brand just would not initialize as long as all outgoing packets had encryption enabled.

castorw commented 2 years ago

Okay, so some coordinators do not expose network addresses, which means we cannot back them up, and there is nothing we can do about that. If the network addresses are available they should be backed up; if they are not, `null` should be used instead. Please have the README state this explicitly.

I will add compatibility allowing for restore of devices without known network addresses.
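As an illustration of what accepting `null` could look like on the reading side, here is a small parsing sketch. The `Device` class, `parse_device`, and `parse_devices` are hypothetical names, and the layout (a top-level `devices` list with `ieee_address` and a hex-string `nwk_address`) is an assumption, not a statement of how herdsman or zigpy implement it.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    ieee_address: str
    nwk_address: Optional[int]  # None when the backup has null or no key


def parse_device(entry):
    """Parse one device entry, treating a null or missing NWK as unknown."""
    raw = entry.get("nwk_address")
    if raw is None:
        nwk = None
    else:
        nwk = int(raw, 16)
        if not 0x0000 <= nwk <= 0xFFFF:
            raise ValueError(f"NWK address out of range: {raw!r}")
    return Device(ieee_address=entry["ieee_address"], nwk_address=nwk)


def parse_devices(text):
    """Parse the assumed "devices" list from a JSON backup string."""
    return [parse_device(entry) for entry in json.loads(text)["devices"]]
```

Keeping the unknown case as `None` rather than a sentinel integer lets the restore code decide per adapter whether a dummy value is needed at all.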