tube0013 / tube_gateways

Information and Documentation on Tube's Zigbee Gateways
270 stars 51 forks source link

z2m failures #138

Open tube0013 opened 1 year ago

tube0013 commented 1 year ago

Well I did update both esphome firmware and the controller firmware and now all my devices are not working. I have tried restarting the PoE controller and zigbee2mqtt a few times :/

Zigbee2MQTT:error 2023-02-27 19:56:08: Publish 'set' 'state' to 'landing__ceiling_light' failed: 'Error: Command 0x5c0272fffecdff50/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 24571 - 1 - 11 - 6 - 11 after 10000ms)'
Zigbee2MQTT:debug 2023-02-27 19:56:08: Error: Command 0x5c0272fffecdff50/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 24571 - 1 - 11 - 6 - 11 after 10000ms)
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:564:17)
    at processTimers (node:internal/timers:507:7)

_Originally posted by @onedr0p in https://github.com/tube0013/tube_gateways/issues/128#issuecomment-1447378067_

tube0013 commented 1 year ago

@onedr0p please use this new issue.

Can you confirm z2m is starting? The most recent log snippet would appear to be from a started z2m instance.

onedr0p commented 1 year ago

Yes, z2m is started but none of my devices are responding to actions in z2m

tube0013 commented 1 year ago

Need more information then.

Z2M starting rules out a connection issue with the coordinator.

onedr0p commented 1 year ago

Let me know what info you need.

onedr0p commented 1 year ago

Device: tubeszb-cc2652-poe Z Firmware: CC1352P2_CC2652P_launchpad_coordinator_20221226 ESP Firmware: https://github.com/tube0013/tube_gateways/blob/main/models/retired/tubeszb-cc2652-poe/firmware/esphome/tubeszb-cc2652-poe.bin z2m version: 1.30.1

tube0013 commented 1 year ago

Please run an energy scan to check for interference.

https://github.com/zigpy/zigpy-cli

The previous way to do this has sold docs on setting up on any platform.

https://github.com/zigpy/zigpy-znp/blob/dev/TOOLS.md

To run the scan room z2m, and it will be a command line like this

''' zigpy radio znp /dev/cu.usbserial-1420 energy-scan '''

onedr0p commented 1 year ago

I have my zigbee network on ZIGBEE channel 11 and wifi on WIFI channel 11, which according to this website is on the opposite sides of the spectrum.

Which makes since according this output.

zigpy radio znp socket://192.168.1.90:6638 energy-scan

Channel energy (mean of 5 / 5):
------------------------------------------------
 + Lower energy is better
 + Active Zigbee networks on a channel may still cause congestion
 + TX on 26 in North America may be with lower power due to regulations
 + Zigbee channels 15, 20, 25 fall between WiFi channels 1, 6, 11
 + Some Zigbee devices only join networks on channels 15, 20, and 25
 + Current channel is enclosed in [square brackets]
------------------------------------------------
 - [11 ]     5.65%  #####
 -  12      14.35%  ##############
 -  13      19.84%  ###################
 -  14      17.49%  #################
 -  15      43.61%  ###########################################
 -  16      14.20%  ##############
 -  17      14.67%  ##############
 -  18      22.67%  ######################
 -  19       8.55%  ########
 -  20      19.53%  ###################
 -  21      32.94%  ################################
 -  22      57.57%  #########################################################
 -  23      51.84%  ###################################################
 -  24      30.90%  ##############################
 -  25       0.00%
 -  26*     46.12%  ##############################################
onedr0p commented 1 year ago

It is worth noting that nothing else changed besides me updating the firmware on the device to try and avoid the problem of the tubezb controller locking up (loosing network) as reported in the other issue.

✖ /opt/homebrew/bin/python3.11 ./cc2538-bsl.py -p socket://192.168.1.90:6638 -evw ../CC1352P2_CC2652P_launchpad_coordinator_20221226.hex
Opening port socket://192.168.1.90:6638, baud 500000
Reading data from ../CC1352P2_CC2652P_launchpad_coordinator_20221226.hex
Your firmware looks like an Intel Hex file
Connecting to target...
CC1350 PG2.0 (7x7mm): 352KB Flash, 20KB SRAM, CCFG.BL_CONFIG at 0x00057FD8
Primary IEEE Address: 00:12:4B:00:21:B4:97:5C
    Performing mass erase
Erasing all main bank flash sectors
    Erase done
Writing 360448 bytes starting at address 0x00000000
Write 104 bytes at 0x00057F980
    Write done
Verifying by comparing CRC32 calculations.
    Verified (match: 0xa9dc145d)
tube0013 commented 1 year ago

fair enough thanks.

Did you have a network back up before you did the fw update? (it shold be auto create by z2m) if it wasn't there or valid, when z2m restarted after the fw update it would create a whole new network, which could explain why nothing is working.

onedr0p commented 1 year ago

Hm, not sure I am following. I have backups for at least 14 days, are you saying I should try and restore one?

tube0013 commented 1 year ago

Firmware updates erase the network settings off the coordinator.

Z2M after a successful start creates a backup within the config folder. if this was not there when z2m did start following the update it is possible it will have over wrote the previous network settings.

14 days of back ups I'm assuming you mean full HA backups. you could look in a recent one specifically for the *backup.json and compare the current setting so the ones in the backup.

you can get the current settings by running a back up with the zigpy-cli

onedr0p commented 1 year ago

It's odd that the behavior of devices appearing online and updating the last seen? But nothing gets updated on the devices state like open/close etc..

image
tube0013 commented 1 year ago

No.

something else must be going on. But Z2M would not show those devices as online if it wasn't able to talk to them. Is Z2M staying connected to the coordinator? ie not doing a lot of restarts?

onedr0p commented 1 year ago

There have been no restarts on z2m since 50 minutes ago when I started it.

tube0013 commented 1 year ago

Are you able to pair a device?

onedr0p commented 1 year ago

I have not tried and I will have to wait until tomorrow to attempt it.

onedr0p commented 1 year ago

In the meantime I tried downgrading to CC1352P2_CC2652P_launchpad_coordinator_20220219 and the behavior is still the same.

I am sorry to say but these issues appear to be all related

https://github.com/tube0013/tube_gateways/issues/128 https://github.com/tube0013/tube_gateways/issues/133

tube0013 commented 1 year ago

you can try rolling back the fw, last esphome release binary is still on github.

I don't believe your issue is related to #128 at unless z2m starts disconnecting from the coordinator.

onedr0p commented 1 year ago

Which bin on this page should I use to revert back (there's 2)?

https://github.com/tube0013/tube_gateways/tree/main/models/retired/tubeszb-cc2652-poe/firmware/esphome/Previous_Version

onedr0p commented 1 year ago

I don't believe your issue is related to https://github.com/tube0013/tube_gateways/issues/128 at unless z2m starts disconnecting from the coordinator.

I was having this issue before I touched updating any firmware...

https://github.com/tube0013/tube_gateways/issues/128#issuecomment-1423307398

tube0013 commented 1 year ago

tube-zb-gw-cc2652p2-poe_olimex_round_latest.bin

My experience has been better with the newest versions, but this is worth a shot.

I'm happy to troubleshoot, I have a 1st gen poe device in a test network. and it's working great there.

tube0013 commented 1 year ago

Happy to continue to troubleshoot. Also fell fee to reach out to me on email store @ tubeszb.com for replacement options, in cases like these I do offer good discounts.

onedr0p commented 1 year ago

Frankly I am not sure what is happening anymore, I downgraded both firmwares, restored a backup and nothing seems to work :/

tube0013 commented 1 year ago

I'd try to pair a device. if z2m is starting, it's able to talk to the coordinator. it won't start otherwise.

onedr0p commented 1 year ago

I tried to update the firmware one more time and z2m won't even start...

Using '/data' as data directory
Zigbee2MQTT:debug 2023-02-28 09:26:54: Loaded state from file /data/state.json
Zigbee2MQTT:info  2023-02-28 09:26:54: Logging to console only'
Zigbee2MQTT:info  2023-02-28 09:26:54: Starting Zigbee2MQTT version 1.30.1 (commit #eb878d3)
Zigbee2MQTT:info  2023-02-28 09:26:54: Starting zigbee-herdsman (0.14.89)
Zigbee2MQTT:debug 2023-02-28 09:26:54: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null,"disableLED":false},"backupPath":"/data/coordinator_backup.json","databaseBackupPath":"/data/database.db.backup","databasePath":"/data/database.db","network":{"channelList":[11],"extendedPanID":[221,221,221,221,221,221,221,221],"networkKey":"HIDDEN","panID":62840},"serialPort":{"path":"tcp://192.168.1.90:6638"}}'
Zigbee2MQTT:debug 2023-02-28 09:29:22: Saving state to file /data/state.json
Zigbee2MQTT:error 2023-02-28 09:29:22: Not connected to MQTT server!
Zigbee2MQTT:error 2023-02-28 09:29:22: Cannot send message: topic: 'zigbee2mqtt/bridge/state', payload: '{"state":"offline"}
Zigbee2MQTT:info  2023-02-28 09:29:22: Disconnecting from MQTT server
TypeError: Cannot read properties of undefined (reading 'end')
    at MQTT.disconnect (/app/lib/mqtt.ts:129:21)
    at Controller.stop (/app/lib/controller.ts:198:9)
    at stop (/app/index.js:111:5)
    at process.handleQuit (/app/index.js:117:9)
tube0013 commented 1 year ago

can you try and run an eneryg-scan again, but just let it run for a while.

this opens the connection to the coordinator and starts the network and is in constant reading/writing to the coordinator telling it to scan and reading the results. see if it runs for a while or if it disconnects/crashes.

onedr0p commented 1 year ago
❯ zigpy radio znp socket://192.168.1.90:6638 energy-scan
2023-02-28 10:06:04.552 Devins-MacBook-Pro.local asyncio WARNING socket.send() raised exception.
2023-02-28 10:06:05.553 Devins-MacBook-Pro.local asyncio WARNING socket.send() raised exception.
2023-02-28 10:06:06.555 Devins-MacBook-Pro.local asyncio WARNING socket.send() raised exception.
2023-02-28 10:06:07.556 Devins-MacBook-Pro.local asyncio WARNING socket.send() raised exception.
2023-02-28 10:06:08.558 Devins-MacBook-Pro.local asyncio WARNING socket.send() raised exception.
2023-02-28 10:06:09.093 Devins-MacBook-Pro.local zigpy.application ERROR Couldn't start application
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 665, in _skip_bootloader
    result = await responses.get()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy/application.py", line 145, in startup
    await self.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/zigbee/application.py", line 107, in connect
    await znp.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 707, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 664, in _skip_bootloader
    async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT):
  File "/opt/homebrew/lib/python3.11/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/opt/homebrew/lib/python3.11/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 665, in _skip_bootloader
    result = await responses.get()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/zigpy", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_cli/cli.py", line 20, in inner
    return loop.run_until_complete(cmd(*args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_cli/radio.py", line 178, in energy_scan
    await app.startup()
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy/application.py", line 145, in startup
    await self.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/zigbee/application.py", line 107, in connect
    await znp.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 707, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/zigpy_znp/api.py", line 664, in _skip_bootloader
    async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT):
  File "/opt/homebrew/lib/python3.11/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/opt/homebrew/lib/python3.11/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
tube0013 commented 1 year ago

this is with the rolled back ESPhome fw?

is the coordinator accessible over the web when this happens?

onedr0p commented 1 year ago

This is with the latest esphome FW, and yes I can access it via the web.

onedr0p commented 1 year ago

I shut down z2m and restarted the controller and now I can run the zigpy energy-scan command.

tube0013 commented 1 year ago

oh yeah that would explain that, only one thing can connect at a time.

as a test I'd let that run as long as you can reasonably let it. I've run them straight for multiple days to test things out but I'm just trying to get a baseline for if there are connection issues, or if this is more of a zigbee/z2m issue.

onedr0p commented 1 year ago

I thought zigpy radio znp socket://192.168.1.90:6638 energy-scan is just reporting stats on channels or is it actively doing something else?

tube0013 commented 1 year ago

so it's running a scan over and over again. the 1/5 - 5/5 are an average of the last 5 scans. so it's a telling the coordinator to scan and also receiving the results of the scan. and it does this over and over again.

So tests the connection with the coordinator and also the results test the RF of the radio, this is why I run this as a QC test so I can spot issues with the 2 main functions of the device. https://twitter.com/TubesZb/status/1619195503358668801?s=20

onedr0p commented 1 year ago

I had to stop it, but it ran for nearly 3 hours

tube0013 commented 1 year ago

that to me shows that the device components are working as they should and this may be more of a z2m issue. I am not ruling out possible disconnections with longer tests or z2m connections, but the 3 hour test is pretty good. With questionable hardware fw, I've had them stop after only a few minutes.

Sticking with the zigpy-cli tools, you can run a remote energy scan on any router on your mesh by specifying --nwk 0x1234 at the end of the command where 0x1234 is the nwk address of the router. this would test communication to a device on your mesh network.

onedr0p commented 1 year ago

I am pretty sure at this point I am going to have to nuke my zigbee network and re-pair my devices, it's not too much effort as I have ~20 devices but annoying all the same. It is a shame because your device lasted great for a year.

onedr0p commented 1 year ago

I've moved to another zigbee controller, you can close this as there isn't anything left to debug. The 2021 poe device must be faulty or something.

kashalls commented 10 months ago

@tube0013 I ran a similar setup to onedr0p with the same coordinator (he recommended it and I got it).

It looks like mine started failing this week. I tried to update ESPHome and the firmware, it runs for a little bit but then locks up completely.

None of the actions are functioning, you can still push an OTA Update and it comes back to life for a bit. RlEqCn