zigpy / bellows

A Python 3 project to implement EZSP for EmberZNet devices
GNU General Public License v3.0
186 stars 88 forks source link

eachen ewelink will not reconnect after power cycle. #440

Open R5fan opened 3 years ago

R5fan commented 3 years ago

Im unsure where to report this, as I dont know is causing it.

I have an eachen ewelink zigbee hub (ethernet) flashed with tasmota (tried 9.5.0 and 9.5.0.3) and ncp-uart-nsw_6.7.9_115200 zigbee firmware (edit: tried ncp-uart-sw_6.7.8_115200 and the problem is the same). It works fine for days until I power cycle the hub. After powering it up again, I can not get ZHA to reconnect. Tasmota itself seems to be working, I can access the web interface and console and see no problems. But ZHA will not start:

Couldn't start EZSP = Silicon Labs EmberZNet protocol: Elelabs, HUSBZB-1, Telegesis coordinator
10:20:37 AM – (ERROR) Zigbee Home Automation - message first occurred at 10:19:41 AM and shows up 4 times
Couldn't start application
10:20:37 AM – (ERROR) /usr/local/lib/python3.9/site-packages/zigpy/application.py - message first occurred at 10:19:41 AM and shows up 4 times
NCP entered failed state. Requesting APP controller restart
10:20:32 AM – (ERROR) /usr/local/lib/python3.9/site-packages/bellows/ezsp/__init__.py - message first occurred at 10:20:07 AM and shows up 2 times
Lost serial connection: read failed: socket disconnected
10:20:32 AM – (ERROR) /usr/local/lib/python3.9/site-packages/bellows/uart.py - message first occurred at 10:20:07 AM and shows up 2 times

Restarting HA core, host, or tasmota doesnt help. Even deleting and reinstalling ZHA integration doesnt help. When I run tasmota 9.5.0.3 I can see in the tasmota console that a TCP connection is made from my HA host, but ZHA says it cant connect. (In 9.5.0 I never see such messages, even if everything is working.)

The only thing that I found that does help, is changing my tasmota template back to the one that supports zigbee2tasmota instead of serial over TCP, and then back to the one that does support ZHA via serial. Then I can reconnect in HA?

@MattWestb suggested I run bellows -d socket://192.168.x.x:8888 info in a console, but I wouldnt know what console to run that in (tried HA prompt and SSH, but it doesnt work).

MattWestb commented 3 years ago

Use the https://github.com/zigpy/zha-device-handlers#testing-quirks-in-development-in-docker-based-install for getting one console in the HA docker container if you is using one type of docker installation of HA.

One PM if ZHA is up and running OK you cant using the comport then its being used of bellows but for the moment is not problem then ZHA have not loading it.

My feeling is that is that tasmota for ESP32 is not saving all things OK and is loosing some setting on reboot / power reset. I have not testing the last tasmota for ESP8266 but it was working OK on older versions for my (before converting to ESPHome).

R5fan commented 3 years ago

This is what I get when running that command when ZHA is working:

bash-5.1# bellows -d socket://192.168.1.175:8888 info
[84:71:27:ff:fe:8d:a9:a0]
[0x0000]
[<EmberNetworkStatus.JOINED_NETWORK: 2>]
[<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:d0:7e:b9:94, panId=0x3994, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)]
[<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124>, trustCenterLongAddress=84:71:27:ff:fe:8d:a9:a0)]
Exception in callback SerialTransport._read_ready()
handle: <Handle SerialTransport._read_ready()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.9/site-packages/serial_asyncio/__init__.py", line 119, in _read_ready
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/bellows/uart.py", line 73, in data_received
    self.frame_received(frame)
  File "/usr/local/lib/python3.9/site-packages/bellows/uart.py", line 100, in frame_received
    self.data_frame_received(data)
  File "/usr/local/lib/python3.9/site-packages/bellows/uart.py", line 120, in data_frame_received
    self._handle_ack(data[0])
  File "/usr/local/lib/python3.9/site-packages/bellows/uart.py", line 245, in _handle_ack
    pending[1].set_result(True)
asyncio.exceptions.InvalidStateError: invalid state
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/bellows", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/util.py", line 39, in inner
    loop.run_until_complete(f(*args, **kwargs))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/ncp.py", line 84, in info
    brd_manuf, brd_name, version = await s.get_board_info()
  File "/usr/local/lib/python3.9/site-packages/bellows/ezsp/__init__.py", line 266, in get_board_info
    (result,) = await self.getMfgToken(token)
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
^CException ignored in: <module 'threading' from '/usr/local/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 1435, in _shutdown
    atexit_call()
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/local/lib/python3.9/threading.py", line 1053, in join
    self._wait_for_tstate_lock()
  File "/usr/local/lib/python3.9/threading.py", line 1069, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt: 
bash-5.1# bellows -d socket://192.168.1.175:8888 info
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/bellows", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/util.py", line 39, in inner
    loop.run_until_complete(f(*args, **kwargs))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/ncp.py", line 69, in info
    s = await util.setup(ctx.obj["device"], ctx.obj["baudrate"])
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/util.py", line 119, in setup
    await s.version()
  File "/usr/local/lib/python3.9/site-packages/bellows/ezsp/__init__.py", line 118, in version
    await self._command("version", ver)
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
warning: Reset future is None
R5fan commented 3 years ago

Just realized I might have to disable ZHA, so when I disable ZHA and run that command I get: bash-5.1# bellows -d socket://192.168.1.175:8888 info [84:71:27:ff:fe:8d:a9:a0] [0x0000] [<EmberNetworkStatus.JOINED_NETWORK: 2>] [<EmberStatus.SUCCESS: 0>, <EmberNodeType.COORDINATOR: 1>, EmberNetworkParameters(extendedPanId=cc:cc:cc:cc:d0:7e:b9:94, panId=0x3994, radioTxPower=20, radioChannel=11, joinMethod=<EmberJoinMethod.USE_MAC_ASSOCIATION: 0>, nwkManagerId=0x0000, nwkUpdateId=0, channels=<Channels.ALL_CHANNELS: 134215680>)] [<EmberStatus.SUCCESS: 0>, EmberCurrentSecurityState(bitmask=<EmberCurrentSecurityBitmask.64|32|HAVE_TRUST_CENTER_LINK_KEY|8|GLOBAL_LINK_KEY: 124>, trustCenterLongAddress=84:71:27:ff:fe:8d:a9:a0)] Manufacturer: Board name: EmberZNet version: 6.7.8.0 build 373

MattWestb commented 3 years ago

Yes that is best or ZHA is trying talking to the coordinator thru bellows and messing things up.

Then re-power the eachen and do the same test.

MattWestb commented 3 years ago

Then you have it working you shall updating the firmware in the Zigbee module then the EmberZNet version: 6.7.8.0 build 373 if is not from the factors is missing some critical setting that is needed and implanted in the 6.7.9.0 for tuning the Zigbee network OK but it shall not being any problem with the communication with ZHA.

R5fan commented 3 years ago

I tried both 6.7.8 and 6.7.9, the issue is the same.

MattWestb commented 3 years ago

Both shall working OK with the communicating with ZHA and is not one part of your issue here but in production is only the 6.7.9.0 OK then communicating with the Zigbee network.

Back to testing re-power tasmota and if its working or not ;-)

R5fan commented 3 years ago

After power cycling tasmota (and after disabling ZHA) :

bash-5.1#  bellows -d socket://192.168.1.175:8888 info
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/bellows", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/util.py", line 39, in inner
    loop.run_until_complete(f(*args, **kwargs))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/ncp.py", line 69, in info
    s = await util.setup(ctx.obj["device"], ctx.obj["baudrate"])
  File "/usr/local/lib/python3.9/site-packages/bellows/cli/util.py", line 118, in setup
    await s.reset()
  File "/usr/local/lib/python3.9/site-packages/bellows/ezsp/__init__.py", line 98, in reset
    await self._gw.reset()
  File "/usr/local/lib/python3.9/site-packages/bellows/uart.py", line 223, in reset
    return await asyncio.wait_for(self._reset_future, timeout=RESET_TIMEOUT)
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

I also double checked tasmota is configured to listen on port 8888 and even manually restarted it:

13:04:56.302 CMD: TCPstart 8888
13:04:56.305 TCP: Stopping TCP server
13:04:56.309 TCP: Starting TCP server on port 8888
13:04:56.315 MQT: stat/tasmota_D09F78/RESULT = {"TCPStart":"Done"}

Tried it both on wifi and ethernet, same result.

MattWestb commented 3 years ago

And then you is changing to ZBB and back to ZHA profile its working ?

Its looks like tasmota on ESP32 is not doing all things OK after one cold start of the board.

R5fan commented 3 years ago

Yes indeed. I use this to go back to Z2T: backlog Template {"NAME":"ZB-GW03-V1.2","GPIO":[0,0,3552,0,3584,0,0,0,5793,5792,320,544,5536,0,5600,0,0,0,0,5568,0,0,0,0,0,0,0,0,608,640,32,0,0,0,0,0],"FLAG":0,"BASE":1} ; module 0

Did that the first time to see if the device was still working. Then

backlog template {"NAME":"ZHA-bridge","GPIO":[0,0,5472,0,5504,0,0,0,5793,5792,320,544,5536,0,5600,0,0,0,0,5568,0,0,0,0,0,0,0,0,608,640,32,0,0,0,0,0],"FLAG":0,"BASE":1} ; module 0 And as by magic, ZHA connects. Until I powercycle.

I just tried a full reset of tasmota, configured ZHA template, enabled TCP, works. Power cycled, it no longer works. Tried the ZHA template command again, doesnt help. Only going back to Z2T template "solves" it

I should probably just try esphome I guess?

MattWestb commented 3 years ago

You can open one issue or discussion in the tasmota git. I think its somthing untested with ESP32 in tasmota but the devs is also having ESP32 coordinator (from tube0013) so shall being possible to verifying if its working OK or not.

I recommending trying https://github.com/tube0013/tube_gateways/tree/main/esphome then its known working well and you is getting auto detection in ZHA with it :-)) Only checking if the GIPOs is the same used or you need changing that for getting it working on your ESP32 board.

MattWestb commented 3 years ago

@jon6fingrs do have looking on getting ESPHome working with this hardware ? tube0013 is using it on his ESP32 network coordinators and i is using it in my test network with WeMos D1 Mini (with one "IKEA Billy EZSP module and one LIDL LED controller with custom EZSP) and its working OK.

If possible doing one "fast guide" with the needed setting for the hardware can being good so the user can getting the hardware up and running.

Its looks like tasmota is having much work on its new v10.0.0 Norman and dont have time looking on the issue in there git.

Sad that the hardware looks good and user is having problems that is making its system having problem then the device is re powered ;-((

jon6fingrs commented 3 years ago

Hey I really don't have any experience with ESPHome but seems like a cool thing to look into. Will see if I can get it working when I have a minute and report back.

MattWestb commented 3 years ago

Great !!

The problem is the Ethernet hardware config for the ESP32 and the rest is easy done only "normal" ESPHome things and also easy getting connection status of the serial server in HA.

If you getting the time it being great but is not some brave user can trying it out and posting the setting for getting it working OK so users can implanting it easy :-))

Hedda commented 3 years ago

You can open one issue or discussion in the tasmota git.

@R5fan FYI; there is a general Tasmota / Z2T discussion started about ZB-GW03 hardware here -> https://github.com/arendst/Tasmota/discussions/12764

Note ZB-GW03 v1.0 and ZB-GW03-V1.2 is sold rebranded under many names, including SmartWise and EACHEN brands.

PS: There is also a general zigpy discussion about the ZB-GW03 eWeLink Ethernet Zigbee Gateway hardware here:

https://github.com/zigpy/zigpy/discussions/819

Hedda commented 3 years ago

@jon6fingrs FYI, there is a related feature request for ESPHome firmware here which include mentioning of ZB-GW03 too:

https://github.com/esphome/feature-requests/issues/688

thegroove made project for ESPHome for Sonoff Zigbee Bridge which is somewhat similar (ESP8266 instead of ESP32 based):

https://github.com/thegroove/esphome-zbbridge

If you are interested in taking on such a project then please also consider checking out request on Zeroconf DNS TXT records:

https://github.com/esphome/feature-requests/issues/1227

and

https://github.com/esphome/feature-requests/issues/1227

thegroove started on that project here:

https://github.com/thegroove/esphome-zeroconf

and continued here:

https://github.com/thegroove/esphome-zha-ezsp-zeroconf

PS: Again, there is also a general zigpy discussion about the ZB-GW03 eWeLink Ethernet Zigbee Gateway hardware here:

https://github.com/zigpy/zigpy/discussions/819

Hedda commented 2 years ago

FYI, xsp1989 also have some newer EmberZNet / EZSP firmware version here that you could try:

https://github.com/xsp1989/zigbeeFirmware/tree/master/firmware/ZigbeeBridge_SM-011

Looks like ncp-uart-nsw-6.8.2.0_115200.ota is currently the latest he built for SM-011 it uses?

Hedda commented 2 years ago

Forgot to also mention that syssi has now prepared a small project to provide a working esphome configuration for the ZB-GW03:

https://github.com/syssi/esphome-zb-gw03

MattWestb commented 2 years ago

Nop unsigned is ncp-uart-sw_v6.10.3_115200.gbl and signed is the recommended ncp-uart-nsw_6.7.9_115200.ota and the 6.8.X is one buggy one so i cant recommend it.