zigpy / zigpy-deconz

A library which communicates with Deconz radios for zigpy
GNU General Public License v3.0
86 stars 20 forks source link

"Failed to receive transmit confirm for request" #31

Closed stephenfsnow closed 5 years ago

stephenfsnow commented 5 years ago

Home Assistant 0.91.2 Raspbee firmware 262f0500

So I finally received my replacement Raspbee (first one had died) and added a mix of OSRAM Lightify RGBW bulbs, strips and recessed lights and a few Philips Hue products to a fresh install of Home Assistant. On startup I'm seeing a lot of repeated "Failed to receive transmit confirm for request id: XXX" errors mixed with a few others....

2019-04-08 23:06:01 ERROR (MainThread) [homeassistant.components.device_tracker] Unable to load /config/known_devices.yaml: Config file not found: /config/known_devices.yaml 2019-04-08 23:06:02 WARNING (MainThread) [zigpy_deconz.zigbee.application] Error while sending frame: 0xe9 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 1 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 2 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 3 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 4 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 5 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 6 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 7 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 8 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 9 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 10 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 11 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 12 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 13 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 14 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 15 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 16 2019-04-08 23:06:16 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 17

Response of the lights is very spotty. Usually after a few minutes, Home Assistant will no longer display the state of the light correctly i.e. a light that has been turned on will show toggled off. If I try to change the state of the light (on/off, brightness, color) it usually takes a few tries before it will respond. I've attached a log from just after startup and then what happens with I try to change light state... Any help or insight is appreciated!!!

home-assistant.txt

vinmar1 commented 5 years ago

Back again... after getting things running yesterday when I looked this morning I also had a stream of the same errors. Difference with @stephenfsnow is hardware mine is ConBee native ZHA not Raspbee.

Symptoms are the same, lights stop updating GUI and cannot be controlled. I didn't have debug running in the log but there are tracebacks in this one.

home-assistant no debug.log

I have rebooted the NUC and turned debug on in the log. The reboot also cleared the issue for the moment.

Adminiuga commented 5 years ago

@vinmar1 are you on docker/hassio? What hardware are you using? Just trying to get common points between with other reported cases. Also:

  1. when you start getting those failures, does this happen to all devices or just some
  2. if you go to zha panel, can you read attributes from the affected devices? for example for lights/switches read onoff attribute from on_off cluster (cluster id 0x0006)
  3. do you receive any updates from other sensors, like from temp/motion etc, after you start getting those failures?
vinmar1 commented 5 years ago

@Adminiuga I'm on Intel NUC / Ubuntu 18.04.2 / Docker 18.09.4 / Portainer 1.20.2 /Hassio 155 / HA 0.91.2 / Conbee USB firmware 262f0500 / Native ZHA not deCONZ

(Smartthings) Samjin Motion Sensors, the newest ones and Cree Bulbs

For this incident it was only 2 of 3 Cree bulbs that have had an issue.

For questions 2 and 3 I will need to check when it happens again. After the NUC reboot it has been running as it should again but I expect it to fail again. I'll also have a log with debug turned on, that is why I rebooted.

stephenfsnow commented 5 years ago

@Adminiuga I'm running Hass.io on Pi 3 Model B, 32bit install with RaspBee board.

Seem to be getting failures on all devices. Mix of Osram Lightify (U.S. ZHA version, not ZLL) bulbs/strips and Hue strips.

I will attempt to check tonight on attributes. Currently put everything back on my Lightify hub but I'll setup some tester bulbs. I did notice that when I go to the entities (both light and ZHA) the status will just say "unavailable" after a while. This happens on all devices.

Unfortunately I don't have anything other than lights on Zigbee.

vinmar1 commented 5 years ago

Only change I made was upgraded to 0.91.3.

After some time my Samjin's and Cree's went offline, I saw no errors.

zha.cree_connected_a_19_60w_equivalent unavailable friendly_name: Cree Bsmnt Stairs ZHA
zha.cree_connected_a_19_60w_equivalent_2 unavailable friendly_name: Cree Util East ZHA
zha.cree_connected_a_19_60w_equivalent_3 unavailable friendly_name: Cree Util West ZHA
zha.samjin_motion unavailable friendly_name: Bsmnt Stairs ZHA
zha.samjin_motion_2 unavailable friendly_name: Bsmnt Storage ZHA
zha.samjin_motion_3 unavailable friendly_name: Bsmnt Util ZHA

This happened between 10:23 to 10:28 according to the history. The log is below.

home-assistant.zip

A restart of just HA from the Configuration menu brought everything back.

I'm not seeing complaints from anyone using a HUSBZB-1 does anyone know if these issues are limited to just zigpy-deconz ?

vinmar1 commented 5 years ago

The errors are back (See below) and I have a log with debug turned on.

One Cree light shows as being on in the GUI though it isn't. This is light ieee - e2:0d:b9:ff:fe:09:9b:32 "Util West" From the GUI... If I toggle it off the toggle goes back to the on position on it's own but he light remains off If I toggle it off and back on before the GUI does the light does go on If I then toggle it off the light goes out but the GUI will move the toggle back to on, the light remains off

home-assistant.zip

This is a partial list of errors just streamed Failed to receive transmit confirm for request id: 181 3:39 PM components/zha/core/channels/init.py (WARNING) Failed to receive transmit confirm for request id: 180 3:39 PM components/zha/core/channels/init.py (WARNING) Failed to receive transmit confirm for request id: 179 3:39 PM components/zha/core/channels/init.py (WARNING) Failed to receive transmit confirm for request id: 178 3:38 PM components/zha/core/channels/init.py (WARNING) Failed to receive transmit confirm for request id: 177 3:38 PM components/zha/core/channels/init.py (WARNING) Error doing job: Task exception was never retrieved 3:38 PM /usr/local/lib/python3.7/site-packages/homeassistant/core.py (ERROR) - message first occured at 3:08 PM and shows up 13 times

Nemesis24 commented 5 years ago

I've got the same errors:

11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx, 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0748b0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}
11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
Failed to receive transmit confirm for request id: 48
11 avril 2019 à 13:50 components/zha/core/helpers.py (WARNING)
Failed to receive transmit confirm for request id: 47
11 avril 2019 à 13:50 components/zha/core/helpers.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx', 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0749f0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}
11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx', 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0749f0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}
11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx', 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0749f0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}
11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
Failed to receive transmit confirm for request id: 44
11 avril 2019 à 13:50 components/zha/core/helpers.py (WARNING)
Failed to receive transmit confirm for request id: 42
11 avril 2019 à 13:50 components/zha/core/helpers.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx', 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0749d0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}
11 avril 2019 à 13:50 components/zha/core/discovery.py (WARNING)
there are no channels in the discovery info: {'unique_id': 'xxxxx', 'zha_device': <homeassistant.components.zha.core.device.ZHADevice object at 0x6e0749d0>, 'channels': [], 'component': 'binary_sensor', 'sensor_type': 'opening'}

Hass.io on Pi 3 Model B, 32bit install with RaspBee board

Adminiuga commented 5 years ago

all with Failed to receive transmit confirm for request id can you this branch https://github.com/Adminiuga/zigpy-deconz/commits/fixes/tx-error-handling

vinmar1 commented 5 years ago

@Adminiuga I'm not sure what you are asking...

vinmar1 commented 5 years ago

Not sure where this stands...

I've been running 0.91.0 because it was the most stable. I have just upgraded to 0.91.4 and see a lot of changes have been made and a couple looked like they could help this issue. I have the log recording debug just in case and we'll see how it goes.

vinmar1 commented 5 years ago

Running on 0.91.4 for less than 24 hours, this morning all ZHA was offline same errors.

dmulcahey commented 5 years ago

I’m going to ask you to do something that seems a bit out there... but can you get a usb extension cable, and hook the stick up with it to distance it from the computer please? We were recently informed that this could fix transmission failures.

vinmar1 commented 5 years ago

I had seen past posts about radio conflict concerns and have been running extensions for some time now. I have both the Aeotec and ConBee distanced from each other and from the NUC. One of the reasons I am using the ConBee and Aeotec and not the HUSBZB-1 was I wanted separation of the devices as well as better future support. I have an OSRAM Lightify hub in the next room that has no issues and if I put the Cree lights on it they work flawlessly but it won't support the samjin motion sensors or I would move everything to it. I would love to see the ConBee work but I am tempted to just buy 3 Lightify Motion sensors and dump it. I think today I may stop all my docker containers and backup the images then do a scratch rebuild back to the HUSBZB-1 and see if all the issues go away.

I did a restart this morning and lost the log from last night but it is the same scenario each time, "Failed to recieve transmit..." errors start a couple here and there and finally start to stream until everything is offline and then they stop. There was one Cree still online when I caught it this morning.

vinmar1 commented 5 years ago

@dmulcahey I think you may be on to something...

I backed up my docker images nuked the current installation and did a scratch rebuild today. I pulled the ConBee and Aeotec off the NUC, went back to the HUSBZB-1. All is back up and running, much of the configuration and automations are from files I saved and copied back changing entity names where appropriate making it fairly easy.

Soooo..... what I noticed... some of my equipment that runs on 2.4GHz WiFi like my Yamaha Reciever, OSRAM Hub, D-link Camera which are in close proximity, 20 feet or less, to the NUC, which actually is on 5GHz were always showing network drops and reconnects both in HA and in my router log. I have an ASUS AiMesh setup and all of those are in close proximity one of the routers they should never drop. I have other equipment much further away that connects to another WiFi node that never seems to be a problem.

After pulling the ConBee and Aeotec I noticed things seem to have cleared up on my network. I have a suspicion you are on to something about radio interference but I really think the ConBee has a "Dirty" radio in it that is both creating and receiving interference. The ConBee was pulled about 5 or 6 hours ago and my router logs have been clear since then, not just coincidence in my book. I'm going to just plug it into a Pi when I get a change to confirm but I have a feeling it's a problem.

Some things I have found...

https://support.metageek.com/hc/en-us/articles/203845040-ZigBee-and-WiFi-Coexistence

https://github.com/dresden-elektronik/deconz-rest-plugin/issues/825

stephenfsnow commented 5 years ago

@vinmar1 have you tried using the DeConz component? The funny thing is that when I use DeConz, ZigBee network seems quite stable, but I'm also able to choose what channel is used. Since my WiFi network (Google WiFi) tends to choose channel 1, I always made sure DeConz was using a channel above 20. I'm wondering how the ZHA component chooses what channel to use and maybe something changed with channel selection since ZHA was mostly stable several HA versions ago.

The only reason I can't use the DeConz component is that it still doesn't recognize my U.S. spec Osram Lightify RGBW bulbs as having color temp settings so it tries to use RGB values instead for color temp and everything looks way off. The ZHA component correctly sends color temp values to these bulbs.

vinmar1 commented 5 years ago

I have not used deconz for 2 reasons... Most important it does not support my samjin motion sensors but I also didn't want to run and manage another application which is why I chose the native integration. I do have deconz installed on my windows pc and just checked and found the conbee is by default on channel 11 which is the worst possible channel to use so I changed it to 25 and on the conbee it gets saved as default. I just wish I had caught all of this before I rebuilt back to the HUSBZB-1. I do have all the images so I can re-deploy literally in minutes though I just need to backup the current images first. Right now all seems to be fairly stable and after weeks of playing with the conbee for one issue or another I need a break so that's for another day next week.

Are your issues gone you were having the same problems and errors?

stephenfsnow commented 5 years ago

@vinmar1 Well, since I received my replacement Raspbee, I haven't tried changing channels. I have an SD card with the Raspbian image from Dresden for updating firmware, but I honestly didn't think changing the channel through the Phoscon interface would save the setting to the board. I figured the ZHA component somehow chose the channel on it's own.

For now I have gone back to using the Osram Lightify hub with Lightify component. It works really well, but I still would like to get everything running on the RPi. Maybe I'll give the HUSBZB-1 a try. It seems going the Raspbee Zigbee + Aeotec Z-Wave route was not the best solution.

vinmar1 commented 5 years ago

I won't give up... personally I think a separate ZigBee and Z-Wave and using the native HA Interfaces is going to be the best way to go once it has time to mature which is why I went off in that direction. The HUSBZB-1 has had it's share of issues as well in the past but is slightly more mature at this point but the future is questionable. I don't know about the Raspbee but the ConBee has NVRAM that allows setting the network channel and holding it, I tried unplugging and plugging it in a few times to be sure. If you reset it you would need to change it again.

Looking at the pro and con the way I see them I will continue to chase having the Conbee and Aeotec. The HUSBZB-1 is temporary for a few days maybe a week because I need a break.

CON's 1: HUSBZB-1 Manufacturer support - none 2: HUSBZB-1 Dies - hits both ZigBee and Z-Wave 3: Raspbee / ConBee native integration - needs more development 4: Raspbee / ConBee using DeConz is another app to manage and not many devices supported other than switches and lights

PRO's 1: Raspbee / ConBee native integration is being worked on by great people like @dmulcahey and @Adminiuga and is better every time they touch it 2: Aeotec is pretty much rock solid from my experience 3: No additional app (DeConz) needed 4: Upgradability (Future proofing) and Compatibility (Most every device works) 5: Separation of ZigBee and Z-Wave hardware

vinmar1 commented 5 years ago

Okay I finally got back to this... I think my HUSBZB-1 went belly up but that is another story.

EDIT: Well all of this didn't last long... for whatever reason the Samjin motion sensors all stopped responding after about an hour. No error messages nothing at all to work with. The Lights would work but the GUI was a little out of sync at times, again no errors. I can find no evidence of WiFi interference my router log is clean as a whistle. I did a power off reset so HA came back on version 0.91.4. instead of the latest beta.

I did a complete rebuild back to the Aeotec and ConBee. The important change is I did get the ConBee on channel 25, default is 11 which is probably the worst channel it could be, I put my WiFi network on 7 so good separation now. All of that looks clean no more disconnects in the error logs on my router or HA losing touch with things.

ConBee on the same firmware etc. and the environment is the same except I went to the latest beta version...

On startup I still get a few errors but once it's up so far it is running clean. Here is the initial log if anyone wants to take a look.

home-assistant.zip

Error while sending frame: 0xe9 11:39 AM main.py (WARNING) - message first occured at 10:46 AM and shows up 2 times Exception when calling OWM web API to update forecast 11:37 AM components/openweathermap/weather.py (ERROR) Failed to receive transmit confirm for request id: 9 10:37 AM components/zha/core/helpers.py (WARNING) Failed to receive transmit confirm for request id: 7 10:37 AM components/zha/core/helpers.py (WARNING) Failed to receive transmit confirm for request id: 5 10:37 AM components/zha/core/helpers.py (WARNING) Failed to receive transmit confirm for request id: 1 10:37 AM components/zha/core/helpers.py (WARNING) Setup of platform octoprint is taking over 10 seconds. 10:36 AM main.py (WARNING) Unexpected transmit confirm for request id 9, Status: 0x00, 9 10:36 AM main.py (WARNING) Unexpected transmit confirm for request id 7, Status: 0x00, 7 10:36 AM main.py (WARNING) Unexpected transmit confirm for request id 5, Status: 0x00, 5

vinmar1 commented 5 years ago

@Adminiuga and @dmulcahey I think closing this ticket is in order. From what I can see the original issues have been cleared. I did a power down reset yesterday on version 0.91.4 and the system ran without issue after that. I have also ungraded to 0.92.0 with a power down reset and again the system appears to run just fine.

I think one important thing that came out of this is the Zigbee and WiFi channel overlap. I think a lot of future issues can be avoided if there was something written about it in the official doc's. I put this post up https://community.home-assistant.io/t/zigbee-conbee-and-wifi-interference/112945 which has already gotten a fair amount of attention but think something more formal may be appropriate. If you would like I can take a stab at putting something together just let me know.

dmulcahey commented 5 years ago

@vinmar1 go for it

stephenfsnow commented 5 years ago

Agreed, I'll get back to the ZHA component eventually. Sounds like signal interference is likely the culprit.

vinmar1 commented 5 years ago

I put this together... think it would work as an addition to the Zigbee section of the documentaion. Zigbee and WiFi Interference.pdf

damarco commented 5 years ago

@vinmar1 Can you maybe try HA 0.92 to verify if you still see those warnings during startup?

vinmar1 commented 5 years ago

I have been on 0.92 about 24 hours and all I get are the errors noted below at startup, the Octoprint and Lutron are expected of course. Since the upgrade the system has been very stable and the only related warning that appeared again once in that time was Error while sending frame: 0xe9. Aside from all the code changes I would say fixing the WiFi interference had a lot to do with disconnect issues. My router logs are absolutely clean the only thing in them in 24 hours was the check for firmware update it does daily not one disconnect from any WiFi devices which is why I put the doc posted above together, just don't know how to get it added to the HA Doc's for Zigbee Intyegration, I think it would be helpful to all.

This is the debug log from a fresh restart from this morning.

home-assistant.zip

2019-04-26 09:13:03 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for lutron_caseta_pro which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you do experience issues with Home Assistant. 2019-04-26 09:13:13 WARNING (MainThread) [zigpy_deconz.zigbee.application] Error while sending frame: 0xe9 2019-04-26 09:13:13 WARNING (MainThread) [zigpy_deconz.zigbee.application] Unexpected transmit confirm for request id 8, Status: 0x00, 8 2019-04-26 09:13:13 WARNING (MainThread) [zigpy_deconz.zigbee.application] Error while sending frame: 0xe9 2019-04-26 09:13:23 WARNING (MainThread) [homeassistant.components.sensor] Setup of platform octoprint is taking over 10 seconds. 2019-04-26 09:13:43 WARNING (MainThread) [zigpy_deconz.zigbee.application] Failed to receive transmit confirm for request id: 8

vinmar1 commented 5 years ago

Something I just noticed... if I do not restart from a power down state then Zigbee fails to work even though everything looks like it is online in the GUI. When performing a "Restart" from the configuration panel the Samjin motion sensors do nothing and the Cree lights will toggle but immediately go out of sync with the GUI. Starting from a power down state all seems to work just fine. I will open a separate ticket for this issue.

Adminiuga commented 5 years ago

One thing to keep in mind about HA UI, which I noticed for a few versions now, that on failed requests the status is not reverted back, so for example if the light is off (and not responsive) and you turn it on, in UI it stays ON for a few seconds, although the request failed. Only after a few seconds it goes back to original state, before the failed request.

vinmar1 commented 5 years ago

This is different and still happening at times but rarely, it does happen at same time motion sensors stop reponding.

UI toggle is off Light is off Push toggle to on and light goes on Toggle moves back to off position by itself and light stays on Move toggle to on and immediately to off and light goes off, toggle stays off

No errors show up

See ticket #37 ZHA Does not respond properly on "Restart" that I opened and go to my second post it's in that log file.

Adminiuga commented 5 years ago

just replied in https://github.com/zigpy/zigpy-deconz/issues/37#issuecomment-487241538 and it is consistent with what i'm seeing in the logs.

blah19 commented 5 years ago

I was getting the same types of errors and issues on 0.94.3. I have a very powerful WiFi router and had it set to channel 11; I didn't realize they used the same frequencies as 2.4 WiFi( thanks @vinmar1 ). I changed it to auto and it selected 1. My response has increased drastically and my error rate has gone way down.