home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

IKEA hub loses connection #40612

Closed: fribse closed this issue 2 years ago

fribse commented 4 years ago

The problem

HA loses its connection to the IKEA hub quite fast. The IKEA system itself keeps working, but after less than a day HA can't talk to it. If I remove and re-add the integration, it works again; sometimes just rebooting HA helps. I only have my blinds on the IKEA hub, as they don't work well in deCONZ; all my other Zigbee devices are in deCONZ/Phoscon.

Environment

Problem-relevant configuration.yaml

Traceback/Error logs

Additional information

darksid3r commented 4 years ago

Same here. Zeroconf errors keep appearing after re-enabling the mDNS reflector on the USG, and the IKEA integration stops working. Here is the actual error:

zeroconf.BadTypeInNameException: Type '\&V.' must end with '._tcp.local.' or '._udp.local.'
2020-10-18 01:41:24 ERROR (zeroconf-ServiceBrowser [entities removed]) [homeassistant.components.zeroconf] Failed to get info for device
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zeroconf/__init__.py", line 244, in service_update
    service_info = zeroconf.get_service_info(service_type, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 2423, in get_service_info
    info = ServiceInfo(type_, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 1773, in __init__
    if not type_.endswith(service_type_name(name, allow_underscores=True)):
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 273, in service_type_name
    raise BadTypeInNameException("Type '%s' must end with '._tcp.local.' or '._udp.local.'" % type_)
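The check that fails here is zeroconf's rule that a DNS-SD service type must end in `._tcp.local.` or `._udp.local.`; a garbage name such as `'\&V.'` picked up from the network cannot satisfy it, so the lookup for that one device is abandoned and logged. A minimal sketch of that rule, assuming only the suffix test matters (`looks_like_valid_service_type` is a hypothetical helper, not zeroconf's API; the real `service_type_name()` also validates label lengths and characters):

```python
# Hypothetical, simplified version of the suffix rule quoted in the error message.
# NOT zeroconf's implementation: the real service_type_name() checks much more.
VALID_SUFFIXES = ("._tcp.local.", "._udp.local.")


def looks_like_valid_service_type(type_: str) -> bool:
    """Return True if type_ ends with one of the DNS-SD transport suffixes."""
    return type_.endswith(VALID_SUFFIXES)  # str.endswith accepts a tuple


if __name__ == "__main__":
    # Two well-formed types (HomeKit, Chromecast) and the two garbled names from this thread.
    for candidate in ["_hap._tcp.local.", "_googlecast._tcp.local.", "\\&V.", "\ufffd%V."]:
        status = "ok" if looks_like_valid_service_type(candidate) else "rejected"
        print(f"{candidate!r}: {status}")
```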

realjax commented 4 years ago

On the upside, it has been working fine on my end for over 36 hours now (HassOS 4.13 on a Raspberry Pi 4).

(I see no zeroconf errors anywhere, but I should note that if these errors show up in the core part of the log, they might be drowned out by the dozens and dozens of MQTT debug messages that appear despite my logging configuration. That is a different issue, though.)

fribse commented 4 years ago

I just saw another mDNS error:

It finds some weird services.

Logger: netdisco.mdns
Source: /usr/local/lib/python3.8/site-packages/netdisco/mdns.py:55
First occurred: 19 October 2020 12.15.09 (15 occurrences)
Last logged: 07.26.31

    Failed to add service �%V.
    Failed to add service
    Failed to add service �.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/netdisco/mdns.py", line 53, in _service_update
    service.add_service(zeroconf, service_type, name)
  File "/usr/local/lib/python3.8/site-packages/netdisco/discoverables/__init__.py", line 109, in add_service
    service = zconf.get_service_info(typ, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 2423, in get_service_info
    info = ServiceInfo(type_, name)
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 1773, in __init__
    if not type_.endswith(service_type_name(name, allow_underscores=True)):
  File "/usr/local/lib/python3.8/site-packages/zeroconf/__init__.py", line 273, in service_type_name
    raise BadTypeInNameException("Type '%s' must end with '._tcp.local.' or '._udp.local.'" % type_)
zeroconf.BadTypeInNameException: Type '�%V.' must end with '._tcp.local.' or '._udp.local.'
realjax commented 4 years ago

I just saw another mDNS error:

It finds some weird services.


Logger: netdisco.mdns
Source: /usr/local/lib/python3.8/site-packages/netdisco/mdns.py:55

Sounds like there's a party going on there and we weren't invited 😄

(Sorry, couldn't resist.)

darksid3r commented 4 years ago
Failed to add service �%V.

zeroconf.BadTypeInNameException: Type '�%V.' must end with '._tcp.local.' or '._udp.local.'

That's exactly the same one I'm getting. I'm starting to wonder if it is some weird entity coming from the USG itself...

walllle commented 4 years ago

Turning off mDNS solved this problem for me. It has been running for 3 days now without losing the connection.

tpihl commented 4 years ago

I'm back to losing the IKEA gateway, and I have mDNS off.

realjax commented 4 years ago

Bummer. It's still working fine for me after 100+ hours...

tpihl commented 4 years ago

I'll try moving to a USB Zigbee stick.

fribse commented 4 years ago
  - alias: 'tradfri_keep_alive'
    trigger:
    - minutes: /1
      platform: time_pattern
    action:
    - service_template: light.turn_{{ states('light.tradfri_panel') }}
      entity_id: light.tradfri_panel

This just gives me errors in the log:

Logger: homeassistant.components.automation.ikea_workaround
Source: core.py:1285
Integration: Automation
First occurred: 10.20.00 (13 occurrences)
Last logged: 10.32.00
While executing automation automation.ikea_workaround

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 426, in async_trigger
    await self.action_script.async_run(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 985, in async_run
    await asyncio.shield(run.async_run())
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 239, in async_run
    await self._async_step(log_exceptions=False)
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 247, in _async_step
    await getattr(
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 454, in _async_call_service_step
    await service_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 1285, in async_call
    raise ServiceNotFound(domain, service) from None
homeassistant.exceptions.ServiceNotFound: Unable to find service light/turn_unavailable

Logger: homeassistant.components.automation.tradfri_keep_alive
Source: helpers/script.py:1097
Integration: Automation
First occurred: 10.35.40 (1 occurrences)
Last logged: 10.35.40
tradfri_keep_alive: Error executing script. Service not found for call_service at pos 1: Unable to find service light/turn_unavailable 

Are you on the latest version of HA???
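The `ServiceNotFound` error follows directly from the template: `light.turn_{{ states('light.tradfri_panel') }}` only renders a real service (`light.turn_on` or `light.turn_off`) while the light is reachable. Once the entity is `unavailable`, it renders `light.turn_unavailable`, which does not exist. The same decision in Python, with a guard that skips the call for any other state (`keep_alive_service` is a hypothetical helper; in the YAML automation the equivalent guard would be a state condition before the action):

```python
from typing import Optional


def keep_alive_service(state: str) -> Optional[str]:
    """Map a light's state to the service the keep-alive automation would call.

    Hypothetical helper mirroring light.turn_{{ states('light.tradfri_panel') }}.
    Returns None for any state other than 'on'/'off' (e.g. 'unavailable'),
    so an invalid service such as light/turn_unavailable is never built.
    """
    if state in ("on", "off"):
        return f"light.turn_{state}"  # re-assert the current state
    return None  # skip the call instead of raising ServiceNotFound


print(keep_alive_service("on"))           # light.turn_on
print(keep_alive_service("unavailable"))  # None
```

Re-sending the current state rather than toggling keeps the keep-alive from visibly changing the light; the guard only avoids building an invalid service name and does not by itself restore a lost gateway connection.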

fribse commented 4 years ago

Turning off mDNS solved this problem for me. It has been running for 3 days now without losing the connection.

My problem with this is that I'd lose all the Chromecasts (and other stuff) if I did that.

realjax commented 4 years ago

Where exactly does one 'turn off' mDNS ?

morberg commented 4 years ago

Are you still seeing the problem in HA 0.117? It sounds like #41778 might fix this problem.

fribse commented 4 years ago

Yes, I still see it. I've tried the automation, but as far as I can tell it doesn't work on the latest version.

ggravlingen commented 4 years ago

@fribse did you replace tradfri_panel in here with the name of one of your lights {{ states('light.tradfri_panel') }}?

fribse commented 4 years ago

Yes :-) I actually added just one light to the IKEA hub, just for this. I also entered the automation directly as yaml, just to be sure.

fribse commented 4 years ago

I just found out that the light had lost its connection to the IKEA hub. I've reconnected it, so now I'll test it out...

fribse commented 4 years ago

This morning I see this in the log (the keep-alive automation is still working):

Logger: homeassistant.components.tradfri.base_class
Source: components/tradfri/base_class.py:24
Integration: IKEA TRÅDFRI
First occurred: 03.20.00 (14 occurrences)
Last logged: 03.32.00

    Unable to execute command <Command put ['15001', 65555]: {'3311': [{'5850': 0}]}>: {"r":"02"}
    Unable to execute command <Command get ['15001', 65553]>: {"r":"01"}
    Unable to execute command <Command put ['15001', 65555]: {'3311': [{'5850': 0}]}>: {"r":"07"}
Logger: homeassistant.components.tradfri.base_class
Source: /usr/src/homeassistant/homeassistant/components/tradfri/base_class.py:52
Integration: IKEA TRÅDFRI
First occurred: 03.20.58 (1 occurrences)
Last logged: 03.20.58
Observation failed for Lille Rullegardin i stuen

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/aiocoap/protocol.py", line 888, in _run_observation
    weak_observation().callback(full_notification)
  File "/usr/local/lib/python3.8/site-packages/aiocoap/protocol.py", line 1034, in callback
    c(response)
  File "/usr/local/lib/python3.8/site-packages/pytradfri/api/aiocoap_api.py", line 186, in success_callback
    api_command.result = _process_output(res)
  File "/usr/local/lib/python3.8/site-packages/pytradfri/api/aiocoap_api.py", line 252, in _process_output
    raise ClientError(output)
pytradfri.error.ClientError: {"r":"01"}
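For reading these errors: a CoAP response code (RFC 7252) is one byte, split into a 3-bit class and a 5-bit detail and written `c.dd`, so class 4 (client error) covers raw values 128 to 159 and class 5 (server error) covers 160 to 191. pytradfri appears to raise `ClientError` for 4.xx responses, passing the gateway's JSON payload (here `{"r":"01"}`) as the message; what the `r` values mean is not established in this thread. A small decoder for the code byte (`classify_coap_code` is a hypothetical helper, not part of aiocoap or pytradfri):

```python
def classify_coap_code(code: int) -> str:
    """Decode a raw CoAP code byte (RFC 7252) into 'c.dd' notation plus a category."""
    code_class = code >> 5  # upper 3 bits: class
    detail = code & 0x1F    # lower 5 bits: detail
    category = {
        0: "request/empty",
        2: "success",
        4: "client error",
        5: "server error",
    }.get(code_class, "other")
    return f"{code_class}.{detail:02d} {category}"


# 2.05 Content, 4.04 Not Found, 5.00 Internal Server Error
for raw in (69, 132, 160):
    print(raw, classify_coap_code(raw))
```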
fribse commented 3 years ago

I've tried disabling zeroconf in my setup, and I still lose the connection to the IKEA hub, even with the 'workaround' automation that runs every minute.

realjax commented 3 years ago

How's the IKEA app itself? Does that maintain a good connection?

iamthew4lrus789 commented 3 years ago

Where exactly does one 'turn off' mDNS ?

Was wondering this too.

It seems much more stable on the current version: it had been running for approximately one week (without the keep-alive automation) and just stopped this morning. Clearly not solved yet, but significantly better.

realjax commented 3 years ago

I have had no more problems in the last few weeks. But in addition to restarting HA every night, I also reboot my IKEA hub every night. Maybe that makes a difference too.

C6H6 commented 3 years ago

Any update on this?

tpihl commented 3 years ago

I've left the IKEA hub and I'm not looking back. For me, when the hub became unavailable, using the IKEA app didn't help either; the only fix was rebooting the hub.

fribse commented 3 years ago

The problem is not entirely IKEA's. This SEVERE problem was introduced in 0.115, when this thread was started, and it is still unfixed, which is terrible; somebody needs to investigate this in the HA code. My IKEA devices work perfectly on their own, and Google Assistant can control them throughout all the problems, so it is NOT the IKEA gateway (for once) causing problems, it's the integration. Patrik, @ggravlingen, could you try to get some people working on this? It's so bad that even an automation sending 'toggle' once a minute to an LED controller (without lights) doesn't fix it, and a nightly reboot (of both HA and the hub) doesn't fix it either. I only have my IKEA blinds left on the hub, so there is not much traffic going to and from it, and this could very well be the cause. All the Zigbee bulbs are on deCONZ, but deCONZ and the IKEA blinds don't play well together.

litinoveweedle commented 3 years ago

Hello,

In another issue I did my best to troubleshoot a similar problem, and I think I found at least one bug: after a network issue causes a CoAP protocol reset, the credentials are not reinstated.

Please take a look and try my solution. If possible, reply in that thread. ;-)

https://github.com/home-assistant/core/issues/42563#issuecomment-735455219
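The failure described above (credentials not reinstated after the CoAP protocol is reset) can be illustrated with a hypothetical sketch: the client keeps its pre-shared-key identity itself and re-applies it every time it builds a fresh transport, rather than only once at startup. `Transport`, `GatewayClient` and their fields are made-up names; this is neither pytradfri's nor aiocoap's API, nor the linked patch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transport:
    """Hypothetical stand-in for a CoAP/DTLS context; starts with no credentials."""
    psk_identity: Optional[str] = None
    psk: Optional[str] = None

    def request(self, path: str) -> str:
        if self.psk_identity is None or self.psk is None:
            raise ConnectionError("DTLS handshake failed: no credentials")
        return f"2.05 {path}"  # pretend the gateway answered 2.05 Content


@dataclass
class GatewayClient:
    identity: str
    psk: str
    transport: Transport = field(default_factory=Transport)

    def __post_init__(self) -> None:
        self._apply_credentials()

    def _apply_credentials(self) -> None:
        self.transport.psk_identity = self.identity
        self.transport.psk = self.psk

    def reset(self) -> None:
        """Simulate a protocol reset after a network error: a brand-new transport."""
        self.transport = Transport()
        self._apply_credentials()  # omitting this line reproduces the bug described above

    def get(self, path: str) -> str:
        try:
            return self.transport.request(path)
        except ConnectionError:
            self.reset()  # rebuild once and retry with credentials restored
            return self.transport.request(path)


client = GatewayClient(identity="ha-client", psk="secret")
client.reset()
print(client.get("/15001"))  # 2.05 /15001
```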

fribse commented 3 years ago

Hi @litinoveweedle, I would love to take a look and try that. What access should I use to get to that file? I have console access (it's installed via PVE): I can get a console by logging in as root and then typing login, but then I'm lost. I can't see the file via Samba, and the Visual Studio Code editor also seems unable to access it.

litinoveweedle commented 3 years ago

Hi @litinoveweedle, I would love to take a look and try that. What access should I use to get to that file? I have console access (it's installed via PVE): I can get a console by logging in as root and then typing login, but then I'm lost. I can't see the file via Samba, and the Visual Studio Code editor also seems unable to access it.

I would suggest to take look here: https://github.com/home-assistant/core/issues/42563#issuecomment-735452973

fribse commented 3 years ago

Hi @litinoveweedle, thanks. I couldn't get access: it asks for credentials when I run login, and I don't know them. Doing it via PVE wasn't ideal either, since its console isn't great for pasting, but Portainer worked nicely. I've backed up the file and modified it, so now it's a matter of waiting to see if it improves.

fribse commented 3 years ago

Hi @litinoveweedle, it worked for 18 hours, and now it doesn't any longer :-(

bipsendk commented 3 years ago

I have the same issue: the connection to the IKEA gateway seems to be lost after a day or two. Reloading the integration solves it in my installation.

litinoveweedle commented 3 years ago

Cool. To both of you: should I employ my crystal ball, or are you going to post the relevant HA logs? ;-p

bipsendk commented 3 years ago

I'm not that familiar with HassOS (I'm running HA in a VM). I need to know how to enable debug logging and where to find the log file, so that I can start logging and see if anything shows up. EDIT: Log file found in /config ... Now it's just a question of whether debug logging can be enabled for the IKEA integration.
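In Home Assistant, per-integration debug logging is normally switched on with the `logger:` integration in `configuration.yaml`, listing logger names under `logs:`. The relevant names can be read off the log excerpts in this thread (`homeassistant.components.tradfri...`), plus `pytradfri` for the underlying library. To show what that setting does in plain Python terms, here is the equivalent `logging.config.dictConfig` call; it is an illustration of logger-level inheritance, not something to paste into HA:

```python
import logging
import logging.config

# Same idea as HA's `logger:` -> `logs:` setting, expressed with the standard library.
# Child loggers (e.g. ...tradfri.base_class) inherit the level of the configured parent.
logging.config.dictConfig(
    {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"console": {"class": "logging.StreamHandler"}},
        "root": {"level": "WARNING", "handlers": ["console"]},
        "loggers": {
            "homeassistant.components.tradfri": {"level": "DEBUG"},
            "pytradfri": {"level": "DEBUG"},
        },
    }
)

child = logging.getLogger("homeassistant.components.tradfri.base_class")
print(logging.getLevelName(child.getEffectiveLevel()))  # DEBUG
```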

litinoveweedle commented 3 years ago

I'm not that familiar with HassOS (I'm running HA in a VM). I need to know how to enable debug logging and where to find the log file, so that I can start logging and see if anything shows up.

For now, at least the standard log would do. Otherwise, sorry: no log, no fun.

realjax commented 3 years ago

running HA in a VM

That may very well be your problem. I think there are some CoAP communication problems with a setup like that.

EricReiche commented 3 years ago

That may very well be your problem. I think there are some CoAP communication problems with a setup like that.

Possible, but I have the same issue on a Raspberry Pi 4, and the devices are on the same subnet.

bipsendk commented 3 years ago

running HA in a VM

That may very well be your problem. I think there are some CoAP communication problems with a setup like that.

Unfortunately the supervised setup is unsupported (as far as I could find out), and the NUC image cannot be installed on a ThinkCentre M92P, which, by the way, is also a nice little piece of hardware to run such things on...

litinoveweedle commented 3 years ago

running HA in a VM

That may very well be your problem. I think there are some CoAP communication problems with a setup like that.

No way, man... At least this problem is pretty clear to me so far (missing credentials after a protocol reset). Instead of lamenting, would any of you try applying my patch and send a log if/when it fails? If not, then you probably don't want to fix it at all.

realjax commented 3 years ago

Okay man. Keep your shirt on.

ggravlingen commented 3 years ago

Please use a civilized tone, we’re all using our free time trying to resolve an issue here.

On topic, running HA in a VM combined with Tradfri has been known to cause connection issues. There are a few issues here on GitHub around that.

bipsendk commented 3 years ago

I am not sure where to apply the patch from #42563 in HassOS, as I cannot find any pytradfri folder...

I might have to start looking for another way to run Home Assistant (on a Linux box), if that could solve things. I just need to find a guide...

JOTItv commented 3 years ago

Please use a civilized tone, we’re all using our free time trying to resolve an issue here.

On topic, running HA in a VM combined with Tradfri has been known to cause connection issues. There are a few issues here on GitHub around that.

I can confirm that just resetting the switch between a TRADFRI hub and the HASSIO server is enough to break the connection. The connection between the two is stable as long as network connectivity is not interrupted, but it will not re-establish automatically. I just updated my switch firmware (UniFi), and that immediately broke the connectivity. Rebooting HASS afterwards is the only fix, and this has been the case for many versions of HASS. (I'm running HassOS on a Pi 4 with the latest versions.)
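What is described here, a connection that is fine until a brief network outage and then never recovers on its own, is the case a reconnect loop with exponential backoff is meant to cover. A minimal sketch, assuming a hypothetical `connect()` callable that raises `OSError` (which includes `ConnectionError`) while the gateway is unreachable; the delays are illustrative and not taken from the Trådfri integration:

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def reconnect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each OSError."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and let the caller surface the failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")


# Usage: a simulated gateway that is unreachable for the first three attempts.
waits: List[float] = []
calls: Dict[str, int] = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] <= 3:
        raise OSError("gateway unreachable")
    return "connected"


print(reconnect_with_backoff(flaky_connect, sleep=waits.append))  # connected
print(waits)  # [1.0, 2.0, 4.0]
```

Capping the delay (`max_delay`) keeps retries from drifting too far apart after a long outage, and injecting `sleep` lets the loop be tested without real waiting.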

magicbenny commented 3 years ago

I am not sure where to apply the patch from #42563 in HassOS, as I cannot find any pytradfri folder...

I might have to start looking for another way to run Home Assistant (on a Linux box), if that could solve things. I just need to find a guide...

After this Trådfri/HA problem appeared, I decided to move away from the IKEA gateway. To solve it, I installed HassOS on a Raspberry Pi 4 with a ConBee II stick. Trådfri devices are Zigbee devices and can be used without the gateway, so this was the best solution for me.

From my point of view, the claim that the disconnects are caused by HA being installed in a VM is not true, because HA/Trådfri ran rock solid here for months and then suddenly stopped working reliably. However, my IKEA gateway is now in a box in the basement collecting dust, and it will never be used again. :)

fribse commented 3 years ago

my IKEA gateway is now in a box in the basement collecting dust, and it will never be used again. :)

The problem is that deCONZ does not handle IKEA Kadrilj or Fyrtur blinds properly, so that 'solution' is not relevant.

ggravlingen commented 3 years ago

@fribse I've been giving this some thought this weekend. How many devices do you have? Do you run anything that constantly changes the state of the lights? The same questions apply to everyone else in this thread.

iamthew4lrus789 commented 3 years ago

@ggravlingen I have about a dozen lights (the GU10 variety, if it's relevant).

Nothing else controls them other than the official IKEA app (which keeps working even after HA stops), plus the light switches that came with the bulbs.

No other tradfri or ZigBee devices.

ggravlingen commented 3 years ago

Thanks! The reason I'm asking is that HA starts a new connection to each device (it "observes" them). With many devices in the system, there will be many simultaneous connections to the hub.

I've been giving some thought to whether we should stop having live updates and instead poll the devices' states, thereby reducing the load on the hub.

fribse commented 3 years ago

Well, I only had 3 blinds and one light attached, so there is no direct correlation between the number of devices and stability.

ggravlingen commented 3 years ago

@fribse ok, thanks, then we can probably rule that out.

kjetilsn commented 3 years ago

@ggravlingen Maybe not; perhaps this is relevant in some cases. I've started seeing the same problem lately (a disconnect every ~12-24 hours with "Unable to execute command <Command put ['15001', 65552]: {'3311': [{'5850': 0}]}"), and it did start occurring after I added more Trådfri devices (went from ~20 to ~30). There were no other changes to the setup (Docker on an Ubuntu machine), unless it was broken by either a HASS or an IKEA update, that is.
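For decoding these log lines, assuming the IDs as named in pytradfri's constants: the gateway uses IPSO/LwM2M-style numeric IDs, where `15001` is the devices root, `3311` the light-control object and `5850` its on/off state. Read that way, `<Command put ['15001', 65552]: {'3311': [{'5850': 0}]}>` is a PUT to `/15001/65552` asking device 65552 to switch off. A small, hypothetical helper that renders such commands readably (only these three IDs are mapped):

```python
import json
from typing import Any, Dict, List

# Hypothetical lookup table: only the three Trådfri IDs discussed above.
NAMES: Dict[str, str] = {
    "15001": "devices",
    "3311": "light_control",
    "5850": "state",
}


def describe_command(method: str, path: List[Any], payload: Dict[str, Any]) -> str:
    """Render a logged pytradfri command as a readable one-liner."""
    url = "/" + "/".join(str(part) for part in path)
    named = "/".join(NAMES.get(str(part), str(part)) for part in path)
    body = json.dumps(payload, separators=(",", ":"))
    text = f"{method.upper()} {url} ({named}) payload {body}"
    control = payload.get("3311")
    if isinstance(control, list) and control and "5850" in control[0]:
        text += "; action: switch " + ("on" if control[0]["5850"] else "off")
    return text


print(describe_command("put", ["15001", 65552], {"3311": [{"5850": 0}]}))
# PUT /15001/65552 (devices/65552) payload {"3311":[{"5850":0}]}; action: switch off
```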