home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.15k stars 30.19k forks source link

Tradfri: manual reload of the integration fails #105004

Closed Mariusthvdb closed 4 months ago

Mariusthvdb commented 10 months ago

The problem

follow-up of https://github.com/home-assistant/core/issues/85254#issuecomment-1838614615

which was an attempt to keep some oversight in the Ikea Integration issue of losing connection to the Gateway. that hasn't changed. New is the fact that even though an automated integration reload seems to initially work alright (bringing back the entities in the dashboard), clicking any of those entities, or, controlling them via scripts etc leads to None type error.

When we try to manually reload the integration once more, there is an error leading to

Scherm­afbeelding 2023-12-04 om 11 34 50

and below error sequence is in the log IMG_3708

Requiring a HA restart, which makes the issue of HA losing connection even worse than before. Having to reload an integration has been a ok-ish working hack to keep things alive. Having to restart it seriously getting to be critical.

I did see this happening before the Ikea firmware update to dec 4th.

What version of Home Assistant Core has the issue?

2023.12.0b1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Tradfri

Link to integration documentation on our website

https://www.home-assistant.io/integrations/tradfri/

Diagnostics information

config_entry-tradfri-01c28e550efaa39e54aaf8180d8c10d8.json-2.txt

Example YAML snippet

automated reload that worked up to know.

script:

  reload_tradfri_integration:
    mode: restart
    sequence:
      service: homeassistant.reload_config_entry
      data:
        entry_id: 01c28e550efaa39e54aaf8180d8c10d8

automation:

  - id: reload_tradfri_integration
    mode: single
    trigger:
      - platform: event
        event_type: system_log_event
        event_data:
          level: WARNING
        id: observation
      - platform: event
        event_type: system_log_event
        event_data:
          level: ERROR
        id: failed
    condition:
      - >
        {{now() - this.attributes.last_triggered > timedelta(minutes=5) if
          this.attributes.last_triggered is not none else true}}
#       or:
#         - >
#           {{trigger.event.data.name == 'tradfri.base_class' and 'Observation failed for'
#             in trigger.event.data.message[0]}}
      - >
        {{trigger.event.data.name == 'homeassistant.components.tradfri' and
          'Keep-alive failed' in trigger.event.data.message[0]}}
#         - >
#           {{trigger.event.data.name == 'coap' and
#               'Error received in UDP connection under DTLS' in trigger.event.data.message[0]}}
    action:
      - service: counter.increment
        target:
          entity_id: counter.reload_tradfri_integration
      - service: system_log.write
        data:
          message: >
            Ikea Trådfri: '{{trigger.id}}' on {{trigger.event.data.name}} issue logged:
            {{trigger.event.data.message[0]}}.
            Reloading integration
          level: critical
          logger: homeassistant.components.tradfri
      - service: script.reload_tradfri_integration
      - service: system_log.write
        data:
          message: >
            Ikea Trådfri integration was reloaded because of {{trigger.id}}
            Full data:
            {{trigger.event}}
          level: critical
          logger: homeassistant.components.tradfri

Anything in the logs that might be useful for us?

Logger: root
Source: /usr/src/homeassistant/homeassistant/bootstrap.py:402
First occurred: 10:21:50 (76 occurrences)
Last logged: 11:30:16

Uncaught exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiocoap/transports/tinydtls.py", line 170, in wrapper
    return getattr(self, __method)(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiocoap/transports/tinydtls.py", line 280, in _read
    self.coaptransport.ctx.dispatch_message(message)
  File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 109, in dispatch_message
    success = self._process_response(message)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 335, in _process_response
    return self.token_manager.process_response(response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 167, in process_response
    if key not in self.outgoing_requests:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable

a subsequent manual reload of the integration now also errors with:

Logger: homeassistant.config_entries
Source: config_entries.py:557
First occurred: 11:30:46 (1 occurrences)
Last logged: 11:30:46

Error unloading entry Ikea Trådfri for tradfri
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 557, in async_unload
    result = await component.async_unload_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/tradfri/__init__.py", line 149, in async_unload_entry
    await factory.shutdown()
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 99, in shutdown
    await self._reset_protocol(exc)
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 87, in _reset_protocol
    await protocol.shutdown()
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 326, in shutdown
    await item
  File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 61, in shutdown
    await self.token_interface.shutdown()
  File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 65, in shutdown
    for messageerror_monitor, cancellable in self._active_exchanges.values():
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'values'

followed up with:

2023-12-04 20:38:13.545 WARNING (MainThread) [py.warnings] /usr/local/lib/python3.11/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain
  warnings.warn("DTLS module did not shut down the DTLSSocket "

Additional information

No response

home-assistant[bot] commented 10 months ago

tradfri documentation tradfri source

Mariusthvdb commented 10 months ago

update: checking the logs again I now see the sequence of errors is finalized as follows:

(already known loggings)

2023-12-06 03:32:15.557 ERROR (MainThread) [homeassistant.components.tradfri] Keep-alive failed

2023-12-06 03:37:17.521 WARNING (MainThread) [py.warnings] /usr/local/lib/python3.11/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain
  warnings.warn("DTLS module did not shut down the DTLSSocket "

(and I believe this is new to me)

2023-12-06 03:37:17.604 ERROR (MainThread) [coap] Connection loss was not expected.

not 100% sure that is the final thing in the original sequence, or that this is logged because of an automated reload of the integration.

fact is that many lights dont get their correct state back in HA (still perfect in the app), so the communication between the integration and the device after a connection loss seems to be imperfect.

To be sure there is nothing network related, I also checked the Unifi console on alerts or security breaches but nothing is logged for the Tradfri gateway not has it disconnected. Internet has been available 100% too.

wizmo2 commented 9 months ago

Same issue here. The frontend controls return an error, but weirdly, some automations still work.

2023-12-13 15:10:57.481 INFO (MainThread) [homeassistant.components.script.check_occupied] Check Occupied: Choose at step 1: choice 1: Executing step call service
2023-12-13 15:10:57.568 ERROR (MainThread) [coap.blockwise-requester] Exception in BlockwiseRequest runner neither went to response nor to observation: invalid state
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 606, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 742, in _run
    response.set_result(assembled_response)
asyncio.exceptions.InvalidStateError: invalid state
2023-12-13 15:10:57.852 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Running automation actions
2023-12-13 15:10:57.865 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Running automation actions
2023-12-13 15:10:57.866 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Executing step call service
2023-12-13 15:10:57.890 INFO (MainThread) [coap] No DTLS connection active to (10.x.x.x, 5684, b'8f4ff330c6ab458fbfede6ab2ac23289'), creating one
2023-12-13 15:10:57.904 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.912 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.913 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.914 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.949 ERROR (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Error executing script. Unexpected error for call_service at pos 1: 'NoneType' object does not support item assignment
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 468, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 704, in _async_call_service_step
    response_data = await self._async_run_long_action(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 666, in _async_run_long_action
    return long_task.result()
           ^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2067, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2104, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 272, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 878, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/group/light.py", line 174, in async_turn_on
    await self.hass.services.async_call(
  File "/usr/src/homeassistant/homeassistant/core.py", line 2067, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2104, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 272, in handle_service
    return await service.entity_service_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 903, in entity_service_call
    raise result from None
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 1233, in async_request_call
    return await coro
           ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
    await light.async_turn_on(**filter_turn_on_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/tradfri/light.py", line 227, in async_turn_on
    await self._api(command)
  File "/usr/src/homeassistant/homeassistant/components/tradfri/base_class.py", line 30, in wrapper
    await func(command)
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 199, in request
    result = await self._execute(api_commands, timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 167, in _execute
    _, res = await self._get_response(msg, timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 107, in _get_response
    pr_resp = await asyncio.wait_for(pr_req.response, timeout)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 606, in _run_outer
    await cls._run(app_request, response, weak_observation, protocol, log)
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 665, in _run
    blockresponse = await blockrequest.response
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 349, in send
    request_interface.request(pipe)
  File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 252, in request
    self.outgoing_requests[key] = request
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object does not support item assignment

Then when trying a manual reload

2023-12-14 08:50:14.944 ERROR (MainThread) [homeassistant.config_entries] Error unloading entry 10.10.102.80 for tradfri
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 557, in async_unload
    result = await component.async_unload_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/tradfri/__init__.py", line 149, in async_unload_entry
    await factory.shutdown()
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 99, in shutdown
    await self._reset_protocol(exc)
  File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 87, in _reset_protocol
    await protocol.shutdown()
  File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 326, in shutdown
    await item
  File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 61, in shutdown
    await self.token_interface.shutdown()
  File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 65, in shutdown
    for messageerror_monitor, cancellable in self._active_exchanges.values():
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'values'
2023-12-14 08:50:15.937 ERROR (SyncWorker_6) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 61, in shutdown
    await self.token_interface.shutdown()
  File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 65, in shutdown
    for messageerror_monitor, cancellable in self._active_exchanges.values():
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'values'

along with a message that request a full restart.

This resolves the issue, for a period of time, but it seems to be failing daily.

Mariusthvdb commented 9 months ago

failing more tha ever, and more vehemently than ever. My auto reload automation has been working since January, and didnt cause an issue.

currently I believe 1 out if 3, it fails to reload and the aforementioned error pops up..

so this has definitely gotten worse, up to the breaking level I would say.

Too bad there's no response from the team, as if we're writing in the void. Especially since this has been going on for almost a year now...

wizmo2 commented 9 months ago

@Mariusthvdb, I belatedly RdTFM re. Add/Remove after firmware update. Will keep you posted if it has addressed the issue my end.

Mariusthvdb commented 9 months ago

please do because the frequency of failures has gone up, and the result has become more fatal: only a Home assistant restart can fix it (where before the integration reload sufficed)

what exactly do you mean with

Add/Remove after firmware update

?

wizmo2 commented 9 months ago

what exactly do you mean with

Add/Remove after firmware update

(https://www.home-assistant.io/integrations/tradfri#firmware-updates)

I then readded manually, and everything mostly came back. (a couple of name overrides were missing)

Mariusthvdb commented 9 months ago

Heck I never saw that paragraph.... Though the error is different an unrelated probably, I am ready to try anything... Might be not too bad to install cleanly for once

wizmo2 commented 9 months ago

Still having issues, so that was not it. Done a little investigation, and I think it is a coap overrun issue when sending request too quickly. Perhaps the integration Codeowner could have a look?

On my production HassOS, I have a light on/off automation triggered on motion (Restart Mode). The logs show multiple 'INFO (MainThread) [coap] No DTLS connection active to (10.10.102.80, 5684, b'832523c59a764d80b7b59faec3c03616'), creating one'

There are occasional ACK errors with logs INFO (MainThread) [coap] Response <aiocoap.Message at 0x4e65bd30: ACK 2.04 Changed (MID 47904, token 194e) remote INFO (MainThread) [coap] Ignoring unknown response (which is not a unicast CON)

Finally, There are a series of INFO (MainThread) [coap] Retransmission, Message ID: 48071.

followed by a series of python errors indicating coap had shutdown/crashed.

I setup a a test automation to run on my dev platform '

Change the time_pattern to 1 second and it quickly kills Core.

For a workaround, I'm going to rework my automation scripts and will report back if this helps.

Mariusthvdb commented 9 months ago

Still having issues, so that was not it.

Too bad, but good to know, I was about to repeat that procedure, glad I waited ;-)

Done a little investigation, and I think it is a coap overrun issue when sending request too quickly.

Yes something like that seems to occur, though I believe it is out of our hands as enduser, even though it seems to occur when scripts/automations are run operating Tradfri devices, I also get the randomly in the night. Ive traced those times, to not have anything to do with user made processes

Perhaps the integration Codeowner could have a look?

That's just the thing, the last code-owner pulled out some time ago, and no one is assigned , which ofc is a major bummer for an itegration of this importance. Only Martin checks every now and then, but does not act on this one either unfortunately.

For a workaround, I'm going to rework my automation scripts and will report back if this helps.

I did start with taking out any transition, thinking that would be too taxing for the gateway. It didnt help. As a matter of fact, have a Christmas script testing that, and it keeps repeating fade-in-fadeout on 2 bulbs, and it goes on for hours without an issue..... so that is not it for sure.

Id be glad to assist what you do, or comment with my experience of the last 11 months.....

wizmo2 commented 9 months ago

I only noticed the issues with 2023.12. I've downgraded to 2023.11.3 to see if goes away.

I did note there has been a known issue (https://github.com/home-assistant-libs/pytradfri/tree/master#known-issues) with overrun. @bdraco has added some event based speed optimizations this month, which could potential why I starting to see it.

Mariusthvdb commented 9 months ago

that known issues paragraph in fact says, users are on their own.

how are users supposed to:

As a first solution, try to limit the number of requests, and move the Gateway closer to the device running pytradfri on the nework

If the gateway is lan cabled to the HA system, and request, well, are out of control other than turning on/off a light now and then...

I wonder how they can state this project can do nothing, if it is not seriously investigated, and 'anecdotal' evidence proves the issue in increasing. only so by either updating HA, or the Tadfri gateway.

its been since januari this is getting closed and closed again https://github.com/home-assistant-libs/pytradfri/issues/617#issuecomment-1399367301 without as much a hint of help

wizmo2 commented 9 months ago

Latest: 2023.11.3 initially seems to fix the reconnect issues. That being said, I still had an issue with disconnects. I checked my UniFi connection logs and the hub had disconnected. I rebooted the hub and the integration immediately came back online.

Interestingly, The Unifi log history shows that the hub is loosing the connection consistently every 30 minutes. I've replaced the network cable, and will look at a different power supply, but IMO, it looks like the hub is going into sleep mode. Its possible this was part of the latest firmware 1.21.31 (4-Dec-23) which I only installed a week or so ago.

Mariusthvdb commented 9 months ago

hmm, not related I think.

no issues at al per 30 minutes.

Scherm­afbeelding 2023-12-18 om 13 55 04

frequency seems to be as before. though I dont recall seeing them repeated to fast

wizmo2 commented 9 months ago

hmm, not related I think.

Maybe right.

PSU did not make a difference. I can keep the connection up using 'wake_on_lan:' (must be in configuration.yaml) and the following script

alias: _ikea_keep_alive
description: ""
trigger:
  - platform: time_pattern
    minutes: /15
condition: []
action:
  - service: wake_on_lan.send_magic_packet
    data:
      mac: <your gateway mac address>
      broadcast_port: 7
      broadcast_address: <your gateway ip>
mode: single

@Mariusthvdb, Would be interesting if it has any affect on your reload count frequency.

Mariusthvdb commented 9 months ago

in the past, I have already tried things like that, but never believed it would change things, as there is no issue of timing out or sleep or what ever you might call it.

Ive also Pinged the gateway on every 10 minutes or so, in various frequencies. never did I notice an improvement.

Only thing I can say is when it started (2023.1) and ever since, all we users could try failed.

for how long have you tested this wake automation now?

Mariusthvdb commented 9 months ago

using a Ikea gateway 1st generation and also a Dirigera . the latter only on experimental base, as it isnt supported other than via HomeKit controller

at the time these loss of connection issues started (2023.1) the Dirigera wasnt there yet, and I dont believe it has to do with Ikea purposely doing that, as the App still works as expected, and even the Home app on my iPhone has no issues. Its really a HA thing (or any of its underlying libraries, like COAP)

2023.12 started with the unloading trouble. Though, I havent seen that for a few connection losses now, so it doesnt happen all the time (before it never did, and we could always restart the integration)

the frequency did go up and sometimes even very quickly too, as if the reload doesnt fully reconnect, or starts a new issue I itself.....

Hope you find something, though in all honesty, Ive almost lost faith HA will ever bring this back to regular operation. Which is too bad really, as Ikea still develops its firmware.

Home Assistant should really get a code owner on this integration to recheck its workings and make it up to date again.

Mariusthvdb commented 9 months ago

you'd better not ping like that, it is considered rude.. for polls we have the community, or eg Discord

Mariusthvdb commented 9 months ago

that crash log has no new info does it? its the same as I posted in the opening post?

to not confuse or overload, please only post new info if you have it, otherwise we keep reding the same logs over and over and that is hardly useful

also, if you post a log, please block quote those as code, it makes reading them much easier

leifnel commented 9 months ago

I don't know if it's related, but my old IKEA hub (not Dirigera) has begun to go off-line, not responding to "ok google" or pings.

Mariusthvdb commented 9 months ago

anything in the logs? can you see the Gateway in your router? please provide more info, or open a separate topic if not related to keep things tidy as possible here?

Mariusthvdb commented 8 months ago

so, Integration failing more and more daily now. While at the same time, eg Google Assistant doesnt have an issue at all, or, ofc, the Ikea Tradfri app itself.

thing ism tha any script or automation using an Integration entity is suffering, and we can do but s much to guard all those entities.

just look at this...

Scherm­afbeelding 2024-01-31 om 10 41 50

its horror currently, and no one is bothered?

Mariusthvdb commented 8 months ago

ok, so HA 2024.2 beta seems to report differerent loggings, which I hope can help pinpoint this a bit better.

Upon each restart we now see the Aiocoap message (this didnt happen before, as it was accidental)

Logger: py.warnings
Source: /usr/local/lib/python3.12/warnings.py:109
First occurred: 09:22:08 (1 occurrences)
Last logged: 09:22:08

/usr/local/lib/python3.12/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain warnings.warn("DTLS module did not shut down the DTLSSocket "

and next:

Logger: coap
Source: runner.py:188
First occurred: 09:22:08 (1 occurrences)
Last logged: 09:22:08

Connection loss was not expected.

I must mention that here, because it then leads to our automation trying to read the integration, as is the subject of this issue.

Logger: homeassistant.components.tradfri
Source: components/system_log/__init__.py:306
Integration: IKEA TRÅDFRI (documentation, issues)
First occurred: 09:22:09 (1 occurrences)
Last logged: 09:22:09

Ikea Trådfri: 'failed' on coap issue logged: Connection loss was not expected.. Reloading integration

and this is the latest, which I hope has more info for the devs to check what is happening:

Logger: homeassistant.components.script.reload_tradfri_integration
Source: helpers/script.py:1805
Integration: Script (documentation, issues)
First occurred: 09:22:09 (1 occurrences)
Last logged: 09:22:09

reload_tradfri_integration: Error executing script. Error for call_service at pos 1: The config entry Ikea Trådfri (tradfri) with entry_id 01c28e550efaa39e54aaf8180d8c10d8 cannot be unloaded because it is not in a recoverable state (ConfigEntryState.SETUP_IN_PROGRESS)

and finally this again:

Logger: py.warnings
Source: components/unifi/controller.py:384
First occurred: 09:23:47 (1 occurrences)
Last logged: 09:23:47

/usr/local/lib/python3.12/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain warnings.warn("DTLS module did not shut down the DTLSSocket "

so even though I try to keeps things separate, it must be seen in connection to the other issue I am afraid.

RegorFromMontmartre commented 8 months ago

I've been following this thread since the beginning but remained silent because I had nothing substantial to add. Tradfri started failing dozens of time per day in my case (is it because my automations set the brightness values of the many lights of a room individually ? I use a multiplying factor to the current brightness of each individual light when I change a room's brightness). and requiring a complete restart of home assistant each time made my life hell the whole time.

Yesterday evening I upgraded to 2024.2.0. ( I am running HA on a macos 13.6.4, not in a container ) After restart, HA started saying that python 3.11.4 would not be supported anymore starting in april, suggesting to upgrade to python 3.12. So I replaced my python.org's 3.11.4 distribution with a python 3.12 from homebrew. I updated the necessary symbolic links. Then re-created my homeassistant's venv with the new python. And eventually I did a source homeassistant/bin/activate && pip install --upgrade --force-reinstall homeassistant to restart fresh as much as possible.

After reboot and download of all dependencies, tradfri failed to load saying that it wasn't able to resolve ['pytradfri[async]==9.0.1'] Apparently something is not working with the way the line "requirements": ["pytradfri[async]==9.0.1"] in tradfri/manifest.json is handled. So I did a pip install 'pytradfri[async]==9.0.1' manually and restarted HA.

And eventually the tradfri integration started successfully.

Since then,

I don't know if it's because of the new release or the new python of both, sadly, but selfishly i think it's good news for now. It's far from perfect but it's at least usable now.

Mariusthvdb commented 8 months ago

good to hear there is some progress there... yet as you say, still far from perfect.

I solved it by moving all devices to Dirigera via HomeKit...... https://github.com/home-assistant/core/pull/109880 add the purifiers to the HomeKit and with that, I have been able to take out the original Ikea Gateway and now have a perfectly stable setup.

And, relevant here, no more reloading of any integration. it just works.

issue-triage-workflows[bot] commented 4 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.