Closed Mariusthvdb closed 4 months ago
update: checking the logs again I now see the sequence of errors is finalized as follows:
(already known loggings)
2023-12-06 03:32:15.557 ERROR (MainThread) [homeassistant.components.tradfri] Keep-alive failed
2023-12-06 03:37:17.521 WARNING (MainThread) [py.warnings] /usr/local/lib/python3.11/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain
warnings.warn("DTLS module did not shut down the DTLSSocket "
(and I believe this is new to me)
2023-12-06 03:37:17.604 ERROR (MainThread) [coap] Connection loss was not expected.
not 100% sure that is the final thing in the original sequence, or that this is logged because of an automated reload of the integration.
fact is that many lights dont get their correct state back in HA (still perfect in the app), so the communication between the integration and the device after a connection loss seems to be imperfect.
To be sure there is nothing network related, I also checked the Unifi console on alerts or security breaches but nothing is logged for the Tradfri gateway not has it disconnected. Internet has been available 100% too.
Same issue here. The frontend controls return an error, but weirdly, some automations still work.
2023-12-13 15:10:57.481 INFO (MainThread) [homeassistant.components.script.check_occupied] Check Occupied: Choose at step 1: choice 1: Executing step call service
2023-12-13 15:10:57.568 ERROR (MainThread) [coap.blockwise-requester] Exception in BlockwiseRequest runner neither went to response nor to observation: invalid state
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 606, in _run_outer
await cls._run(app_request, response, weak_observation, protocol, log)
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 742, in _run
response.set_result(assembled_response)
asyncio.exceptions.InvalidStateError: invalid state
2023-12-13 15:10:57.852 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Running automation actions
2023-12-13 15:10:57.865 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Running automation actions
2023-12-13 15:10:57.866 INFO (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Executing step call service
2023-12-13 15:10:57.890 INFO (MainThread) [coap] No DTLS connection active to (10.x.x.x, 5684, b'8f4ff330c6ab458fbfede6ab2ac23289'), creating one
2023-12-13 15:10:57.904 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.912 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.913 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.914 WARNING (MainThread) [coap] An exception that is not an aiocoap Error was raised from a transport; please report this as a bug in aiocoap: TypeError("'NoneType' object does not support item assignment")
2023-12-13 15:10:57.949 ERROR (MainThread) [homeassistant.components.automation.lounge_lights_on] lounge_lights_on: Turn on the light per schedule: default: Error executing script. Unexpected error for call_service at pos 1: 'NoneType' object does not support item assignment
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 468, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 704, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 666, in _async_run_long_action
return long_task.result()
^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2067, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2104, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 272, in handle_service
return await service.entity_service_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 878, in entity_service_call
single_response = await _handle_entity_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
result = await task
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
await light.async_turn_on(**filter_turn_on_params(light, params))
File "/usr/src/homeassistant/homeassistant/components/group/light.py", line 174, in async_turn_on
await self.hass.services.async_call(
File "/usr/src/homeassistant/homeassistant/core.py", line 2067, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2104, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 272, in handle_service
return await service.entity_service_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 903, in entity_service_call
raise result from None
File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 1233, in async_request_call
return await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 948, in _handle_entity_call
result = await task
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 580, in async_handle_light_on_service
await light.async_turn_on(**filter_turn_on_params(light, params))
File "/usr/src/homeassistant/homeassistant/components/tradfri/light.py", line 227, in async_turn_on
await self._api(command)
File "/usr/src/homeassistant/homeassistant/components/tradfri/base_class.py", line 30, in wrapper
await func(command)
File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 199, in request
result = await self._execute(api_commands, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 167, in _execute
_, res = await self._get_response(msg, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 107, in _get_response
pr_resp = await asyncio.wait_for(pr_req.response, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
return await fut
^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 606, in _run_outer
await cls._run(app_request, response, weak_observation, protocol, log)
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 665, in _run
blockresponse = await blockrequest.response
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 349, in send
request_interface.request(pipe)
File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 252, in request
self.outgoing_requests[key] = request
~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object does not support item assignment
Then when trying a manual reload
2023-12-14 08:50:14.944 ERROR (MainThread) [homeassistant.config_entries] Error unloading entry 10.10.102.80 for tradfri
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/config_entries.py", line 557, in async_unload
result = await component.async_unload_entry(hass, self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/tradfri/__init__.py", line 149, in async_unload_entry
await factory.shutdown()
File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 99, in shutdown
await self._reset_protocol(exc)
File "/usr/local/lib/python3.11/site-packages/pytradfri/api/aiocoap_api.py", line 87, in _reset_protocol
await protocol.shutdown()
File "/usr/local/lib/python3.11/site-packages/aiocoap/protocol.py", line 326, in shutdown
await item
File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 61, in shutdown
await self.token_interface.shutdown()
File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 65, in shutdown
for messageerror_monitor, cancellable in self._active_exchanges.values():
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'values'
2023-12-14 08:50:15.937 ERROR (SyncWorker_6) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/aiocoap/tokenmanager.py", line 61, in shutdown
await self.token_interface.shutdown()
File "/usr/local/lib/python3.11/site-packages/aiocoap/messagemanager.py", line 65, in shutdown
for messageerror_monitor, cancellable in self._active_exchanges.values():
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'values'
along with a message that request a full restart.
This resolves the issue, for a period of time, but it seems to be failing daily.
failing more tha ever, and more vehemently than ever. My auto reload automation has been working since January, and didnt cause an issue.
currently I believe 1 out if 3, it fails to reload and the aforementioned error pops up..
so this has definitely gotten worse, up to the breaking level I would say.
Too bad there's no response from the team, as if we're writing in the void. Especially since this has been going on for almost a year now...
@Mariusthvdb, I belatedly RdTFM re. Add/Remove after firmware update. Will keep you posted if it has addressed the issue my end.
please do because the frequency of failures has gone up, and the result has become more fatal: only a Home assistant restart can fix it (where before the integration reload sufficed)
what exactly do you mean with
Add/Remove after firmware update
?
what exactly do you mean with
Add/Remove after firmware update
(https://www.home-assistant.io/integrations/tradfri#firmware-updates)
I then readded manually, and everything mostly came back. (a couple of name overrides were missing)
Heck I never saw that paragraph.... Though the error is different an unrelated probably, I am ready to try anything... Might be not too bad to install cleanly for once
Still having issues, so that was not it. Done a little investigation, and I think it is a coap overrun issue when sending request too quickly. Perhaps the integration Codeowner could have a look?
On my production HassOS, I have a light on/off automation triggered on motion (Restart Mode). The logs show multiple 'INFO (MainThread) [coap] No DTLS connection active to (10.10.102.80, 5684, b'832523c59a764d80b7b59faec3c03616'), creating one'
There are occasional ACK errors with logs
INFO (MainThread) [coap] Response <aiocoap.Message at 0x4e65bd30: ACK 2.04 Changed (MID 47904, token 194e) remote INFO (MainThread) [coap] Ignoring unknown response (which is not a unicast CON)
Finally, There are a series of
INFO (MainThread) [coap] Retransmission, Message ID: 48071.
followed by a series of python errors indicating coap had shutdown/crashed.
I setup a a test automation to run on my dev platform '
Change the time_pattern to 1 second and it quickly kills Core.
For a workaround, I'm going to rework my automation scripts and will report back if this helps.
Still having issues, so that was not it.
Too bad, but good to know, I was about to repeat that procedure, glad I waited ;-)
Done a little investigation, and I think it is a coap overrun issue when sending request too quickly.
Yes something like that seems to occur, though I believe it is out of our hands as enduser, even though it seems to occur when scripts/automations are run operating Tradfri devices, I also get the randomly in the night. Ive traced those times, to not have anything to do with user made processes
Perhaps the integration Codeowner could have a look?
That's just the thing, the last code-owner pulled out some time ago, and no one is assigned , which ofc is a major bummer for an itegration of this importance. Only Martin checks every now and then, but does not act on this one either unfortunately.
For a workaround, I'm going to rework my automation scripts and will report back if this helps.
I did start with taking out any transition, thinking that would be too taxing for the gateway. It didnt help. As a matter of fact, have a Christmas script testing that, and it keeps repeating fade-in-fadeout on 2 bulbs, and it goes on for hours without an issue..... so that is not it for sure.
Id be glad to assist what you do, or comment with my experience of the last 11 months.....
I only noticed the issues with 2023.12. I've downgraded to 2023.11.3 to see if goes away.
I did note there has been a known issue (https://github.com/home-assistant-libs/pytradfri/tree/master#known-issues) with overrun. @bdraco has added some event based speed optimizations this month, which could potential why I starting to see it.
that known issues paragraph in fact says, users are on their own.
how are users supposed to:
As a first solution, try to limit the number of requests, and move the Gateway closer to the device running pytradfri on the nework
If the gateway is lan cabled to the HA system, and request, well, are out of control other than turning on/off a light now and then...
I wonder how they can state this project can do nothing, if it is not seriously investigated, and 'anecdotal' evidence proves the issue in increasing. only so by either updating HA, or the Tadfri gateway.
its been since januari this is getting closed and closed again https://github.com/home-assistant-libs/pytradfri/issues/617#issuecomment-1399367301 without as much a hint of help
Latest: 2023.11.3 initially seems to fix the reconnect issues. That being said, I still had an issue with disconnects. I checked my UniFi connection logs and the hub had disconnected. I rebooted the hub and the integration immediately came back online.
Interestingly, The Unifi log history shows that the hub is loosing the connection consistently every 30 minutes. I've replaced the network cable, and will look at a different power supply, but IMO, it looks like the hub is going into sleep mode. Its possible this was part of the latest firmware 1.21.31 (4-Dec-23) which I only installed a week or so ago.
hmm, not related I think.
no issues at al per 30 minutes.
frequency seems to be as before. though I dont recall seeing them repeated to fast
hmm, not related I think.
Maybe right.
PSU did not make a difference. I can keep the connection up using 'wake_on_lan:' (must be in configuration.yaml) and the following script
alias: _ikea_keep_alive
description: ""
trigger:
- platform: time_pattern
minutes: /15
condition: []
action:
- service: wake_on_lan.send_magic_packet
data:
mac: <your gateway mac address>
broadcast_port: 7
broadcast_address: <your gateway ip>
mode: single
@Mariusthvdb, Would be interesting if it has any affect on your reload count frequency.
in the past, I have already tried things like that, but never believed it would change things, as there is no issue of timing out or sleep or what ever you might call it.
Ive also Pinged the gateway on every 10 minutes or so, in various frequencies. never did I notice an improvement.
Only thing I can say is when it started (2023.1) and ever since, all we users could try failed.
for how long have you tested this wake automation now?
using a Ikea gateway 1st generation and also a Dirigera . the latter only on experimental base, as it isnt supported other than via HomeKit controller
at the time these loss of connection issues started (2023.1) the Dirigera wasnt there yet, and I dont believe it has to do with Ikea purposely doing that, as the App still works as expected, and even the Home app on my iPhone has no issues. Its really a HA thing (or any of its underlying libraries, like COAP)
2023.12 started with the unloading trouble. Though, I havent seen that for a few connection losses now, so it doesnt happen all the time (before it never did, and we could always restart the integration)
the frequency did go up and sometimes even very quickly too, as if the reload doesnt fully reconnect, or starts a new issue I itself.....
Hope you find something, though in all honesty, Ive almost lost faith HA will ever bring this back to regular operation. Which is too bad really, as Ikea still develops its firmware.
Home Assistant should really get a code owner on this integration to recheck its workings and make it up to date again.
you'd better not ping like that, it is considered rude.. for polls we have the community, or eg Discord
that crash log has no new info does it? its the same as I posted in the opening post?
to not confuse or overload, please only post new info if you have it, otherwise we keep reding the same logs over and over and that is hardly useful
also, if you post a log, please block quote those as code, it makes reading them much easier
I don't know if it's related, but my old IKEA hub (not Dirigera) has begun to go off-line, not responding to "ok google" or pings.
anything in the logs? can you see the Gateway in your router? please provide more info, or open a separate topic if not related to keep things tidy as possible here?
so, Integration failing more and more daily now. While at the same time, eg Google Assistant doesnt have an issue at all, or, ofc, the Ikea Tradfri app itself.
thing ism tha any script or automation using an Integration entity is suffering, and we can do but s much to guard all those entities.
just look at this...
its horror currently, and no one is bothered?
ok, so HA 2024.2 beta seems to report differerent loggings, which I hope can help pinpoint this a bit better.
Upon each restart we now see the Aiocoap message (this didnt happen before, as it was accidental)
Logger: py.warnings
Source: /usr/local/lib/python3.12/warnings.py:109
First occurred: 09:22:08 (1 occurrences)
Last logged: 09:22:08
/usr/local/lib/python3.12/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain warnings.warn("DTLS module did not shut down the DTLSSocket "
and next:
Logger: coap
Source: runner.py:188
First occurred: 09:22:08 (1 occurrences)
Last logged: 09:22:08
Connection loss was not expected.
I must mention that here, because it then leads to our automation trying to read the integration, as is the subject of this issue.
Logger: homeassistant.components.tradfri
Source: components/system_log/__init__.py:306
Integration: IKEA TRÅDFRI (documentation, issues)
First occurred: 09:22:09 (1 occurrences)
Last logged: 09:22:09
Ikea Trådfri: 'failed' on coap issue logged: Connection loss was not expected.. Reloading integration
and this is the latest, which I hope has more info for the devs to check what is happening:
Logger: homeassistant.components.script.reload_tradfri_integration
Source: helpers/script.py:1805
Integration: Script (documentation, issues)
First occurred: 09:22:09 (1 occurrences)
Last logged: 09:22:09
reload_tradfri_integration: Error executing script. Error for call_service at pos 1: The config entry Ikea Trådfri (tradfri) with entry_id 01c28e550efaa39e54aaf8180d8c10d8 cannot be unloaded because it is not in a recoverable state (ConfigEntryState.SETUP_IN_PROGRESS)
and finally this again:
Logger: py.warnings
Source: components/unifi/controller.py:384
First occurred: 09:23:47 (1 occurrences)
Last logged: 09:23:47
/usr/local/lib/python3.12/site-packages/aiocoap/transports/tinydtls.py:166: UserWarning: DTLS module did not shut down the DTLSSocket perfectly; it still tried to call _write in vain warnings.warn("DTLS module did not shut down the DTLSSocket "
so even though I try to keeps things separate, it must be seen in connection to the other issue I am afraid.
I've been following this thread since the beginning but remained silent because I had nothing substantial to add. Tradfri started failing dozens of time per day in my case (is it because my automations set the brightness values of the many lights of a room individually ? I use a multiplying factor to the current brightness of each individual light when I change a room's brightness). and requiring a complete restart of home assistant each time made my life hell the whole time.
Yesterday evening I upgraded to 2024.2.0. ( I am running HA on a macos 13.6.4, not in a container ) After restart, HA started saying that python 3.11.4 would not be supported anymore starting in april, suggesting to upgrade to python 3.12. So I replaced my python.org's 3.11.4 distribution with a python 3.12 from homebrew. I updated the necessary symbolic links. Then re-created my homeassistant's venv with the new python. And eventually I did a source homeassistant/bin/activate && pip install --upgrade --force-reinstall homeassistant to restart fresh as much as possible.
After reboot and download of all dependencies, tradfri failed to load saying that it wasn't able to resolve ['pytradfri[async]==9.0.1'] Apparently something is not working with the way the line "requirements": ["pytradfri[async]==9.0.1"] in tradfri/manifest.json is handled. So I did a pip install 'pytradfri[async]==9.0.1' manually and restarted HA.
And eventually the tradfri integration started successfully.
Since then,
I don't know if it's because of the new release or the new python of both, sadly, but selfishly i think it's good news for now. It's far from perfect but it's at least usable now.
good to hear there is some progress there... yet as you say, still far from perfect.
I solved it by moving all devices to Dirigera via HomeKit...... https://github.com/home-assistant/core/pull/109880 add the purifiers to the HomeKit and with that, I have been able to take out the original Ikea Gateway and now have a perfectly stable setup.
And, relevant here, no more reloading of any integration. it just works.
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
The problem
follow-up of https://github.com/home-assistant/core/issues/85254#issuecomment-1838614615
which was an attempt to keep some oversight in the Ikea Integration issue of losing connection to the Gateway. that hasn't changed. New is the fact that even though an automated integration reload seems to initially work alright (bringing back the entities in the dashboard), clicking any of those entities, or, controlling them via scripts etc leads to None type error.
When we try to manually reload the integration once more, there is an error leading to
and below error sequence is in the log
Requiring a HA restart, which makes the issue of HA losing connection even worse than before. Having to reload an integration has been a ok-ish working hack to keep things alive. Having to restart it seriously getting to be critical.
I did see this happening before the Ikea firmware update to dec 4th.
What version of Home Assistant Core has the issue?
2023.12.0b1
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Tradfri
Link to integration documentation on our website
https://www.home-assistant.io/integrations/tradfri/
Diagnostics information
config_entry-tradfri-01c28e550efaa39e54aaf8180d8c10d8.json-2.txt
Example YAML snippet
automated reload that worked up to know.
Anything in the logs that might be useful for us?
a subsequent manual reload of the integration now also errors with:
followed up with:
Additional information
No response