Closed: Lurker00 closed this issue 5 months ago
It should happen for every entity added; still, I'm not sure if that is the reason.
I don't understand how "LocalTuya" could cause this. There is no waiting on initialization: it only waits for one device to connect or fail, then continues. Is there a notification message at the bottom left, "waiting for LocalTuya to start up"? Maybe full HA logs would help.
I don't understand how "LocalTuya" could cause this
Neither do I. That's why I didn't use the "Bug" prefix in the title. But you can see in the log that LocalTuya connects all the devices over and over again, every time with successful results! There must be something that triggers this process.
Is there a notification message at the bottom left, "waiting for LocalTuya to start up"?
No. It showed the message that HA had started. Moreover, I have an automation that sends a message to Telegram on the HA-started event, and I received it.
Maybe full HA logs would help.
Should I restart HA with "default: debug" for the logger, or something else? That log was recorded at the default warning level. Writing huge full logs may affect the system by itself, so if you have a suggestion for the minimum set of "logger" options you need, you're welcome!
Yeah, enable debug level for the logs and then post the full HA logs; I find it difficult to figure out what might cause the infinite loop in localtuya.
Here it is: home-assistant.log.zip
First I noticed Bluetooth-related messages right before each LocalTuya "restart" and disabled the Bluetooth integration, to no avail. Then I tried to reconfigure a device (which had helped stop the loop before), but, alas, it continued.
I have now restarted HA without debug logging and can't stop this process: LocalTuya restarts connecting to the devices every 30 seconds, making the devices unavailable during that time.
It probably would not be a big problem if the devices did not become unavailable and kept their statuses. But as it is, no automation works.
I've managed to get HA working after a restart. Before that, I uninstalled Bluetooth support in Armbian and deleted all Bluetooth-related integrations and devices. But I can't be sure about what caused this misbehavior the first time: I also removed some other stuff, which didn't make a difference when I did it. For now I'm afraid to continue the investigation. It has already been running for 10 minutes without a glitch, and I prefer to keep it running for a while.
Please note that this sequence
2024-05-12 11:15:57.088 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Trying to connect to КЛ: Zigbee Bulb...
2024-05-12 11:15:57.101 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf2...ath] Trying to connect to ГСЛ: Zigbee Bulb...
2024-05-12 11:15:57.104 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfe...7yz] Trying to connect to ГС: Motion Sensor...
2024-05-12 11:15:57.109 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfa...hoi] Trying to connect to К: 3x Переключатель...
2024-05-12 11:15:57.113 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf0...tdq] Trying to connect to ГС: Выключатель...
2024-05-12 11:15:57.118 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf1...kn4] Trying to connect to ГС: 3x Переключатель...
2024-05-12 11:15:57.121 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf1...xw5] Trying to connect to К: Выключатель...
2024-05-12 11:15:57.127 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf1...z1q] Trying to connect to ГС: HP&L Сенсор...
2024-05-12 11:15:57.132 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf0...hwo] Trying to connect to К: M&L Сенсор...
2024-05-12 11:15:57.137 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf8...mlc] Trying to connect to К: Роутер...
2024-05-12 11:15:57.141 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfe...tf0] Trying to connect to ГС: Роутер...
2024-05-12 11:15:57.146 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf6...stj] Trying to connect to Тест: Smart Plug+...
2024-05-12 11:15:57.151 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfb...cbv] Trying to connect to К: Реле 4х...
2024-05-12 11:15:57.155 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfa...dbm] Trying to connect to ГСЛ: Реле 4х...
2024-05-12 11:15:57.161 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf2...aky] Trying to connect to ГС: 2х Переключатель...
2024-05-12 11:15:57.165 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bff...ns9] Trying to connect to ПЛ: Диммер...
2024-05-12 11:15:57.170 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bf0...epk] Trying to connect to П: Выключатель...
2024-05-12 11:15:57.174 DEBUG (MainThread) [custom_components.localtuya.coordinator] [bfc...b03] Trying to connect to П: Motion Sensor...
which repeats over and over again, consists of devices on one gateway: Zigbee only, wired (Ethernet). I also have one BLE-only wireless gateway and one Zigbee+BLE wireless gateway. For the latter, I sometimes see such sequences as well, but only 2-3 loops, never more. I also have a number of WiFi devices, and none of them experiences such misbehaviour.
So, LocalTuya re-initializes not all the devices, but only the devices connected to this one particular gateway (bf504372fcc2cc8b10fwrf, 192.168.0.31). Based on this, I doubt that it is an HA-related or BLE-related problem...
The problem started within a short period of time (some hours) after I added 3 new devices to this gateway and created automations for them. Everything was working for some hours before that.
These new devices are: a dimmer (bffb4b344babe630c4lns9), a scene switch (bf0c1d4f7f66ddd4b2lepk) and a PIR motion sensor (bfcaf22d28414bacb8cb03). For the dimmer ("tdq"), LocalTuya has no support for its DPs (switch and brightness control), but I added them manually.
Hello, I have to add that only my sub-device is unavailable! I have 3 other Tuya main devices which are working well.
I've restored core 2024.5.1 from backup, re-added my new devices, and the problem started again.
From my experience, I'm able to stop that loop by disabling or enabling a LocalTuya device. It does not seem to really matter which device; e.g. it can be the BLE gateway.
Can you try disabling sub-devices only, to see whether that changes anything?
edit: I'll also refactor the setup-entry function later, since HA made changes to it in 2024.5
Can you try disabling sub-devices only,
Yes, I did. Moreover, once I disabled (by mistake) and then re-enabled only one entity (the switch of the dimmer), and that stopped the loop.
But it is a random process. I just woke up, and I see in the log that some hours ago the loop started again, iterating every 1 second, and it stopped by itself after 9 iterations. Then, around 30 minutes later, it started again every 30 seconds and stopped after 14 iterations. Since then, it has not happened again.
~I have a thought that it may be caused by "too many" Zigbee devices bound to one gateway. The aforementioned gateway now has 17 active devices, and 3 of them were added recently. When I reverted back to core 2024.5.1, I lost them, and the problem did not occur after that HA restart. But the problem returned after I added them back, along with the manual steps to configure the DPs of the dimmer and the PIR sensor.~ - I disabled more than 3 devices, to no avail.
I changed some messages from debug to info, and added a "Sub-device name disconnected" message to `def disconnected(self)` here:
https://github.com/xZetsubou/hass-localtuya/blob/298d561d72844822f5c24b10574bd4c061588041/custom_components/localtuya/coordinator.py#L455-L459
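For reference, a minimal sketch of the kind of line I added (the `disconnected` callback and `self.info`/`self._device_config` are names used elsewhere in this thread; the exact placement inside the method is approximate):

```python
# Sketch of the change inside coordinator.py's device class.
def disconnected(self):
    """Device disconnected callback."""
    # My addition: surface sub-device disconnects at info level,
    # since the stock code did not log disconnect events at all.
    self.info(f"Sub-device {self._device_config.name} disconnected")
```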
I also changed `HEARTBEAT_SUB_DEVICES_INTERVAL` to 60 and `HEARTBEAT_INTERVAL` to 20 (see the values below). After a restart, the problem disappeared within a few minutes. Here is the log:
home-assistant_2024-05-14T19-35-07.947Z.log
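For reference, the values I ended up with (the constant names come from the pytuya module; 60 and 20 are just my experiment, I read both as seconds, and the stock defaults are different):

```python
# My local experiment in custom_components/localtuya/core/pytuya/__init__.py;
# the stock defaults are different.
HEARTBEAT_INTERVAL = 20              # per-device heartbeat period, seconds
HEARTBEAT_SUB_DEVICES_INTERVAL = 60  # sub-devices status poll period, seconds
```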
Please note: the disconnect events correlate with the sub-devices heartbeat (`HEARTBEAT_SUB_DEVICES_INTERVAL`). Unfortunately, you don't log disconnect events, so they are missing from the full logs I submitted before. Here is an extract for one device:
2024-05-14 22:08:30.667 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Trying to connect to КЛ: Zigbee Bulb...
2024-05-14 22:08:31.127 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Trying to connect to КЛ: Zigbee Bulb...
2024-05-14 22:08:31.225 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Started heartbeat for КЛ: Zigbee Bulb
2024-05-14 22:08:31.226 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Success: connected to КЛ: Zigbee Bulb
2024-05-14 22:08:31.909 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Sub-device КЛ: Zigbee Bulb disconnected.
2024-05-14 22:08:35.796 INFO (MainThread) [custom_components.localtuya.coordinator] [bf5...whd] Trying to connect to КЛ: Zigbee Bulb...
I'm quite happy that it finally "calmed down" (I had to add one more device this evening, and then couldn't stop the loop), so I can go to sleep.
By the way, I've updated to 2024.5.0b1.
Actually, I'm adjusting the way the sub-devices connecting process works. I think this is what is causing the loop:
1. Even though a sub-device is not connected, the gateway may still have cached its status, so `disconnected()` will be called. Since this happens while still in the first connecting process, once the `disconnected()` function is called, localtuya will immediately retry to reconnect, and that may cause the loop. This is probably also why there are 2 connection attempts at the start.
2. This can be fixed by moving the `asyncio` sleep above the `subdevices_query` call in `start_sub_devices_heartbeat` in the pytuya module.

I've added logging into `def _setup_dispatcher(self) -> MessageDispatcher`
https://github.com/xZetsubou/hass-localtuya/blob/d64fc6820f0939daae4605d5776f80d7ff21834f/custom_components/localtuya/core/pytuya/__init__.py#L832-L836
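Roughly the kind of logging added (a sketch only: the variable names and the exact spot inside the dispatcher callback are my assumptions; the resulting messages appear in the log excerpts further below):

```python
# Sketch: inside the dispatcher callback that handles
# "subdev_online_stat_report" pushes from the gateway.
self.debug(
    f"Sub-Devices States Update: {decoded_message}. {cid} disconnected",
    force=True,
)
```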
and yes: LocalTuya disconnects here! home-assistant_2024-05-15T07-32-30.556Z.log
2. This can be fixed by moving the `asyncio` sleep above the `subdevices_query` call in `start_sub_devices_heartbeat` in the pytuya module.
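If I understand the suggestion, the reordering looks like this (a sketch under my assumptions: the loop body is simplified and the interval value is a placeholder; only the method and helper names come from this thread):

```python
import asyncio

HEARTBEAT_SUB_DEVICES_INTERVAL = 30  # placeholder; the stock value may differ


class ProtocolSketch:
    """Minimal stand-in for the pytuya protocol object (sketch only)."""

    async def subdevices_query(self) -> None:
        ...  # sends the sub-device status query to the gateway

    async def start_sub_devices_heartbeat(self) -> None:
        while True:
            # Sleep first: give sub-devices time to finish their initial
            # connection before the first subdevices_query, instead of
            # querying immediately and acting on a stale cached status.
            await asyncio.sleep(HEARTBEAT_SUB_DEVICES_INTERVAL)
            await self.subdevices_query()
```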
It does not help to solve this particular problem: LocalTuya continues disconnecting devices. Nor does it prevent the double connections at the start. Here is the log: home-assistant_2024-05-15T07-40-22.231Z.log
I've changed the condition at line 835 there to `if cid in off_devices:` and finally my system is up and running 😄 home-assistant_2024-05-15T07-47-01.946Z.log
I believe it was bad logic.
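To make the change concrete, a sketch of the check in question (the report shape, `on_devices`/`off_devices`, and the new condition come from this thread; the function wrapper, the callback, and the exact stock condition are my assumptions):

```python
def handle_subdev_report(decoded_message: dict, cid: str, disconnect) -> None:
    """React to one subdev_online_stat_report push from the gateway."""
    data = decoded_message.get("data", {})
    on_devices = data.get("online", [])
    off_devices = data.get("offline", [])

    # Stock behaviour (as I read it): "not in this message's online list"
    # was treated as offline. With many sub-devices the gateway splits the
    # report across several messages, so a healthy device can be missing
    # from any single list, producing false disconnects:
    #   if cid not in on_devices:
    #       disconnect()

    # My change: disconnect only on an explicit offline report.
    if cid in off_devices:
        disconnect()
```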
But, with both of those fixes, it still connects twice for each device within 1 second.
My opinion is that LocalTuya needs a deep architectural change here: create one and only one connection to each gateway, and use it for all the sub-devices connected to that gateway. A gateway, as I understand it, has limited capacity to serve many connections.
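A rough sketch of the shape meant here (all names are hypothetical, not LocalTuya's API):

```python
class SharedGatewayConnection:
    """One connection per physical gateway, shared by its sub-devices."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._subdevices: dict[str, "SubDeviceSketch"] = {}

    def register(self, cid: str, device: "SubDeviceSketch") -> None:
        # Sub-devices multiplex over this single connection instead of
        # each opening its own socket to the same gateway.
        self._subdevices[cid] = device

    def dispatch_status(self, cid: str, status: dict) -> None:
        # Route a gateway push to the sub-device it belongs to.
        if (device := self._subdevices.get(cid)) is not None:
            device.status_updated(status)


class SubDeviceSketch:
    def status_updated(self, status: dict) -> None:
        ...  # update HA entities for this sub-device
```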
Look at the following two answers from my gateway:
2024-05-15 10:34:41.728 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf5...whd] Sub-Devices States Update: {'reqType': 'subdev_online_stat_report', 'data': {'online': ['a4c138e51d9c71a2', 'a4c13825abcf5819', 'a4c138ba5b2576e5', 'a4c13825798525df', 'a4c138de3eb1f3f3', 'a4c1380ad40d076d', 'a4c1380e8e46d3c4', '00124b002d8e6a68', 'a4c138717b6a9928', '00124b00272127b7', 'a4c138f46f838ee0', 'a4c1389aeca68fbf', 'a4c138a112e22ebe', 'a4c1384bb9cd4e1d', 'a4c1380f8554dc9d'], 'offline': ['a4c1382e58833a9c', 'a4c138c67c8e556e']}}. a4c138b97aac2ae0 disconnected
2024-05-15 10:34:41.983 INFO (MainThread) [custom_components.localtuya.core.pytuya] [bf5...whd] Sub-Devices States Update: {'reqType': 'subdev_online_stat_report', 'data': {'online': ['a4c138c4b14360be', 'a4c138d43c2db604', 'a4c138b97aac2ae0'], 'offline': []}}. a4c1389aeca68fbf disconnected
It seems that if a gateway has many devices connected, it reports the online/offline statuses of sub-devices in a series of messages. I hit the problem exactly when I added the 3 aforementioned new devices (note the second message: exactly 3 cids!). I thought disabling them in LocalTuya would help, but the cause was on the gateway side.
So, checking against the 'offline' list is always valid.
To check against 'online', it would be necessary to combine several answers into one. But I doubt LocalTuya needs that at all, because if a sub-device is disconnected from the gateway (the only way to be missing from both lists), LocalTuya would find that out by other means anyway, e.g. by checking against the device list from the cloud.
My suggestion is to remove the check against `on_devices` as the fix for this problem.
But that double attempt to connect at the start would be better fixed as well!
One more thought: when you send the `subdev_online_stat_report` request to the gateway, it may answer with more than one message! Currently LocalTuya expects only one reply. The second reply is read later and can be misinterpreted.
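If combining the partial replies ever becomes necessary, a minimal sketch of the aggregation (a hypothetical helper; only the report shape comes from the logs above):

```python
def merge_subdev_reports(reports: list[dict]) -> tuple[set[str], set[str]]:
    """Combine several subdev_online_stat_report payloads into one view.

    The gateway may split its full sub-device list across multiple
    messages, so no single message is authoritative for "not online".
    """
    online: set[str] = set()
    offline: set[str] = set()
    for report in reports:
        data = report.get("data", {})
        online.update(data.get("online", []))
        offline.update(data.get("offline", []))
    # A cid reported online in any partial message counts as online.
    return online, offline - online
```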
I'd suggest changing the log messages in `_make_connection` to include both name and host, like this:
```python
self.debug(f"Trying to connect to {name} ({self._device_config.host})...", force=True)
self.debug(f"Started heartbeat for {name} ({self._device_config.host})", force=True)
self.debug(f"Success: connected to {name} ({self._device_config.host})", force=True)
```
so it is easier to understand which WiFi device produced a message, and which gateway is used for a sub-device. The same goes for warnings, errors and exceptions.
FYI, with the fix suggested in https://github.com/xZetsubou/hass-localtuya/issues/227#issuecomment-2111825467, it runs very well. I had a device with a weak signal, and its offline state was handled correctly by the sub-devices heartbeat several times. So, I believe it has been tested enough.
The issue with this approach is that if a device has been removed from the gateway, it won't show as "offline"; rather, it will disappear from both the online and offline lists.
That is not a big issue! If required, it can be detected from the information from the cloud. The user can delete the device if they are concerned.
Moreover, I would not disconnect a sub-device based on the offline list, especially if it is used as a fake gateway. The reason is that it does not hurt to use an offline device as a fake gateway, but disconnecting it causes all the other sub-devices to disconnect, and then it falls into the connect-disconnect loop (I saw it in the logs). In brief, the current implementation destroys the functionality of all sub-devices if the first one in the list goes offline.
My opinion is that the online and offline lists could instead be used to prevent connection attempts to offline devices until they appear online again, but that is not implemented (a sketch of this idea follows).
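A sketch of that gating idea, with hypothetical names:

```python
class SubDeviceGate:
    """Track gateway-reported sub-device states and gate reconnects."""

    def __init__(self) -> None:
        self._offline: set[str] = set()

    def update(self, online: list[str], offline: list[str]) -> None:
        # Apply one subdev_online_stat_report message.
        self._offline.difference_update(online)
        self._offline.update(offline)

    def may_connect(self, cid: str) -> bool:
        # Skip reconnect attempts while the gateway says the device is
        # offline; resume once it shows up in an "online" list again.
        return cid not in self._offline
```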
The current handling of `subdev_online_stat_report` results looks more harmful than useful to me.
I know, but I'm still looking for a better solution if possible. I don't have a lot of sub-devices to test the query command on. Does the command always report all the sub-devices you have, or does it sometimes miss some of them?
Yes, currently the gateway always replies with the two messages shown in https://github.com/xZetsubou/hass-localtuya/issues/227#issuecomment-2112020068, in log records listing all the devices attached to the gateway. I have two powered-off devices (disabled in LocalTuya) which are listed in the 1st message. The two online lists contain all my online devices when they are online. Of course, when I temporarily turn some of them off, or in case of a weak signal, both messages still show all the devices, with the correct division into online and offline.
I presume that with even more devices, the number of messages would be higher.
Can you give 2024.5.0b2 a try?
Can you give 2024.5.0b2 a try?
I updated to it, but changed some log messages from debug to info for better understanding. It started, and twice restarted, with no problems. Let's see how it does in the long run.
I see you didn't implement `TimeoutError` handling in `start_sub_devices_heartbeat` 😞 I re-implemented it, because, from the logs, I found it useful.
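The kind of handling I mean, as a sketch (the timeout value and the helper name are my assumptions; only `subdevices_query` comes from the source):

```python
import asyncio

SUB_DEVICES_QUERY_TIMEOUT = 5  # seconds; an assumed value


async def query_with_timeout(protocol) -> None:
    """One heartbeat step that tolerates a slow or busy gateway."""
    try:
        await asyncio.wait_for(protocol.subdevices_query(), SUB_DEVICES_QUERY_TIMEOUT)
    except asyncio.TimeoutError:
        # A busy gateway (e.g. one stuck retrying its cloud connection)
        # may simply be late; swallow the miss and retry on the next
        # beat instead of treating it as a disconnect.
        pass
```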
There will be no false timeouts anymore for the sub-devices HB like before, unless the device actually doesn't respond at all, so you can try it without adding timeout retries. Matter of fact, your tests give me more information: I don't have the number of sub-devices you have 😺, so I'll be glad if you test with the normal release.
I've turned off a dimmer, and within a few seconds received the corresponding message in the log. So, the new design detects offline devices, and so far works with no false disconnect events.
Currently, LocalTuya logs are either debug or nothing. I found it useful to have more information at info level: https://github.com/xZetsubou/hass-localtuya/blob/e1852fdafc06e6df49b12e2d983a5ee9fed458e0/custom_components/localtuya/coordinator.py#L258 https://github.com/xZetsubou/hass-localtuya/blob/298d561d72844822f5c24b10574bd4c061588041/custom_components/localtuya/core/pytuya/__init__.py#L925
I inserted the following after this line: https://github.com/xZetsubou/hass-localtuya/blob/e1852fdafc06e6df49b12e2d983a5ee9fed458e0/custom_components/localtuya/coordinator.py#L463
```python
if self._node_id:
    self.info(f"Disconnected: {self._device_config.name} {self._device_config.host} {self._node_id}")
else:
    self.info(f"Disconnected: {self._device_config.name} {self._device_config.host}")
```
(currently the first message is a warning in the stock code, which is why the `if` is needed)
Also one more debug-to-warning change; I wrote a comment on the commit.
There will be no false timeouts anymore
Yesterday I had a big problem with my internet provider for a whole day. All my automations are local, with no direct involvement of the cloud. But every device with a direct connection to the cloud (WiFi devices and gateways) reports its state, or state changes for its sub-devices. When they can't connect or send data to the cloud, they become very unresponsive! Lots of timeouts, lots of Zigbee sub-devices going offline because they failed to inform their gateway (which was just busy), very slow reaction (if any) to events.
So, don't presume that a device either must answer within 5 seconds or can't answer at all. It may simply be busy with other tasks at that moment.
But I see you've rewritten it even further. Let's see later. For now (sorry!) I prefer to have `TimeoutError` handled in this particular place for the sake of robustness, based on my experience. But I'll check the logs to see if timeouts ever happen.
Technically, if there's a timeout error, the loop in `subdevices_query` will break and will not raise anything! Probably what caused the mass timeout errors is the status command that is called in the connect process.
In addition to https://github.com/xZetsubou/hass-localtuya/issues/227#issuecomment-2118742776, I uncommented https://github.com/xZetsubou/hass-localtuya/blob/298d561d72844822f5c24b10574bd4c061588041/custom_components/localtuya/core/pytuya/__init__.py#L292-L293 because otherwise I could not understand from the log what was happening.
Can you give 2024.5.0b2 a try?
In general, it runs very well.
But since I had to restart HA many times, I noticed that sometimes, right after a restart, some entities are in the unavailable state. The corresponding devices were online and there were connections to them, but their entities were disabled, and even after several minutes they didn't come alive. Another restart was always free of this problem. I also saw such a problem once with 3.2.5.2b7. The LocalTuya log level is info (with my corresponding changes to the source code), so I have nothing to support this report with.
FYI, I installed 2024.5.0b4 an hour ago, patched for more info in the logs, and it started and is running smoothly so far. I tested a sub-device going offline and back online, and it worked as expected.
This issue was closed because it was resolved in a release.
LocalTuya Version
3.2.5.2b7
Home Assistant Version
2024.5.3
Environment
What happened?
After some recent changes (moving HA from a VM to a dedicated Armbian box, adding some devices, updating HA to 2024.5.*), a weird problem arose: after an HA restart, LocalTuya falls into an endless loop of the initial sequence, trying to connect to all my devices to obtain initial entity values. During this time, HA becomes useless, because all the devices fall into "Became unavailable" every few seconds.
If I don't touch the system, it may continue running like that for a long time. It seems that adding or reconfiguring a device (even one not related to LocalTuya, e.g. via the direct BLE integration) stops the loop.
I applied your #226 fix before the last restart (the one with the log attached), just for information.
I don't mark this issue as a bug because I have no idea what caused this problem to start happening: whether it was the HA update, or the newly added devices, or a combination of events...
I attach my current diagnostics of all the devices, and a log which ends right when that loop stopped. Note that the device "[bf5...wrf]" is disabled (a gateway with 2 DPs). But the log contains plenty of records which should only happen once, at the very beginning. That's why I call it an "endless initialization loop".
Steps to reproduce.
Don't know
Relevant log output
No response
Diagnostics information.
home-assistant.log config_entry-localtuya-944c1616039a1d1ec6b9745179bff74e (5).json