rospogrigio / localtuya

local handling for Tuya devices
GNU General Public License v3.0

Lepro / LampUX bulbs going unavailable - seemingly randomly #461

Open johnstegeman opened 3 years ago

johnstegeman commented 3 years ago

Hello everyone,

I bought some really cheap bulbs on Amazon - they are branded Lepro. This issue that I'm reporting happens both with the RGBW as well as the dimmable white bulbs. Notably, I was at one point trying Homebridge before I moved to HA, and this same issue happened over there.

What happens: Simple, really. At seemingly random times, the bulb is shown as "became unavailable" in HA. After a short period of time (seconds to maybe a minute or two), the bulb becomes available again. As I say, I had this issue with Homebridge trying to control the bulbs as well.

After reading through some issues here, I decided to try isolating the bulbs from the Internet. At my router, I gave one of the bulbs a fake DNS server (there is nothing at that IP, so DNS never responds) and blocked all of its traffic outside the local LAN. The issue still happens. I know the isolation is working as intended because at my router I can see all connections - that bulb's DNS status is NO RESPONSE, while all of the other bulbs that I didn't touch do their DNS lookups and then connect over port 8886 to servers in AWS for their control channel. The isolated bulb does not.

I know this isn't an issue with my local network per se - I have 16 of the Lepro bulbs - all of them at one time or another (multiple times a day) exhibit this behavior. I have one Merkury bulb from Walmart, and it has been rock solid - never went unavailable even one time. WiFi signal is good, and several of the bulbs that are going unavailable are in the same room as a WiFi access point, so I don't believe it's that.

I doubted it would work, but I tried OTA flashing one of the bulbs, and as expected, it has the new firmware that blocks it. I don't have access to the hardware I'd need to flash them via serial.

Has anyone had any experience with these bulbs? If they end up just being cheap junk, it's not the end of the world because they are so cheap I can just replace them with something better in the future. I expect that it's probably not related to this integration in particular because I also had the same issue with local tuya integration in HB. I could, I suppose, go back to the web/cloud-based integration, but I'd rather not.

Edit: I'm wondering if this might be something on the board itself - maybe something that puts the unit to sleep and wakes it up on a regular cycle?

Here's what the HA log shows:

2021-04-26 14:28:45 ERROR (MainThread) [custom_components.localtuya.common] [eb8...htj] Connect to x.x.x.x failed
Traceback (most recent call last):
  File "/config/custom_components/localtuya/common.py", line 139, in _make_connection
    self._interface = await pytuya.connect(
  File "/config/custom_components/localtuya/pytuya/__init__.py", line 637, in connect
    _, protocol = await loop.create_connection(
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "/usr/local/lib/python3.8/asyncio/base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.8/asyncio/selector_events.py", line 496, in sock_connect
    return await fut
  File "/usr/local/lib/python3.8/asyncio/selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 110] Connect call failed ('192.168.100.233', 6668)

svkowalski commented 3 years ago

I'm seeing the same behavior with a totally different device integrated into LocalTuya: an InkBird C929 thermostat. I have several sensors configured, and they all report correct values (Current temp, Temp Hi/Low Min/Max, Power state, socket state), but they go unavailable randomly for a few seconds, sometimes longer. I was running the current version 3.2.2, so I backed off to earlier versions. The problem still occurs, but the unavailable behavior doesn't persist quite as long. I'm in the process of documenting my configuration & posting it, along with some errors that appear in the log. I only noticed this problem recently (since 3.1.0?). Haven't narrowed it down yet... I need to add that LocalTuya is a big improvement over the standard Tuya integration. I could never get correct readings from this device using that integration.

johnstegeman commented 3 years ago

It's very strange behavior. I am noticing in two cases (different rooms in the house), I have two of the same bulbs in the same room (one room is an RGBW and one room is dimmable white). In each of those rooms, one of the bulbs goes unavailable several times an hour, and the other one much less frequently. I'm happy to help debug and even fix this issue if it's not related to the hardware, but I'm new to all of this (new to HA and tuya, not new to coding), and don't know where would be a good place to start.

johnstegeman commented 3 years ago

I've been thinking more about this... If it's a hardware-related kind of issue, perhaps a way to work around it would be to assume a device that is not responding is available until after it has been unavailable for some length of time. If HA sends any control to the device during this time, it could use an async kind of task to keep trying until the device actually comes online. As I say, I know next to nothing about tuya or HA architecture, but if I were to design a system to work with an unreliable device, I might do something like that...
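The grace-period idea above could look roughly like the sketch below. This is not localtuya code: the class name, its methods, and the 120-second grace period are all hypothetical, chosen only to illustrate "treat the device as available until it has been unreachable for some length of time".

```python
import time
from typing import Callable, Optional


class DebouncedAvailability:
    """Report a device as unavailable only after it has been unreachable
    for at least `grace_seconds`. Hypothetical helper, not part of localtuya."""

    def __init__(
        self,
        grace_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._grace = grace_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._down_since: Optional[float] = None  # None while the device is reachable

    def connection_lost(self) -> None:
        # Record only the start of an outage; repeated failures do not reset it.
        if self._down_since is None:
            self._down_since = self._clock()

    def connection_restored(self) -> None:
        self._down_since = None

    @property
    def available(self) -> bool:
        if self._down_since is None:
            return True
        # Still inside the grace period: keep reporting the device as available.
        return (self._clock() - self._down_since) < self._grace


# Example: tolerate outages shorter than two minutes.
state = DebouncedAvailability(grace_seconds=120)
state.connection_lost()
print(state.available)  # still True right after the first failure
```

The trade-off is that a bulb that really is switched off would also show as available for the length of the grace period, so commands sent in that window would need to be queued or retried.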

johnstegeman commented 3 years ago

Perhaps using https://gist.github.com/ultrafunkamsterdam/db2a0ff6d4ea189b893b9d24374f33e0 to auto-retry the connection instead of failing immediately... specifically on common.py line 139. I will see if I can get some time to experiment.
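As a rough illustration of the auto-retry idea, here is a minimal sketch written independently of the linked gist and of localtuya's actual code. The helper name `retry_async` and its parameters are assumptions made for this example. It retries any coroutine that raises `OSError`, which covers the `TimeoutError: [Errno 110]` in the log above, and waits a little longer after each failure.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    attempt: Callable[[], Awaitable[T]],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Call `attempt()` until it succeeds, with exponential backoff between
    failures. Re-raises the last OSError once `retries` attempts have failed.
    Hypothetical helper, not part of localtuya."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    delay = base_delay
    for n in range(1, retries + 1):
        try:
            return await attempt()
        except OSError:  # the built-in TimeoutError is a subclass of OSError
            if n == retries:
                raise
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")


# Possible usage against a device on the local Tuya port seen in the log:
#   reader, writer = await retry_async(
#       lambda: asyncio.open_connection("192.168.100.233", 6668)
#   )
```

Whether this helps depends on how long the bulbs stay unreachable. With the defaults above the helper gives up after roughly 15 seconds of waiting, which is shorter than a 1-2 minute outage, so the retry count and delays would need tuning.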

User8283 commented 3 years ago

Socket Hyper Iot P01 - the same problem. The socket becomes unavailable for a few seconds several times per hour.

johnstegeman commented 3 years ago

My bulbs go offline for 1-2 minutes at a time.

johnstegeman commented 3 years ago

Some more updates from me. I tried disabling ipv6 on the HA box based on some research I did on asyncio in Python. No real change happened. I also went through my log files in detail. I have 16 of these kinds of bulbs (some white/some RGBW). There are some devices that never go unavailable (or at least not in the hour I looked at) and others that go unavailable much more often than others. I'm going to try swapping pairs of bulbs to see if it might be related to the fixture or location.

johnstegeman commented 3 years ago

Unfortunately, not much to say today. The error happened about 15,000 times in 5 days. All of them related to the 16 Lepro bulbs and not 1 related to the other brand of bulb, so that makes me think it's something to do with that bulb's hardware/firmware. Not much I can do about that unless I want to crack open the bulb and flash it.
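For anyone who wants to produce the same kind of per-device tally, a small script along these lines might do it. It is only a sketch: the regular expression is modelled on the single log line quoted earlier in this thread, so it assumes every failure is logged in that exact shape, and the log file name in the usage comment is an assumption about the setup.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   ... [custom_components.localtuya.common] [eb8...htj] Connect to x.x.x.x failed
FAILED = re.compile(
    r"\[custom_components\.localtuya\.common\] \[(?P<device>[^\]]+)\] "
    r"Connect to (?P<host>\S+) failed"
)


def count_connect_failures(lines: Iterable[str]) -> Counter:
    """Count 'Connect to ... failed' log entries per device ID."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# Possible usage (file name assumed; adjust to where your HA log lives):
#   with open("home-assistant.log", encoding="utf-8") as log:
#       for device, n in count_connect_failures(log).most_common():
#           print(device, n)
```

Sorting by count makes it easy to see whether the failures cluster on one brand or on a few specific bulbs.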

agittins commented 3 years ago

What firmware version is running on the bulbs themselves (you can find out in the smartlife app)? I'm having trouble with two bulbs (Mirabella Genio I002607 V0002) running tuya version 3.3.30 that sounds just like what you're seeing.

My other bulbs are older so I had already tuya-converted them to esphome and they run just fine, but these new ones have forced me to try localtuya.

What I'm seeing is that sometimes the bulbs connect ok and work, and other times they just don't appear on the network - both localtuya and the smartlife app show them as being offline/unavailable.

Looking at network traffic, it looks like the bulb sends DHCPDISCOVER requests, my server responds with a DHCPOFFER, but the bulb never responds with a DHCPREQUEST for the offered IP. It might be that the bulb is failing to receive any traffic, but as you note, it's intermittent (for me it works very rarely, and usually fails).

It does prove that the bulb is joining the wifi network, but then things go awry. My guess is it's a firmware bug, hence why you're seeing the same behaviour on both HA and homebridge, and me in HA and smartlife.

This is why I HATE having to rely on third-party firmware :-(

So.. what version of the tuya firmware is running on your good bulbs and on your bad bulbs?

agittins commented 3 years ago

Quick update - it looks like my issues might be signal-strength related. All of my esphome units seem to be fine at this location, but the ones with tuya firmware are a bit borderline from my office - moving the bulbs closer to the access point appears to be solving the issue :-/ Pinging the units was showing periodic packet loss (although older bulbs of the same model in the same location with esphome on them ping reliably).

Not sure if there's a hardware change that came along with the new version of bulb or if the tuya firmware is doing less well under marginal reception.

johnstegeman commented 3 years ago

For me - it's definitely that brand of bulb. I put a couple more Merkury bulbs from Walmart in my setup the other day. They have not gone unavailable even a single time (and they are farther from the WiFi access point than other Lepro bulbs that go unavailable all the time).

krisnoble commented 3 years ago

Hi, I'm having similar issues with my Lepro bulbs/led strip and Teckin plugs. They seem to only be unavailable for a few seconds at a time though, rather than minutes.

I've disabled all the corresponding devices on the main Tuya integration although they're still set up in Smart Life. Had no issues with reliability in Smart Life - other than the Tuya outage recently which is what prompted me to explore local control in the first place :)

The behaviour doesn't seem to be affected by distance given that all my devices are having the same problem at varying distances from the router.

I've tried setting the logs to debug but that results in a huge log and I don't really know what to look for. I set up an automation to create a persistent notification whenever one of those entities went unavailable and got about 800 in 24 hours.

My main worry is that it could cause automations to fail if an entity happens to be unavailable at the precise time, but I suppose I could add a wait condition to make sure the entity is available before proceeding.