leeyuentuen / localtuya

local handling for Tuya devices
GNU General Public License v3.0

Devices become unavailable randomly #22

Closed Anycubic closed 1 year ago

Anycubic commented 1 year ago

Suddenly and apparently at random, devices become unavailable permanently. I have 6 TRV valves connected to a Zigbee gateway; in Tuya they work fine. Reloading Local Tuya does not help; to fix it I have to restart Home Assistant. Unfortunately, after a while the same thing happens again.

Environment
  • Localtuya version: 3.6.1
  • Last working localtuya version (if known and relevant): 3.5.3
  • Home Assistant Core version: 2022.12
  • Are you using the Home Assistant Tuya Cloud component? no
  • Are you using the Tuya App in parallel? Just for reading devices status

home-assistant_2022-12-21T21-35-32.480Z copy.txt

leeyuentuen commented 1 year ago

@Anycubic, here's the automation YAML Automation code Interesting behavior with 3.6.4 - I haven't done the actual update, but it sounds very strange, like the config entries get corrupted. I'll give it a try, hopefully later today.

@leeyuentuen @alexualbu this issue is still there in 3.6.5 beta 2. DPS IDs are populated only after they have been "touched" at least once from the Smart Life app; otherwise they will not appear in the config flow.

For me it is only the sensor; it gets its data once there is an update.

alexualbu commented 1 year ago

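So @leeyuentuen, @Anycubic - I finally got some time to look into this and I think I found the issues:

  • there were still some pieces of code that called _get_subdevice_status with subitem instead of cid -> this would have left the devices without an initial status after reconnection, and only the first device would get a reconnect event because an exception would be thrown when attempting to get its status
  • I forced a status update from the device (to the entities I presume) when it gets a reconnect event from the gateway (@leeyuentuen, I am not sure about this one because I haven't fully grasped how the entities should get their status after a device reconnects, but it would make sense since a LocalTuyaSubdevice will not go through the default connect flow again like a LocalTuyaDevice, but rather just gets the connected event from the gw)

Right now these changes live on my fork. I'll run them for a few days to validate that they work fine. Also, @Anycubic, I am not sure what issue you had with the 2nd gateway, but I connected the other gateway I had and now I have both running with a bunch of devices.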

alexualbu commented 1 year ago

I also don't know whether it would be better to open a new issue for this: with 3.6.4, the DPSs recognised when adding new devices are very erratic (yes, I had to start again from scratch: I upgraded to 3.6.4 and all of the devices were unavailable, with no solution...). All the devices are the same, but some of them show all of the DPSs and others just 2 or 3 of them. I have never seen such behaviour before.

Regarding this, @Anycubic, it seems to be a very nasty feature of these TRVs - if a DP is not touched for a while (not sure how long) it will not be reported in the status so not discovered. I double checked with the tuyaapi-cli (so a js implementation). This is unfortunate when you add a new device, but it gets more annoying after a HA restart or reconnect because those DPs not getting reported means the entities will appear as unavailable. I have a suspicion that this is fixed with the force status dispatch after a reconnect and would work with a state restore for a HA restart (haven't tried the latter yet, but at first sight we don't have that implemented)
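
The state-restore idea above can be pictured as a small cache that remembers the last value of every DP and overlays each partial status report on top of it, so a DP the TRV leaves out is not treated as missing. This is only a sketch under assumptions: `DpsCache`, its methods, and the DP IDs are invented for illustration and are not part of localtuya.

```python
from typing import Any, Dict, Optional


class DpsCache:
    """Hypothetical helper: keeps the last value seen for every DP of one device."""

    def __init__(self, restored: Optional[Dict[str, Any]] = None) -> None:
        # `restored` stands in for values saved before a restart (assumption).
        self._last_known: Dict[str, Any] = dict(restored or {})

    def merge(self, reported: Dict[str, Any]) -> Dict[str, Any]:
        """Overlay a possibly partial status report on the cached values."""
        self._last_known.update(reported)
        return dict(self._last_known)

    def is_available(self, dp_id: str) -> bool:
        """A DP counts as available once any value for it has been seen."""
        return dp_id in self._last_known


# Illustrative DP IDs and values only; real TRVs use their own DP map.
cache = DpsCache(restored={"2": 21.5, "3": 19.0, "4": "auto"})
status = cache.merge({"3": 19.5})  # the device reports a single DP
print(status)
```

With a cache like this, an entity bound to DP "2" keeps its restored value even though the device did not report it after the reconnect.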

leeyuentuen commented 1 year ago

So @leeyuentuen , @Anycubic - i finally got some time to look into this and I think I found the issues:

  • there were still some pieces of code that called _get_subdevice_status with subitem instead of cid -> this would have left the devices without an initial status after reconnection, and only the first device would get a reconnect event because an exception would be thrown when attempting to get its status
  • I forced a status update from the device (to the entities I presume) when it gets a reconnect event from the gateway (@leeyuentuen, I am not sure about this one because I haven't fully grasped how the entities should get their status after a device reconnects, but it would make sense since a LocalTuyaSubdevice will not go through the default connect flow again like a LocalTuyaDevice, but rather just gets the connected event from the gw)
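
The two points above can be sketched roughly as follows: the gateway looks up each subdevice's status by its cid, isolates failures so one bad lookup does not stop the remaining subdevices from being notified, and each subdevice pushes the status to its listeners when it receives the reconnect event. The `Gateway` and `Subdevice` classes and all method names here are hypothetical stand-ins, not the actual localtuya code.

```python
import logging
from typing import Any, Callable, Dict, List

_LOGGER = logging.getLogger(__name__)

Status = Dict[str, Any]


class Subdevice:
    """Hypothetical stand-in for a gateway subdevice, identified by its cid."""

    def __init__(self, cid: str, listeners: List[Callable[[Status], None]]) -> None:
        self.cid = cid
        self.connected = False
        self._listeners = listeners

    def on_gateway_reconnected(self, status: Status) -> None:
        self.connected = True
        # Force a status dispatch so the entities get a state again.
        for listener in self._listeners:
            listener(status)


class Gateway:
    """Hypothetical gateway that notifies its subdevices after a reconnect."""

    def __init__(self, fetch_status: Callable[[str], Status]) -> None:
        self._fetch_status = fetch_status  # always called with a cid
        self._subdevices: Dict[str, Subdevice] = {}

    def add(self, sub: Subdevice) -> None:
        self._subdevices[sub.cid] = sub

    def notify_reconnected(self) -> List[str]:
        """Notify every subdevice; return the cids whose status fetch failed."""
        failed: List[str] = []
        for cid, sub in self._subdevices.items():
            try:
                sub.on_gateway_reconnected(self._fetch_status(cid))
            except Exception:
                # One failure must not prevent the other subdevices from reconnecting.
                _LOGGER.exception("Status fetch failed for cid %s", cid)
                failed.append(cid)
        return failed


received: Dict[str, Status] = {}


def fake_fetch(cid: str) -> Status:
    """Test double: fails for one cid, returns a made-up DP for the rest."""
    if cid == "bad":
        raise KeyError(cid)
    return {"2": 20.0}


gw = Gateway(fake_fetch)
for name in ("bad", "trv-1", "trv-2"):
    gw.add(Subdevice(name, [lambda status, name=name: received.__setitem__(name, status)]))
failed = gw.notify_reconnected()
```

In this sketch the failing subdevice is listed in `failed`, while the two after it still receive their status, which is the behaviour the first bullet says was missing.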

Right now these changes live on my fork. I'll run them for a few days to validate they work fine. Also, @Anycubic, I am not sure what issue you had with the 2nd gateway, but I connected the other gateway I had and now I have both running with a bunch of devices.

Great work! If you have tested it and it works, you can create a merge request here and I can approve it.

alexualbu commented 1 year ago

Assuming everything is in order, which branch do you want me to open the PR against?

leeyuentuen commented 1 year ago

yes

Anycubic commented 1 year ago

@leeyuentuen @alexualbu I'll test the latest release when it is available and report back on all of the issues I have had so far.

leeyuentuen commented 1 year ago

I've created a new beta tag version: https://github.com/leeyuentuen/localtuya/releases/tag/v3.6.6

alexualbu commented 1 year ago

Thanks, @leeyuentuen. @Anycubic, I already found an issue - sometimes the gateway does not send the connected event to all the subdevices. I am not clear why (I can't see any error being thrown), so let me know if you see that. It should be obvious because only some of the subdevices would not have all the entities available, and if you reload those they should get their status back.

alexualbu commented 1 year ago

@leeyuentuen, I have been running the fixes for a few days now and I'm comfortable they resolve most of the issues around this (I haven't been able to replicate the one above yet, but I put in more logging). @Anycubic, let us know if you managed to test and found something else.

Since my 3.6.4-alt branch was used as a test bed and I combined both the commits from 3.6.5 and my pieces, how would you like to proceed? I think it's better if we clean those commits up and I base the work off of 3.6.5. Do you want to create a 3.6.6-beta2 branch or go about it differently?

leeyuentuen commented 1 year ago

New beta branch created: https://github.com/leeyuentuen/localtuya/releases/tag/v3.6.6-beta.2

Anycubic commented 1 year ago

@alexualbu I just installed 3.6.6-beta2, I'll test it in the next few days

Anycubic commented 1 year ago

@alexualbu I confirm that the 2nd gateway issue is now fixed 😉

alexualbu commented 1 year ago

@Anycubic , any other issues with subdevices disconnecting? can you look at the history of the various entities and see if you've had any blips in availability (i.e. the whole gw disconnect - reconnect flow worked ok) ?

Anycubic commented 1 year ago

@alexualbu unfortunately I always have this big issue after power outages: devices are unavailable until "woken up" from Smart Life. This is a no-go for me because if it happens during the night my automation will not work early in the morning...

alexualbu commented 1 year ago

@Anycubic, so does it happen when the GW goes offline or everything including HA? And is it only for the TRVs or other devices as well? Also, is it for short power cycles (e.g. a few minutes) or longer (how long?) time offline?

Anycubic commented 1 year ago

@alexualbu only when the GW goes offline. I have TRVs only, and it happens however long the power outage lasts. When the GW comes back online, the TRVs are available in Smart Life right away.

alexualbu commented 1 year ago

Hmm, that's interesting. The initial testing I did was exactly by unplugging the GW and it was fine. I'll try that again.

alexualbu commented 1 year ago

@Anycubic, I just tried it now - unplugged the GW and plugged it back in. All subdevices became unavailable and then recovered after the GW came back online. Can you try a power cycle and provide the debug logs?

Anycubic commented 1 year ago

@alexualbu online for me doesn't mean they are working from HA. I'm also seeing a mixed situation: some devices are unavailable, and others seem available but stay in a zombie state until activated from the Smart Life app. Btw, I'll test the GW again tomorrow.

alexualbu commented 1 year ago

@alexualbu online for me doesn't mean they are working from HA

@Anycubic , I am not sure what you mean by this ^ I am not sure my TRVs ever ended up in that zombie state - they're either available and respond, or unavailable and come back after I touched them in Smart Life.

Anycubic commented 1 year ago

@alexualbu I just tried 3.6.6-beta3. The good news is that power cycling the GWs now works as intended. The bad news is that adding a device with a 2nd GW (actually didn't test with my 1st GW) gives me an error with this log:

Logger: custom_components.localtuya.config_flow
Source: custom_components/localtuya/config_flow.py:500
Integration: LocalTuya (documentation, issues)
First occurred: 1:22:34 PM (1 occurrences)
Last logged: 1:22:34 PM

Unexpected exception
Traceback (most recent call last):
  File "/config/custom_components/localtuya/config_flow.py", line 451, in async_step_basic_sub_device_info
    return await self.async_step_pick_entity_type()
  File "/config/custom_components/localtuya/config_flow.py", line 500, in async_step_pick_entity_type
    if self.platform is not None or self.basic_info[CONF_IS_GATEWAY]:
KeyError: 'is_gateway'
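
The KeyError comes from indexing `basic_info` with a key that is absent for this device. One defensive pattern, shown here only as a sketch and not as the fix that actually went into the later beta, is to read the key with a default so a missing entry is treated as "not a gateway". The helper name `should_skip_platform_pick` is invented; only `CONF_IS_GATEWAY` and its value `'is_gateway'` come from the log.

```python
from typing import Any, Dict, Optional

CONF_IS_GATEWAY = "is_gateway"  # key name taken from the KeyError in the log


def should_skip_platform_pick(platform: Optional[str], basic_info: Dict[str, Any]) -> bool:
    """Hypothetical helper mirroring the condition on line 500 of the traceback."""
    # .get() with a default treats a missing key as "not a gateway"
    # instead of raising KeyError.
    return platform is not None or bool(basic_info.get(CONF_IS_GATEWAY, False))


print(should_skip_platform_pick(None, {}))  # key missing, no exception raised
```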

alexualbu commented 1 year ago

@Anycubic, great news! @leeyuentuen, I opened a new PR to fix the fix for #31 - based on the above confirmation i think this could be the last 3.6.6 beta

Anycubic commented 1 year ago

@alexualbu @leeyuentuen it seems to me 3.6.6-beta4 fixed both of my issues, devices plugged to 2nd GW and power cycling GWs. Congrats and thank you! Time will tell if everything is actually sorted out, but so far so good 🥳

leeyuentuen commented 1 year ago

If there are no more comments over the weekend, I'll promote the beta to a release.

leeyuentuen commented 1 year ago

The release is done. I expect the problem is gone, so I'll also close this issue.