home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

All Shelly Motion 2's become unavailable on a regular basis until the service "Home Assistant Core: Reload Config Entry" is called #119002

Closed KruseLuds closed 4 months ago

KruseLuds commented 5 months ago

The problem

Starting with (I believe) core 2024.5.5, all of my Shelly Motion 2 devices (I have 11 of them) periodically become "unavailable". They are all "Hardware: gen1 (SHMOS-02)", and it happens only with them (none of my other Shelly devices are affected). The problem is always resolved by calling the service "Home Assistant Core: Reload Config Entry" for each device that is unavailable. I am currently running a healthy, supported installation of Home Assistant Supervised with everything up to date, as shown below.

What version of Home Assistant Core has the issue?

core-2024.6.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

Shelly

Link to integration documentation on our website

https://www.home-assistant.io/integrations/shelly

Diagnostics information

There is no data to show, only the information already given

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Additional information

System Information

version | core-2024.6.0
-- | --
installation_type | Home Assistant Supervised
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.12.2
os_name | Linux
os_version | 6.1.0-21-arm64
arch | aarch64
timezone | America/New_York
config_dir | /config

Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 4845
Installed Version | 1.34.0
Stage | running
Available Repositories | 1455
Downloaded Repositories | 28

AccuWeather

can_reach_server | ok
-- | --
remaining_requests | 18

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Debian GNU/Linux 12 (bookworm)
-- | --
update_channel | stable
supervisor_version | supervisor-2024.06.0
agent_version | 1.6.0
docker_version | 26.1.4
disk_total | 915.4 GB
disk_used | 36.1 GB
healthy | true
supported | true
host_connectivity | true
supervisor_connectivity | true
ntp_synchronized | true
virtualization |
supervisor_api | ok
version_api | ok
installed_addons | AdGuard Home (5.1.0), Log Viewer (0.17.0), Home Assistant Google Drive Backup (0.112.1), File editor (5.8.0), Terminal & SSH (9.14.0), Core DNS Override (0.1.1), Matter Server (6.1.0), Cloudflared (5.1.10), Mosquitto broker (6.4.1), Ring-MQTT with Video Streaming (5.6.4)

Dashboards

dashboards | 9
-- | --
resources | 20
views | 43
mode | storage

Recorder

oldest_recorder_run | May 8, 2024 at 5:30 AM
-- | --
current_recorder_run | June 6, 2024 at 12:23 PM
estimated_db_size | 4084.03 MiB
database_engine | sqlite
database_version | 3.44.2

home-assistant[bot] commented 5 months ago

Hey there @balloob, @bieniu, @thecode, @chemelli74, @bdraco, mind taking a look at this issue as it has been labeled with an integration (shelly) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `shelly` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopens the issue.
- `@home-assistant unassign shelly` Removes the current integration label and assignees on the issue; add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Adds a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Removes a label (needs-more-information, problem in dependency, problem in custom component) from the issue.

(message by CodeOwnersMention)


shelly documentation shelly source (message by IssueLinks)

thecode commented 5 months ago

Please enable debug logging for the Shelly integration, wait for a device to become unavailable, disable debug logging, and attach the log.

Note: it is better to drag the log file into the comment (which adds it as an attachment) than to copy and paste it, as pasted logs are hard to read on GitHub.

Thanks

KruseLuds commented 5 months ago

(See the bottom-most comment from me, which includes the log and what I found in it.)

I am not sure if I can do that, as turning on debug logging makes my syslog go absolutely nuts, with many thousands of lines per minute (e.g., "[aioshelly.rpc_device.wsrpc]"), because I have 41 WiFi Shelly devices acting as passive Bluetooth scanners for 11 Shelly BLU motion sensors. I will try, however... (Y.I.K.E.S.)

Just as an FYI so you have the details here: I created an automation that reloads the config entry for any of my Shelly Motion 2 sensors when it becomes "unavailable". It would run endlessly when I had it set to parallel and 1000 (causing some kind of race condition or loop, I guess), so I changed it to queued and 30, and it works like a charm. I can also tell you that in checking the traces I noticed that it ran at these times:

- 8:26:59pm: Bedroom 3 Shelly Motion 2 Sensor
- 8:27:00pm: Den Shelly Motion Sensor
- 8:27:00pm: Bathroom Shelly Motion 2 Sensor
- 8:27:00pm: Basement Stairs Shelly Motion 2
- 8:27:01pm: Kitchen Main Shelly Motion Sensor

FYI, the YAML below is kludgy but works fine. I wanted to use the trigger ID in the actual reload command to get rid of all of those if statements at the bottom, but I couldn't get the syntax right. I will post the debug data here tomorrow -

`alias: Any Shelly Motion 2 Becomes Unavailable -> Reload It's Config Entry description: "" trigger:

KruseLuds commented 5 months ago

Here is the HUGE log. I checked the automation, which showed that the "Den Shelly Motion Sensor" went offline at 11:47:38pm; the automation reloaded its config entry, and the device was then (and still is) online. The device details in Home Assistant show this in the log as well (the red rectangle). Below that I have attached the log file:

image

The IP address for the Den Shelly Motion Sensor (which is a Shelly Motion 2) you should see in the log would be 192.168.10.53.

2024-06-06 23:47:38.589 ERROR (MainThread) [homeassistant.components.shelly] Error fetching Den Shelly Motion 2 data: Sleeping device did not update within 3600 seconds interval

...ended up triggering the automation:

homeassistant.components.automation.any_shelly_motion_2_becomes_unavailable_reload_it_s_config_entry

Pay attention to these times in the log file:

23:47:38.994 23:47:38.995 23:47:38.996 23:50:00.047

(I had to remove a gazillion lines from the file to get it from 457MB down to 14MB, so it now fits within the 25MB upload limit):

[Uploading home-assistant_shelly_2024-06-07T04-07-00.181Z.log…]()

Thank you for your help, I look forward to hearing back (your fix will not make me disable that automation though, it is good insurance for me :-) )
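Since the full debug log was 457MB and had to be trimmed by hand, the trimming could be scripted. The following is only a minimal sketch, not something provided by the Shelly integration or its maintainers. It assumes the record layout seen in the error line quoted above (timestamp, level, thread, logger name in square brackets); the function name `filter_log` and its parameters are hypothetical. It keeps records from Shelly-related loggers when they are warnings or errors, or when they mention a given string such as the device IP 192.168.10.53, and it keeps continuation lines (for example tracebacks) that follow a kept record. Whether a log reduced this way still contains what the maintainers need is not something this thread confirms.

```python
import re
from typing import Iterable, Iterator, Tuple

# Start of a log record, modelled on the line quoted in this thread:
# "2024-06-06 23:47:38.589 ERROR (MainThread) [homeassistant.components.shelly] ..."
RECORD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] "
)


def filter_log(
    lines: Iterable[str],
    logger_prefixes: Tuple[str, ...] = ("homeassistant.components.shelly", "aioshelly"),
    levels: Tuple[str, ...] = ("WARNING", "ERROR", "CRITICAL"),
    needles: Tuple[str, ...] = (),
) -> Iterator[str]:
    """Yield only the lines worth keeping (hypothetical helper).

    A record is kept when its logger starts with one of logger_prefixes and
    it is either at one of the given levels or contains one of the needles.
    Lines that do not start a new record inherit the decision made for the
    record they belong to.
    """
    keeping = False
    for line in lines:
        match = RECORD.match(line)
        if match:
            from_shelly = match["logger"].startswith(logger_prefixes)
            important = match["level"] in levels
            mentioned = any(needle in line for needle in needles)
            keeping = from_shelly and (important or mentioned)
        if keeping:
            yield line
```

Because the function works on any iterable of lines, it can be fed an open file object and the result written line by line to a new file, so the large log never has to fit in memory.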

KruseLuds commented 5 months ago

FYI my automation still makes things go berserk sometimes, I had to disable it. So, HALP!

rhoddan commented 5 months ago

I have a similar problem. Almost every day the Shellies become unavailable. That also affects my energy statistics (VERY annoying). It happens at exactly the same timestamps. I have tested different firmware versions as well.

IMG_0514 IMG_0515

Eisbaer2 commented 5 months ago

Same here with a Shelly Pro 3EM. In the Shelly app it is available all the time.

thecode commented 5 months ago

> Same here with a Shelly Pro 3EM. In the Shelly app it is available all the time.

The Shelly Pro 3EM uses a different protocol, so this can't be the same issue. Please create a new issue with logs. Thanks

chemelli74 commented 5 months ago

> as I have 41 WiFi shelly devices acting as passive bluetooth scanners for 11 Shelly BLU motion sensors.

Out of curiosity, do you have only motion sensors as gen1 devices?

smarthomefamilyverrips commented 5 months ago

This has been happening since 2024.5 already: #116948

KruseLuds commented 5 months ago

> > as I have 41 WiFi shelly devices acting as passive bluetooth scanners for 11 Shelly BLU motion sensors.
>
> Out of curiosity, do you have only motion sensors as gen1 devices?

No, I have almost 60 Shelly devices, and they are Gen 1 and Gen 2. The only device type experiencing this issue is the Shelly Motion 2 (which is Gen 1: Hardware: gen1 (SHMOS-02)). These are the kinds of devices I have:

- Shelly Motion 2 (12)
- Shelly Plus Plug (8)
- Shelly +1 (11)
- Shelly 1L (7)
- Shelly Pro 3EM (1)
- Shelly +2PM (1)
- Shelly BLU Motion (14)
- Shelly Dimmer 2 (3)

So in fact, every Shelly Motion device I have is Gen 2 - and FYI they are updated with latest released production (non-beta) firmware.

KruseLuds commented 5 months ago

> This has been happening since 2024.5 already: #116948

This might not be the same issue that I am having, however. I posted the list of my device types and counts in this thread, and of those types the issue is only happening with the Shelly Motion 2's (it may of course also affect one of your device types that I do not have).

smarthomefamilyverrips commented 5 months ago

> > This has been happening since 2024.5 already: #116948
>
> This might not be the same issue that I am having however, as I did in this thread post the list of my devices and counts for them, and of that list of device types this issue is only happening with the Shelly Motion 2's (it may include one of your device types that I do not have however of course).

I only have Shelly Motions

Eisbaer2 commented 5 months ago

My Shelly Pro 3EM becomes available and unavailable without any manual action on my part. "It just happens." But it is available all the time in the Shelly app, with fresh data.

chemelli74 commented 5 months ago

> No, I have almost 60 shelly devices, they are Gen 1 and Gen 2. The only one device type experiencing this issue is the Shelly Motion 2's (which are Gen 2).

image

> - Shelly Motion 2 (12)
> - Shelly Plus Plug (8)
> - Shelly +1 (11)
> - Shelly 1L (7)
> - Shelly Pro 3EM (1)
> - Shelly +2PM (1)
> - Shelly BLU Motion (14)
> - Shelly Dimmer 2 (3)

so you have 3 types of gen1 devices:

KruseLuds commented 5 months ago

Yes @chemelli74 I stand corrected, all of my Shelly Motion 2's ARE Gen 1:

Hardware: gen1 (SHMOS-02)

bieniu commented 5 months ago

We suspect that the problem may be caused by another integration (probably a custom one) blocking the event loop. The CoIoT packet with the status reaches the HA server but cannot be processed correctly. To check this, please enable HA's built-in debug mode, restart HA, and attach the log file here after a few hours.
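To illustrate what "blocking the event loop" means in general, here is a minimal, generic asyncio sketch. It is not code from Home Assistant or the Shelly integration, and all function names in it are hypothetical. A `heartbeat` task stands in for any work that needs regular loop time (such as handling incoming packets); the sketch counts how often it gets to run while another coroutine either sleeps synchronously on the loop or hands the same wait to a worker thread.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def heartbeat(ticks: List[float], interval: float = 0.01) -> None:
    """Stand-in for work that needs regular loop time; records each run."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def blocking_work(duration: float) -> None:
    # Synchronous sleep: nothing else on the event loop can run meanwhile.
    time.sleep(duration)


async def offloaded_work(duration: float) -> None:
    # Same wait, but in a worker thread, so the loop stays responsive.
    await asyncio.to_thread(time.sleep, duration)


async def count_ticks(
    work: Callable[[float], Awaitable[None]], duration: float = 0.2
) -> int:
    """Return how many heartbeat ticks happened while `work` was running."""
    ticks: List[float] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat its first run
    before = len(ticks)
    await work(duration)
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


def compare() -> Tuple[int, int]:
    """Run both variants and return (ticks while blocked, ticks while offloaded)."""
    blocked = asyncio.run(count_ticks(blocking_work))
    offloaded = asyncio.run(count_ticks(offloaded_work))
    return blocked, offloaded
```

In the blocking variant the heartbeat gets no turns at all for the duration of the call, which is the kind of starvation being suspected here. Python's asyncio debug mode logs callbacks that run longer than `loop.slow_callback_duration` (0.1 seconds by default), which is why a log taken with debug mode enabled can point at the offending code.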

mstefany commented 5 months ago

I am having the same issue, since upgrading to 2024.6.x all my Shelly Motion devices go regularly offline, only reloading them helps. Also, Shelly Smoke devices constantly report expired credentials, which is probably another issue with Shelly integration. 😭

KruseLuds commented 5 months ago

@bieniu I already attached the log, what is the status?

bieniu commented 5 months ago

Where is the log?

tbclark3 commented 4 months ago

I also have multiple Shelly devices, and I have (I think) the same issue with the motion 2 but not with the others. In my case, the motion 2 starts going offline shortly after restarting HA and remains unstable for several hours, requiring multiple integration reloads. However, after a few hours, the motion 2 stabilizes and remains reliable until the next restart of HA.

rhoddan commented 4 months ago

home-assistant 5.log

> I also have multiple Shelly devices, and I have (I think) the same issue with the motion 2 but not with the others. In my case, the motion 2 starts going offline shortly after restarting HA and remains unstable for several hours, requiring multiple integration reloads. However, after a few hours, the motion 2 stabilizes and remains reliable until the next restart of HA.

Yes, this drives me crazy. I have something similar with the Shelly 3EM. Do you have Unifi WiFi?

tbclark3 commented 4 months ago

Yes, I have Unifi WiFi, but that hasn't changed recently. Like the OP, my issue with the Shelly integration started in May, although I think it was earlier than 2024.5.5.

smarthomefamilyverrips commented 4 months ago

> Yes, I have Unifi WiFi, but that hasn't changed recently. Like the OP, my issue with the Shelly integration started in May, although I think it was earlier than 2024.5.5.

I have Shelly Motions doing the same and I do not have Unifi; I am using Asus ZenWifi routers, so I doubt that is the problem. Besides that, the devices stay connected to Wi-Fi and are reachable by IP in a web browser.

KruseLuds commented 4 months ago

> Where is the log?

Scroll up, I uploaded the file! It looks like this:

Screenshot_20240622_165537_Chrome

bieniu commented 4 months ago

First, the link is broken. Besides that, I asked for a log taken with the asyncio debug mode enabled. I doubt you would have posted such a log before I asked for it.

anybody84 commented 4 months ago

@bieniu, I have the same issue. I just enabled HA debug logging and this is the log file: home-assistant_2024-06-24T11-28-57.878Z.log

I have 5 Shelly Motion 2 devices and they randomly become unavailable, usually a couple of them at almost the same time, as in the attached log file:

2024-06-24 13:22:20.123 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shelly_ruch_jadalnia data: Sleeping device did not update within 3600 seconds interval
2024-06-24 13:22:21.211 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shelly_kuchnia_ruch data: Sleeping device did not update within 3600 seconds interval
2024-06-24 13:23:52.282 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shelly_parter_hall_ruch data: Sleeping device did not update within 3600 seconds interval
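The observation that several devices time out at almost the same moment can be checked mechanically. The sketch below is a hedged illustration, not part of any Home Assistant tooling: it assumes the exact message format of the three lines above, and the helper names `parse_timeouts` and `cluster` as well as the 5-second window are arbitrary, hypothetical choices.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[datetime, str, int]  # (timestamp, device name, interval in seconds)

# Modelled on the error lines quoted in this comment.
TIMEOUT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ERROR \(MainThread\) "
    r"\[homeassistant\.components\.shelly\] Error fetching (?P<device>.+?) data: "
    r"Sleeping device did not update within (?P<interval>\d+) seconds interval"
)


def parse_timeouts(lines: Iterable[str]) -> List[Event]:
    """Extract (timestamp, device, interval) from matching log lines."""
    events: List[Event] = []
    for line in lines:
        match = TIMEOUT.match(line)
        if match:
            stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, match["device"], int(match["interval"])))
    return events


def cluster(
    events: List[Event], window: timedelta = timedelta(seconds=5)
) -> List[List[Event]]:
    """Group events that follow the previous event within `window`."""
    groups: List[List[Event]] = []
    for event in sorted(events):
        if groups and event[0] - groups[-1][-1][0] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups
```

Applied to the three lines above with the 5-second window, this puts shelly_ruch_jadalnia and shelly_kuchnia_ruch in one group and shelly_parter_hall_ruch in a second one.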

Also, please find the diagnostic logs for one of those devices: config_entry-shelly-78d71ea582fecd1dbf26fe814675ee08.json

From my perspective, I don't see any integration that could have blocked the event loop around the time the devices became unavailable. In my case there is one custom integration (smartir) that can block the event loop (see the log file), but I think the only time it reads files using `with open()` is during setup (it gets the IR codes from configuration files). Looking at the logs, that happened at 12:10; after that everything was fine, and then at 13:22 three devices became unavailable.

bieniu commented 4 months ago

@anybody84 As far as I know, blitzortung also blocks the event loop. Could you test with HA in safe mode (all custom integrations disabled)?

image

And show us a screenshot of the unicast configuration for Motion 2.

anybody84 commented 4 months ago

@bieniu, is this what you meant?

unicast_conf

I just restarted HA in safe mode. I will post the log file separately, when it happens again (later today, I guess).

bieniu commented 4 months ago

> is this what you meant?

Yes. I assume that this hidden IP address is the address of your HA server.

anybody84 commented 4 months ago

Yes, of course. IP address of the host machine HA is running on.

cs224 commented 4 months ago

I have the same issue. I have many different kinds of Shellys and only my Shelly Motion 2 devices are affected. The description of the problem that @KruseLuds provides closely matches my observations. I have not changed my Home Assistant set-up for quite some time, but I keep Home Assistant up to date. The issue started to appear in the past few weeks, i.e. with Home Assistant 2024.5 or 2024.6.

smarthomefamilyverrips commented 4 months ago

> @anybody84 As far as I know, blitzortung also blocks the event loop. Could you test with HA in safe mode (disabled all custom integrations)?
>
> image
>
> And show us a screenshot of the unicast configuration for Motion 2.

I use neither blitzortung nor Unifi here and have exactly the same problem.

rhoddan commented 4 months ago

> > @anybody84 As far as I know, blitzortung also blocks the event loop. Could you test with HA in safe mode (disabled all custom integrations)? image And show us a screenshot of the unicast configuration for Motion 2.
>
> No blitzortung usage here and no Unifi here and having exactly the same

I'm not using blitzortung but still have the issue…

bieniu commented 4 months ago

@rhoddan @smarthomefamilyverrips Your comments do not contribute anything to this discussion. Share logs with safe mode enabled or HA built-in debug mode enabled if you want to help.

smarthomefamilyverrips commented 4 months ago

> @rhoddan @smarthomefamilyverrips Your comments do not contribute anything to this discussion. Share logs with safe mode enabled or HA built-in debug mode enabled if you want to help.

@bieniu It should help you, because now you know that it is not related to "blitzortung" and that Unifi WiFi is not causing it, as some comments possibly suggested. But sure, if you prefer to spend your time following illogical explanations, then sorry that we try to help with what we can... Anyway, this has been going on since the 2024.5 updates (see the other issue mentioned) and was never an issue before. For now, a one-time reload of the integration for the motion sensors after an HA restart solves the problem, so I will keep using this workaround in an automation and not "bother" you anymore with "in your eyes useless" information, offered to help in the ways we are able to in our personal situations (maybe not all of us are able to share logs; I guess this thought never occurred to you).

anybody84 commented 4 months ago

@bieniu, just one more observation. I have multiple battery-powered Shelly devices like Shelly Motion 2, Shelly Button 1 and Shelly Flood. All devices are configured using CoIoT protocol in the exact same way (I always copy-paste it from one device to another). But I noticed that Shelly Button 1 and Shelly Flood devices work correctly and they don't become unavailable. Somehow, the problem seems to be related only to Shelly Motion 2 devices.

For the record, this is the CoIoT configuration for my Shelly Button 1 device:

image

rhoddan commented 4 months ago

> @bieniu, just one more observation. I have multiple battery-powered Shelly devices like Shelly Motion 2, Shelly Button 1 and Shelly Flood. All devices are configured using CoIoT protocol in the exact same way (I always copy-paste it from one device to another). But I noticed that Shelly Button 1 and Shelly Flood devices work correctly and they don't become unavailable. Somehow, the problem seems to be related only to Shelly Motion 2 devices.
>
> For the record, this is the CoIoT configuration for my Shelly Button 1 device:
>
> image

This is my setting for my 3EM (not battery powered)

Skärmavbild 2024-06-26 kl 09 19 45

diezjavier commented 4 months ago

Another observation on my part. Since HA version 2024.6, I have also had a noticeable number of interruptions on my Shelly Motion 2. My WIFI hardware is Unifi. I have explicitly switched off the 5 GHz WIFI for my IOT network and operate it exclusively with the 2.4 GHz. Since the changeover, I have noticed no more interruptions and greater stability of the connection on my Shelly Motion 2. This is probably not the cause of the error, but could help as a workaround :-)

bieniu commented 4 months ago

Firmware 2.2.4 has just been released for Motion/Motion 2 and TRV. The only point in the changelog is "Update WF200 firmware to a possible fix for powersave issue". Please update the firmware and report if it somehow helps with this problem.

tbclark3 commented 4 months ago

The firmware upgrade does not fix this issue.

Also, I would like to point out that the issue is clearly not related to Wi-Fi: not the brand (e.g. Unifi) and not the frequency (e.g. turning off 5 GHz). At the time the device becomes unavailable, and throughout the time it is unavailable, it is still accessible directly by web browser, and it shows no errors.

chemelli74 commented 4 months ago

Can you run tcpdump on the HA server? If so please do:

KruseLuds commented 4 months ago

> Firmware 2.2.4 has just been released for Motion/Motion 2 and TRV. The only point in the changelog is "Update WF200 firmware to a possible fix for powersave issue". Please update the firmware and report if it somehow helps with this problem.

Giving it a shot -

thecode commented 4 months ago

Please create new issue if you still experience problems after updating to core 2024.7.0 (in beta now and released next week). Make sure to provide diagnostics and logs as explained in https://github.com/home-assistant/core/issues/119002#issuecomment-2153189138

KruseLuds commented 4 months ago

> Please create new issue if you still experience problems after updating to core 2024.7.0 (in beta now and released next week). Make sure to provide diagnostics and logs as explained in #119002 (comment)

This is still an issue. When I turn on debugging, my logs grow to multiple gigabytes, so I can't upload them; I struggle to whittle them down, and then the upload fails anyway. Please don't just sweep this under the rug if others are saying they still have the issue as well.

thecode commented 4 months ago

> This is still an issue, when I turn on the debugging my logs are like multiple gigs and then I can't upload them and struggle to whittle them down and then the upload to there for them fails. Please don't just sweep this under the rug if others are saying they still have the issue as well.

No one is sweeping anything under the rug. We added some logging which will show up even without enabling debug logging (to overcome this problem). However, unless someone provides logs from core 2024.7.x, we cannot make progress. Home Assistant 2024.7 is in beta now, so fixes are not being added to 2024.6.x.

You can update to the latest beta now and create a new issue with data from 2024.7, or wait for it to be released next week. As soon as someone provides new data, we can try to investigate the problem.

KruseLuds commented 4 months ago

> > This is still an issue, when I turn on the debugging my logs are like multiple gigs and then I can't upload them and struggle to whittle them down and then the upload to there for them fails. Please don't just sweep this under the rug if others are saying they still have the issue as well.
>
> No one is sweeping anything, we added some logging which will show even without enabling debug logging (to overcome this problem). However without someone providing logs from core 2024.7.x there is nothing we can progress. Home assistant 2024.7 is in beta now, so fixes are not added to 2024.6.x.
>
> You can update to latest beta now and create a new issue with data from 2024.7 or wait for it to be released next week, but as soon as someone provide new data we can try to investigate the problem.

I apologize for implying anyone might sweep anything under the rug. I will wait until 2024.7 next week, thank you!

KruseLuds commented 4 months ago

This problem seems to have been resolved (there is a different issue now, I will log a separate ticket.)

smarthomefamilyverrips commented 4 months ago

> This problem seems to have been resolved (there is a different issue now, I will log a separate ticket.)

@KruseLuds on 2024.7.x? And what issue appears now?