home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.71k stars 30.84k forks source link

Google Nest Integration causing restart failure #110735

Closed mjpsr11 closed 1 month ago

mjpsr11 commented 9 months ago

The problem

Core restart fails when Google Nest integration has more than one entry and each entry uses a separate Google account and credentials.

What version of Home Assistant Core has the issue?

core-2024.2.1

What was the last working version of Home Assistant Core?

core-2024.2.1

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Google Nest

Link to integration documentation on our website

https://www.home-assistant.io/integrations/nest/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

Log files don't provide any clues of failure. The restart issue is repeatable when additional entries with separate Google accounts are used. When the additional entries are deleted (or manually deleted from core.config_entries), restart is successful.

The final GUI restart message when failed (GUI message freezes, but supervisor/core is still active) is mostly random, but seems to land on loading Google Assistant integration more often than not. The result is any integration that has not yet loaded is never loaded, as if restart is stuck in a loop or code execution has ended.

home-assistant[bot] commented 9 months ago

Hey there @allenporter, mind taking a look at this issue as it has been labeled with an integration (nest) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `nest` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign nest` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


nest documentation nest source (message by IssueLinks)

allenporter commented 9 months ago

Hi, are you able to get diagnostics? https://www.home-assistant.io/docs/configuration/troubleshooting/#debug-logs-and-diagnostics also enabling debug logging might be helpful as well. It may be needed to first enable debug logging when integration #1 is added, then add the second one.

allenporter commented 9 months ago

(My only thought I can think of is if somehow running two google cloud pubsub subscribers is causing an issue where they accidentally are sharing state or a thread or causing a deadlock on startup of the second integration. the debug logs would show if this is the case)

mjpsr11 commented 9 months ago

Log file with debug enabled for Google Nest integration attached as requested. Unfortunately, as mentioned before there are no entries for this integration. I've also attached a snapshot to show where restart progress messages freeze. No dashboards or add-ons are available.

At this point, I have to manually remove the additional integration entries (homes) from the core.config_entries file, restart, then add them back again.

home-assistant.log

Screenshot 2024-02-16 125428

allenporter commented 9 months ago

To be more specific I'm also looking for diagnostics as well. Can you attach that?

home-assistant[bot] commented 9 months ago

Hey there @home-assistant/cloud, mind taking a look at this issue as it has been labeled with an integration (google_assistant) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `google_assistant` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign google_assistant` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


google_assistant documentation google_assistant source (message by IssueLinks)

allenporter commented 9 months ago

Ah from the screenshot "google assistant" is a different integration than the nest integration? So i think its a different integration we're talking about here.

mjpsr11 commented 8 months ago

Sorry, not sure what 'diagnostics' means. Is that a file?

The screenshot is just to show that it stops the restart process at some point. Does not necessarily mean that the Google Assistant integration is the root cause (however, they may be related since they both use Google project credentials).

The repeatable solution is to remove/delete the additional homes from the Nest integration (as I stated above) before restarting.

allenporter commented 8 months ago

See "download diagnostics" at https://www.home-assistant.io/docs/configuration/troubleshooting/#debug-logs-and-diagnostics (which i posted above in https://github.com/home-assistant/core/issues/110735#issuecomment-1948887680)

mjpsr11 commented 8 months ago

Okay, I can download diagnostics, but only after I manually add the integration. When the error occurs, the integration page is not accessible.

This is from one of the additional integration entries that was added manually:

config_entry-nest-665d2586d1e59f1c8a90a41a919462fe.json

allenporter commented 8 months ago

Thanks, that is a good start.

Question (1): How long was this instance up before you collected diagnostics? The statistics say that there were 14k messages received since it started which seems incredibly high, so helpful to have a little context there

Lets complete more aspects of the original question template about logs: Request (2): For the 1 integration, is there anything useful or interesting in the logs? debug logs? Request (3): Can you turn on debug logging, then add the second integration, then see the difference in the logs when adding the second integration?

Lastly, one other question. Request (4): Can you reproduce this behavior when in safe mode?

Appreciate all the effort here, it will go a long way towards being able to solve this problem. Thanks!

mjpsr11 commented 8 months ago

Had to restart this morning (not sure what caused it, but GUI was frozen).

  1. Yes, the instance for the diagnostic file just sent was pretty old. I don't know how old since the restart cleared the log file.

  2. Here is the log file with debugging on just before adding the second home: home-assistant.log.1.log (Based on the timesof debugging info in the log file, I'm now assuming that debugging starts as soon as it is selected instead of after a restart. Can you confirm?)

  3. Here is the Nest diagnostic file with only one home. config_entry-nest-3ab878040e721c244d29e743535c1111 (1).json and here is the file with an additional home added: config_entry-nest-3ab878040e721c244d29e743535c1111 (2).json Note that turning on debug logging doesn't help since I cannot get a successful restart (to capture the log of diagnostic files) when there is more than one entry (home).

  4. Does not restart in safe mode. Here is the home-assistant.log file. Restart froze at the androidtv integration this time. As I mentioned, this is a bit random. home-assistant.log

allenporter commented 8 months ago

I just tested home assistant with two Nest entries and everything seems fine to me.

I think the part i'm not understanding may be you're talking specifically about restart. Does starting work fine in all these scenarios?

At the bottom of https://www.home-assistant.io/integrations/nest/#troubleshooting there are instructions for updating configuration.yaml to enable debug logging for nest.

Having the debug logging from before things break is fine, but not enough so I think we need to capture logs from one of these scenarios you're saying is failing to restart. Ideally this would be in safe mode to rule out other integrations. The last log from safe mode doesn't seem to indicate anything at all is happening with nest...

mjpsr11 commented 8 months ago

The integration works fine with multiple homes (I have 3). Each home uses it's own, unique Google account, email, project, credentials, etc. Each has it's own application credentials saved on the integrations page (kebab menu at the top right of the page). Is this how you tested with 2 entries? Did you try to restart after adding the additional entry? Do you also have Google Assistant integration installed?

The issue only occurs when restarting home assistant (e.g. core restart from command line). During restart after the additional homes are added, the GUI freezes and pages like dashboards, etc. are not accessible (i.e. many of the integrations fail to install). Home assistant is, however, accessible thru an SSL (putty) terminal which is how I am able to eventually restart without a complete re-install of home assistant.

I'm pretty sure that the .1.log file I attached had the nest integration debug on (Item 2 above).

Have you tried to simply restart with 2 separate homes (w/different application credentials) from the [developer tools][yaml] page?

After the GUI freezes, the only way I can get Home Assistant to restart is to delete the (2) additional homes added with the integration from the _core.configentries file, then use SSL (putty) command line to supervisor repair, supervisor restart, then core restart. Since this restart works fine with no other modifications, it must be this integration or a conflict between this integration and something else. Note again that this successful restart has 1 entry (home) added to the integration, so the integration doesn't seem to be conflicting with other integrations, etc. in this scenario.

Once I take this action (restart with only 1 entry in the Nest integration), I can then add my additional 2 homes again (and rename entities for my dashboards), but this is a lot of work every time I have to restart.

allenporter commented 8 months ago

Yes, i tested with two different homes, with two different application credentials, etc.

Whats the log look like when starting home assistant with two integrations with debug enabled? (I don't think that is listed here yet as the only log didn't show anything about nest)

mjpsr11 commented 8 months ago

(I think we covered all this before, but we'll try one more time.)

Whats the log look like when starting home assistant with two integrations with debug enabled?

I think you mean 1) two integrations = 2 entries in the Nest integration, 2) debug enabled = Nest integration debug enabled + debug logging in configuration.yaml file.

If you want the Nest Entry diagnostic file, as I mentioned, I cannot access the page below when I have more than one entry after attempting to restart. The GUI is frozen at this point with a busy icon.

Screenshot 2024-02-28 090027

The above snapshot is just before I attempted another restart (with 2 entries) to meet your request. You can see, that I've enabled integration debugging. Also, below you can see that I've added debugging to configuration.yaml.

Screenshot 2024-02-28 090653

Restarting home assistant in the state above (2 homes and debugging per your request) results in a home-assistant.log file below: home-assistant.log

Nothing new here, it's what I've been explaining all along (i.e. the log is not helpful and diagnostics is unavailable).

I think at this point its clear the we will not get any useful information from logs or debugging the traditional way. It looks like the only way to debug is to go thru the code or validate attempts to combine multiple authentication with google assistant.

It's very time-consuming to have a failed restart, manually modify core.config_entries file to only 1 home, restart home assistant, add the 2 additional integration entries, rename all entities to match dashboard. This is required for each test we attempt.

allenporter commented 8 months ago

Yes, I appreciate that this is very painful to diagnose -- the debug logs and diagnostics are usually the fastest ways to get information to make it easy for end users to report data to developers here, but the problem is that these are failing us. Nest is not appearing anywhere in the logs which i've never run into before. Thanks for your patience whiel we figure this out. (Also, i wasn't able to reproduce, and we haven't heard of another report of this, so its pretty rare/odd!)

Perhaps it is time to use py-spy to find out what is going on? This is much more low level... but we can capture a profile of where things are "stuck" and that may give us insight:

https://community.home-assistant.io/t/how-to-install-py-spy-on-a-ha-os-instance-please-instruct/430822/44

Given that the nest integration isn't even loaded all, now my guess is that its getting stuck when installing package dependencies. We could see if thats doing anything by turning up debug logging here:

logger:
    default: info
    logs:
        homeassistant.util.package: debug
        homeassistant.loader: debug
mjpsr11 commented 8 months ago

I installed py-spy, but getting

Permission denied (os error 13)

when running py-spy record --pid <pid>. Looks like maybe this is a docker protected mode issue with HAOS. This is getting a bit above me as I have no experience with docker.

I added the additional debug to logger since I had some shutdown this morning and had to restart anyway (ugghh). Here is the resulting log file: home-assistant.log

allenporter commented 8 months ago

Perhaps these instructions work? https://community.home-assistant.io/t/instructions-to-install-py-spy-on-haos/480473

I don't see nest mentioned that its even loaded. Pretty odd.

mjpsr11 commented 8 months ago

I could not get this method to work.

mjpsr11 commented 7 months ago

I was able to install pyspy, but get this error: Unsupported version of Python: 3.12.0

It appears the latest version of pyspy (3.14) only supports up to python v3.11. Home assistant v2024.3.2 apparently updates python to v3.12.

Any suggestions? (I'm concerned downgrading python will require home assistant rebuild)

allenporter commented 7 months ago

Definitely don't downgrade python.

How about we try a newer version of home assistant and see if the issue is still persisting for you?

mjpsr11 commented 7 months ago

Running all latest versions. Still cannot restart with more than one nest home. See attached. ha_versions.txt

allenporter commented 7 months ago

I just noticed the default logging level in configuration.yaml is set to WARNING -- can you make it INFO? I think that explains why i've been really confused this whole time and unable to see anything about nest.

mjpsr11 commented 7 months ago

Actually, they were all set to debug as requested (https://github.com/home-assistant/core/issues/110735#issuecomment-1969260815), but I changed them all to info anyway and tried again. Here are the results: home-assistant.log

image

The only thing I did to return to normal restart is to delete these 2 nest entries from core.config_entries: image

allenporter commented 7 months ago

I'm talking about the very first line is WARN. Can you make it info? Having that as warm strips out all the default info logging.

mjpsr11 commented 6 months ago

Got it.

Log with only 1 home, success restart: home-assistant (1 home).log

Log after adding 2 additional homes (without restart attempt): home-assistant (after 2 more homes - no restart).log

Log after adding 2 additional homes, failed restart: home-assistant (after 2 more homes - failed restart).log

Log after removing 2 additional homes, success restart: home-assistant (removed 2 added homes - success restart).log

image

allenporter commented 6 months ago

Ok, doesn't seem anything useful there. Next suggestions could be:

The new debug mode will find threading problems:

homeassistant:
  debug: true

If that doesn't find anything, then next step can be to enable debug logging for Nest and see what we see. (convert those info to debug for the nest packages)

mjpsr11 commented 6 months ago

I'm getting homeassistant: debug: true is an invalid option for configuration.yaml. Any ideas?

image

image

allenporter commented 6 months ago

Ah sorry it's for the new home assistant version just released

mjpsr11 commented 6 months ago

Okay, I upgraded to latest 2024.5.1 and added homeassistant: debug: true

Here is the log file after. Of course, it still fails to fully restart.

home-assistant.log

allenporter commented 6 months ago

That's the log of it failing to start?

Can you scrap all of the custom log settings for nest? There are no error messages or log messages showing anything going wrong. You can then enable debug logging and we'll get actual debug logs.

allenporter commented 6 months ago

Also would be useful to capture this in Safe Mode https://www.home-assistant.io/docs/troubleshooting_general/#restarting-home-assistant-in-safe-mode to avoid any other custom components (many of which have warnings in the logs)

mjpsr11 commented 6 months ago

Well, I can't explain it, but today when I tested, restartis worked with no problems (tested it at least 5 times in a row, no errors). I'm still using 2024.5.1 and haven't updated/upgraded any integrations since the last attempt from my previous post. The only changes to my installation is upgrading my OS to Windows 11 (HA running in VirtualBox).

I will keep you posted if it re-surfaces.

mjpsr11 commented 5 months ago

No changes to ha - only Win11 update, but restart now failing again.

mjpsr11 commented 5 months ago

The problem has returned.

I found a work-around (sort of) that helps to recover easier, but its still not automated.

I created a python script that modifies core.config_entries that leaves the additional integration entries intact, but disables them (by changing the "disabled_by" value). Then ha core restart is successful.

Additionally, the integration is automatically reloaded after a few seconds from successful restart and re-enables the entries without needing to re-add them (nice!).

I just don't know how (if possible) to automate the python script for each startup (i.e. before the ha container is restarted). This would help automate the work-around, but of course not solve the root cause of the problem.

So, it is again down to just the additional integration entries causing the problem only during startup and not the configuration of the entries, etc. Still seems like it could be a Google Cloud authentication problem (maybe a conflicting request since another project has been authenticated?). Does the integration request a new access token during startup?

allenporter commented 5 months ago

The next home assistant release will contain some attempts to simplify how the pubsub subscriber works in case that somehow is causing an issue here.

As requested about, it would be helpful to have debug logging enabled.

Can you run the two scenarios you describe? (one where you have two entries and it works, and the other where you have two entries and it doesn't work) with the following in configuration.yaml? This will capture all communication with the server including anything related to authentication:

homeassistant:
  debug: true

logger:
  default: info
  logs:
    homeassistant.components.nest: debug
    homeassistant.helpers.config_entry_flow: debug
    homeassistant.helpers.config_entry_oauth2_flow: debug
    google_nest_sdm: debug
    google_nest_sdm.device: debug
    google_nest_sdm.device_manager: debug
    google_nest_sdm.google_nest_subscriber: debug

Then we can see the differences of what is happening in the subscriber libraries.

mjpsr11 commented 5 months ago

configuration.yaml modified as above...

Here is the log with two entries and it works (entries have "disabled_by": "user" in core.config_entries): home-assistant.log

and here it is with two entries and it doesn't work (entries have "disabled_by": null in core.config_entries): home-assistant.log

I did this twice in a row and "google_nest" never shows up in the log when it doesn't work.

mjpsr11 commented 5 months ago

I was able to complete the workaround using plink (putty) line command in a powershell script that checks if HA is running or not. The script modifies core.config_entries to disable the additional Nest entries if its not running, then restarts HA. A node red flow then re-enables the additional Nest entries. The script runs every 5 mins thru a scheduled task. Now I can finally have HA up-time when I'm remote if an automatic restart occurs.

allenporter commented 5 months ago

@mjpsr11 i would be curious to know if this happens in the beta release if you are willing to try it given there are a couple fixes in. I may also add more debug logging at the very first step of nest.

mjpsr11 commented 5 months ago

Sure, just let me know what to do.

allenporter commented 5 months ago

The release for 2024.06 is happening today so we can try again on that version when it lands.

mjpsr11 commented 4 months ago

Been using 2024.6.3 for some time. No change. Luckily, my work-around is working transparently and reliably.

allenporter commented 4 months ago

Thank you for reporting back. 2024.6.2 includes additional debug logging statements to help understand what is going on. If you would be able to share the debug logs for the breaking scenario again that could give me some additional clues.

The last version changed the way thread pools are involved and then added logging before and after some threading calls so we may be able to further narrow down where the issue is. I'm currently thinkings its in the google subscriber library, but don't really have any clues to indicate that yet.

issue-triage-workflows[bot] commented 1 month ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.