Closed mjpsr11 closed 1 month ago
Hey there @allenporter, mind taking a look at this issue as it has been labeled with an integration (nest
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
nest documentation nest source (message by IssueLinks)
Hi, are you able to get diagnostics? https://www.home-assistant.io/docs/configuration/troubleshooting/#debug-logs-and-diagnostics also enabling debug logging might be helpful as well. It may be needed to first enable debug logging when integration #1 is added, then add the second one.
(My only thought I can think of is if somehow running two google cloud pubsub subscribers is causing an issue where they accidentally are sharing state or a thread or causing a deadlock on startup of the second integration. the debug logs would show if this is the case)
Log file with debug enabled for Google Nest integration attached as requested. Unfortunately, as mentioned before there are no entries for this integration. I've also attached a snapshot to show where restart progress messages freeze. No dashboards or add-ons are available.
At this point, I have to manually remove the additional integration entries (homes) from the core.config_entries file, restart, then add them back again.
To be more specific I'm also looking for diagnostics as well. Can you attach that?
Hey there @home-assistant/cloud, mind taking a look at this issue as it has been labeled with an integration (google_assistant
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
google_assistant documentation google_assistant source (message by IssueLinks)
Ah from the screenshot "google assistant" is a different integration than the nest integration? So i think its a different integration we're talking about here.
Sorry, not sure what 'diagnostics' means. Is that a file?
The screenshot is just to show that it stops the restart process at some point. Does not necessarily mean that the Google Assistant integration is the root cause (however, they may be related since they both use Google project credentials).
The repeatable solution is to remove/delete the additional homes from the Nest integration (as I stated above) before restarting.
See "download diagnostics" at https://www.home-assistant.io/docs/configuration/troubleshooting/#debug-logs-and-diagnostics (which i posted above in https://github.com/home-assistant/core/issues/110735#issuecomment-1948887680)
Okay, I can download diagnostics, but only after I manually add the integration. When the error occurs, the integration page is not accessible.
This is from one of the additional integration entries that was added manually:
Thanks, that is a good start.
Question (1): How long was this instance up before you collected diagnostics? The statistics say that there were 14k messages received since it started which seems incredibly high, so helpful to have a little context there
Lets complete more aspects of the original question template about logs: Request (2): For the 1 integration, is there anything useful or interesting in the logs? debug logs? Request (3): Can you turn on debug logging, then add the second integration, then see the difference in the logs when adding the second integration?
Lastly, one other question. Request (4): Can you reproduce this behavior when in safe mode?
Appreciate all the effort here, it will go a long way towards being able to solve this problem. Thanks!
Had to restart this morning (not sure what caused it, but GUI was frozen).
Yes, the instance for the diagnostic file just sent was pretty old. I don't know how old since the restart cleared the log file.
Here is the log file with debugging on just before adding the second home: home-assistant.log.1.log (Based on the timesof debugging info in the log file, I'm now assuming that debugging starts as soon as it is selected instead of after a restart. Can you confirm?)
Here is the Nest diagnostic file with only one home. config_entry-nest-3ab878040e721c244d29e743535c1111 (1).json and here is the file with an additional home added: config_entry-nest-3ab878040e721c244d29e743535c1111 (2).json Note that turning on debug logging doesn't help since I cannot get a successful restart (to capture the log of diagnostic files) when there is more than one entry (home).
Does not restart in safe mode. Here is the home-assistant.log file. Restart froze at the androidtv integration this time. As I mentioned, this is a bit random. home-assistant.log
I just tested home assistant with two Nest entries and everything seems fine to me.
I think the part i'm not understanding may be you're talking specifically about restart. Does starting work fine in all these scenarios?
At the bottom of https://www.home-assistant.io/integrations/nest/#troubleshooting there are instructions for updating configuration.yaml
to enable debug logging for nest.
Having the debug logging from before things break is fine, but not enough so I think we need to capture logs from one of these scenarios you're saying is failing to restart. Ideally this would be in safe mode to rule out other integrations. The last log from safe mode doesn't seem to indicate anything at all is happening with nest...
The integration works fine with multiple homes (I have 3). Each home uses it's own, unique Google account, email, project, credentials, etc. Each has it's own application credentials saved on the integrations page (kebab menu at the top right of the page). Is this how you tested with 2 entries? Did you try to restart after adding the additional entry? Do you also have Google Assistant integration installed?
The issue only occurs when restarting home assistant (e.g. core restart
from command line). During restart after the additional homes are added, the GUI freezes and pages like dashboards, etc. are not accessible (i.e. many of the integrations fail to install). Home assistant is, however, accessible thru an SSL (putty) terminal which is how I am able to eventually restart without a complete re-install of home assistant.
I'm pretty sure that the .1.log file I attached had the nest integration debug on (Item 2 above).
Have you tried to simply restart with 2 separate homes (w/different application credentials) from the [developer tools][yaml]
page?
After the GUI freezes, the only way I can get Home Assistant to restart is to delete the (2) additional homes added with the integration from the _core.configentries file, then use SSL (putty) command line to supervisor repair
, supervisor restart
, then core restart
. Since this restart works fine with no other modifications, it must be this integration or a conflict between this integration and something else. Note again that this successful restart has 1 entry (home) added to the integration, so the integration doesn't seem to be conflicting with other integrations, etc. in this scenario.
Once I take this action (restart with only 1 entry in the Nest integration), I can then add my additional 2 homes again (and rename entities for my dashboards), but this is a lot of work every time I have to restart.
Yes, i tested with two different homes, with two different application credentials, etc.
Whats the log look like when starting home assistant with two integrations with debug enabled? (I don't think that is listed here yet as the only log didn't show anything about nest)
(I think we covered all this before, but we'll try one more time.)
Whats the log look like when starting home assistant with two integrations with debug enabled?
I think you mean 1) two integrations = 2 entries in the Nest integration, 2) debug enabled = Nest integration debug enabled + debug logging in configuration.yaml file.
If you want the Nest Entry diagnostic file, as I mentioned, I cannot access the page below when I have more than one entry after attempting to restart. The GUI is frozen at this point with a busy icon.
The above snapshot is just before I attempted another restart (with 2 entries) to meet your request. You can see, that I've enabled integration debugging. Also, below you can see that I've added debugging to configuration.yaml.
Restarting home assistant in the state above (2 homes and debugging per your request) results in a home-assistant.log file below: home-assistant.log
Nothing new here, it's what I've been explaining all along (i.e. the log is not helpful and diagnostics is unavailable).
I think at this point its clear the we will not get any useful information from logs or debugging the traditional way. It looks like the only way to debug is to go thru the code or validate attempts to combine multiple authentication with google assistant.
It's very time-consuming to have a failed restart, manually modify core.config_entries file to only 1 home, restart home assistant, add the 2 additional integration entries, rename all entities to match dashboard. This is required for each test we attempt.
Yes, I appreciate that this is very painful to diagnose -- the debug logs and diagnostics are usually the fastest ways to get information to make it easy for end users to report data to developers here, but the problem is that these are failing us. Nest is not appearing anywhere in the logs which i've never run into before. Thanks for your patience whiel we figure this out. (Also, i wasn't able to reproduce, and we haven't heard of another report of this, so its pretty rare/odd!)
Perhaps it is time to use py-spy to find out what is going on? This is much more low level... but we can capture a profile of where things are "stuck" and that may give us insight:
Given that the nest integration isn't even loaded all, now my guess is that its getting stuck when installing package dependencies. We could see if thats doing anything by turning up debug logging here:
logger:
default: info
logs:
homeassistant.util.package: debug
homeassistant.loader: debug
I installed py-spy, but getting
Permission denied (os error 13)
when running py-spy record --pid <pid>
. Looks like maybe this is a docker protected mode issue with HAOS. This is getting a bit above me as I have no experience with docker.
I added the additional debug to logger since I had some shutdown this morning and had to restart anyway (ugghh). Here is the resulting log file: home-assistant.log
Perhaps these instructions work? https://community.home-assistant.io/t/instructions-to-install-py-spy-on-haos/480473
I don't see nest mentioned that its even loaded. Pretty odd.
I could not get this method to work.
I was able to install pyspy, but get this error: Unsupported version of Python: 3.12.0
It appears the latest version of pyspy (3.14) only supports up to python v3.11. Home assistant v2024.3.2 apparently updates python to v3.12.
Any suggestions? (I'm concerned downgrading python will require home assistant rebuild)
Definitely don't downgrade python.
How about we try a newer version of home assistant and see if the issue is still persisting for you?
Running all latest versions. Still cannot restart with more than one nest home. See attached. ha_versions.txt
I just noticed the default logging level in configuration.yaml
is set to WARNING
-- can you make it INFO
? I think that explains why i've been really confused this whole time and unable to see anything about nest.
Actually, they were all set to debug
as requested (https://github.com/home-assistant/core/issues/110735#issuecomment-1969260815), but I changed them all to info
anyway and tried again. Here are the results:
home-assistant.log
The only thing I did to return to normal restart is to delete these 2 nest entries from core.config_entries
:
I'm talking about the very first line is WARN. Can you make it info? Having that as warm strips out all the default info logging.
Got it.
Log with only 1 home, success restart: home-assistant (1 home).log
Log after adding 2 additional homes (without restart attempt): home-assistant (after 2 more homes - no restart).log
Log after adding 2 additional homes, failed restart: home-assistant (after 2 more homes - failed restart).log
Log after removing 2 additional homes, success restart: home-assistant (removed 2 added homes - success restart).log
Ok, doesn't seem anything useful there. Next suggestions could be:
The new debug mode will find threading problems:
homeassistant:
debug: true
If that doesn't find anything, then next step can be to enable debug logging for Nest and see what we see. (convert those info
to debug
for the nest packages)
I'm getting homeassistant: debug: true
is an invalid option for configuration.yaml. Any ideas?
Ah sorry it's for the new home assistant version just released
Okay, I upgraded to latest 2024.5.1 and added homeassistant: debug: true
Here is the log file after. Of course, it still fails to fully restart.
That's the log of it failing to start?
Can you scrap all of the custom log settings for nest? There are no error messages or log messages showing anything going wrong. You can then enable debug logging and we'll get actual debug logs.
Also would be useful to capture this in Safe Mode https://www.home-assistant.io/docs/troubleshooting_general/#restarting-home-assistant-in-safe-mode to avoid any other custom components (many of which have warnings in the logs)
Well, I can't explain it, but today when I tested, restart
is worked with no problems (tested it at least 5 times in a row, no errors). I'm still using 2024.5.1 and haven't updated/upgraded any integrations since the last attempt from my previous post. The only changes to my installation is upgrading my OS to Windows 11 (HA running in VirtualBox).
I will keep you posted if it re-surfaces.
No changes to ha - only Win11 update, but restart now failing again.
The problem has returned.
I found a work-around (sort of) that helps to recover easier, but its still not automated.
I created a python script that modifies core.config_entries
that leaves the additional integration entries intact, but disables them (by changing the "disabled_by" value). Then ha core restart
is successful.
Additionally, the integration is automatically reloaded after a few seconds from successful restart and re-enables the entries without needing to re-add them (nice!).
I just don't know how (if possible) to automate the python script for each startup (i.e. before the ha container is restarted). This would help automate the work-around, but of course not solve the root cause of the problem.
So, it is again down to just the additional integration entries causing the problem only during startup and not the configuration of the entries, etc. Still seems like it could be a Google Cloud authentication problem (maybe a conflicting request since another project has been authenticated?). Does the integration request a new access token during startup?
The next home assistant release will contain some attempts to simplify how the pubsub subscriber works in case that somehow is causing an issue here.
As requested about, it would be helpful to have debug logging enabled.
Can you run the two scenarios you describe? (one where you have two entries and it works, and the other where you have two entries and it doesn't work) with the following in configuration.yaml
? This will capture all communication with the server including anything related to authentication:
homeassistant:
debug: true
logger:
default: info
logs:
homeassistant.components.nest: debug
homeassistant.helpers.config_entry_flow: debug
homeassistant.helpers.config_entry_oauth2_flow: debug
google_nest_sdm: debug
google_nest_sdm.device: debug
google_nest_sdm.device_manager: debug
google_nest_sdm.google_nest_subscriber: debug
Then we can see the differences of what is happening in the subscriber libraries.
configuration.yaml
modified as above...
Here is the log with two entries and it works (entries have "disabled_by": "user" in core.config_entries
):
home-assistant.log
and here it is with two entries and it doesn't work (entries have "disabled_by": null in core.config_entries
):
home-assistant.log
I did this twice in a row and "google_nest" never shows up in the log when it doesn't work.
I was able to complete the workaround using plink (putty) line command in a powershell script that checks if HA is running or not. The script modifies core.config_entries
to disable the additional Nest entries if its not running, then restarts HA. A node red flow then re-enables the additional Nest entries. The script runs every 5 mins thru a scheduled task. Now I can finally have HA up-time when I'm remote if an automatic restart occurs.
@mjpsr11 i would be curious to know if this happens in the beta release if you are willing to try it given there are a couple fixes in. I may also add more debug logging at the very first step of nest.
Sure, just let me know what to do.
The release for 2024.06 is happening today so we can try again on that version when it lands.
Been using 2024.6.3 for some time. No change. Luckily, my work-around is working transparently and reliably.
Thank you for reporting back. 2024.6.2 includes additional debug logging statements to help understand what is going on. If you would be able to share the debug logs for the breaking scenario again that could give me some additional clues.
The last version changed the way thread pools are involved and then added logging before and after some threading calls so we may be able to further narrow down where the issue is. I'm currently thinkings its in the google subscriber library, but don't really have any clues to indicate that yet.
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
The problem
Core restart fails when Google Nest integration has more than one entry and each entry uses a separate Google account and credentials.
What version of Home Assistant Core has the issue?
core-2024.2.1
What was the last working version of Home Assistant Core?
core-2024.2.1
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Google Nest
Link to integration documentation on our website
https://www.home-assistant.io/integrations/nest/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
Log files don't provide any clues of failure. The restart issue is repeatable when additional entries with separate Google accounts are used. When the additional entries are deleted (or manually deleted from core.config_entries), restart is successful.
The final GUI restart message when failed (GUI message freezes, but supervisor/core is still active) is mostly random, but seems to land on loading Google Assistant integration more often than not. The result is any integration that has not yet loaded is never loaded, as if restart is stuck in a loop or code execution has ended.