home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Glances integration loses connection and doesn't automatically recover #110551

Open spikeygg opened 8 months ago

spikeygg commented 8 months ago

The problem

I have Glances on a few machines in my house. One or two of them exhibit this problem, where the integration (seemingly at random) stops receiving data. You can see in this plot that it happened this morning around midnight on the Plexbox: [image] It recorded the last datapoint at 12:51:03 and then went static.

I've been watching this happen for a few months. The solution: I go into the integration and just hit the 'reload' on this machine and it starts working again for several days until it happens again.

I checked the logs for detail around this period and there are no entries for Glances.

What version of Home Assistant Core has the issue?

core-2024.2.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Glances

Link to integration documentation on our website

https://www.home-assistant.io/integrations/glances

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 8 months ago

Hey there @engrbm87, mind taking a look at this issue as it has been labeled with an integration (glances) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `glances` can trigger bot actions by commenting:

- `@home-assistant close` Closes the issue.
- `@home-assistant rename Awesome new title` Renames the issue.
- `@home-assistant reopen` Reopen the issue.
- `@home-assistant unassign glances` Removes the current integration label and assignees on the issue, add the integration domain after the command.
- `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
- `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


glances documentation glances source (message by IssueLinks)

ynazar1 commented 8 months ago

Same issue here. I've got a server that, for whatever reason, does a network disconnect/reconnect when it's not in use, so the graphs end up looking something like this: [image]

My current workaround is to reload the Glances integration for the server, after which it starts reading again, but it's a manual process.

engrbm87 commented 8 months ago

Please enable debugging and share the logs after the connection is broken.

spikeygg commented 8 months ago

> Please enable debugging and share the logs after the connection is broken.

home-assistant_glances_2024-02-19T15-16-28.138Z.log

Here's what I did:

1) Turn on debug logging for glances component
2) Added the system voron.mynet with glances
3) Waited about 5 minutes, watched the connection die
4) Waited a few more minutes and then turned off debug logging - logfile above.

fantnhu commented 8 months ago

Hello! I also have a problem with the integration. The addon works perfectly, but the integration's sensors often become unknown (the addon keeps working at that point, no problem).

`Logger: glances_api
Source: components/glances/__init__.py:76
First occurred: February 20, 2024, 14:51:05 (1 occurrences)
Last logged: February 20, 2024, 14:51:05

Glances api older than v3 will not be supported in the next release.`

I tried setting up the integration with both localhost and the IP address. Both stop at the same time.

`Logger: homeassistant.components.glances.coordinator
Source: helpers/update_coordinator.py:345
Integration: Glances (documentation, issues)
First occurred: February 20, 2024, 14:43:48 (11 occurrences)
Last logged: 07:24:14

Error fetching glances - localhost data:
Error fetching glances - 127.0.0.1 data:`

No error message in the addon log.

Core 2024.2.2 Supervisor 2024.02.0 Operating System 11.5

Any ideas? (could it be a time-out?)

fantnhu commented 8 months ago

Update: When I use the "reload configuration" button in the integration, it works, but only for an unpredictable period of time. Reloading fixes it to the extent that not all entities are unknown.

ynazar1 commented 8 months ago

As a temporary work-around add this to your crontab on the remote server that loses connection (the one that is being polled from homeassistant):

# Ping google every 5 mins to keep tcp alive
*/5 *    * * *  ping -c 3 8.8.8.8 > /dev/null 2>&1

I'm sure there's probably a smarter sysctl tweak... but the above seems to work. This resolves the remote server disconnecting, but not transitory network issues between the servers or hass.

Ideally, the Glances integration should add an optional 'reload config if sensors are stale' setting.
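
A minimal sketch of what such a staleness check could look like, assuming all we have is the time of a sensor's last update. The helper name `is_stale` and the 10-minute threshold are hypothetical and not part of the Glances integration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, now: datetime,
             max_age: timedelta = timedelta(minutes=10)) -> bool:
    """Hypothetical helper: True when the last update is older than max_age."""
    return now - last_updated > max_age


# Example timestamps (illustrative values only).
now = datetime(2024, 2, 19, 13, 5, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 2, 19, 13, 0, 0, tzinfo=timezone.utc)
old = datetime(2024, 2, 19, 12, 51, 3, tzinfo=timezone.utc)

print(is_stale(fresh, now))  # False: 5 minutes old
print(is_stale(old, now))    # True: almost 14 minutes old
```

A check like this would only decide when to reload; the reload itself still has to be triggered separately.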

spikeygg commented 6 months ago

Guys, I think it's related to the host glances service adding new things to its list of reported details. Read the update I made on the other thread, these errors seem to be directly related in my case. You can do some sleuthing on your ends to see if you can find similar triggers.

One thing I noticed the last 4-5 times I restarted the service from HA is that the number of entities always seems to rise a little after I restart the service. I think it's because the host service created new things to track and that crashed the HA glances service. When you restart the HA glances service it reestablishes the connection and creates the new entities in HA and everything is happy again.

ynazar1 commented 5 months ago

What @spikeygg said seems to be correct. I just restarted the one that wasn't reporting, and the number of entities it keeps track of changed. It still raises the question of how to do this programmatically from HA.

Trying to do this: https://community.home-assistant.io/t/can-i-write-an-automation-to-reload-restart-an-integration/301020/17
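
One possible way to script the reload from outside Home Assistant, sketched in Python. It assumes the `homeassistant.reload_config_entry` service called through the REST API at `/api/services/<domain>/<service>` with a long-lived access token. The base URL, token, and entry id below are placeholders, and `build_reload_request` is a hypothetical helper name. The snippet only builds the request; it does not send it.

```python
import json
import urllib.request


def build_reload_request(base_url: str, token: str, entry_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Home Assistant to reload one config entry."""
    url = f"{base_url.rstrip('/')}/api/services/homeassistant/reload_config_entry"
    body = json.dumps({"entry_id": entry_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: substitute your own URL, token, and Glances config entry id.
request = build_reload_request(
    "http://homeassistant.local:8123", "YOUR_TOKEN", "YOUR_GLANCES_ENTRY_ID"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

The same service can also be called from an automation inside Home Assistant, which avoids handling a token at all.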

spikeygg commented 5 months ago

To anyone who may care: I think I've been able to work around this issue by making the monitored resources on the host system more predictable. In other words defining my own Glances configuration on the host system instead of leaving it up to the defaults. With that, I've shut off many (if not all) of the docker-based filesystems so when they change nothing gets updated on the Glances results.

I've been running like this for over a week and haven't lost the connection yet when previously, it was two or three days and the system would go unavailable in Home Assistant.

ynazar1 commented 5 months ago

@spikeygg I think you're totally right, it's gotta be the docker volumes because my docker system that's set to auto update is the one that somewhat regularly disconnects and the other one is more static so it doesn't.

Care to give a hint on how to config glances with docker volume exclusions?

spikeygg commented 5 months ago

> @spikeygg I think you're totally right, it's gotta be the docker volumes because my docker system that's set to auto update is the one that somewhat regularly disconnects and the other one is more static so it doesn't.
>
> Care to give a hint on how to config glances with docker volume exclusions?

Sure thing @ynazar1,

First, I created a glances.conf file like this:

[diskio]
show=sd.*,sr.*

[fs]
hide=.*/boot.*,.*/snap.*,/dev/loop.*,.*nvidia.*,.*/glances.conf,.*/resolv.conf,.*/hostname,.*/hosts,/usr/lib.*

then pointed my docker container at it by adding these two options to the docker command line:

- `-v /the_path_to_the_file/my-local-glances.conf:/my-container-glances.conf` maps the conf file into the container
- `-e GLANCES_OPT="-w -C /my-container-glances.conf"` tells glances to look at the new conf file

For me it cut the file system monitoring down to just what I actually wanted. I kept adjusting the string and recreating the docker container until the Glances webpath (at port 61208) showed the filesystem list that I wanted. I used to have like a dozen file systems listed with the same values but now I only have six and they're the specific few ones I want to monitor anyway.
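
To preview which entries a `hide=` list would drop before recreating the container, here is a rough Python sketch. It assumes each comma-separated entry is a regular expression matched from the start of the name; Glances' own matching rules may differ between versions. The mount points are made up for illustration.

```python
import re

# The hide= value from the [fs] section above, split on commas.
HIDE = (
    ".*/boot.*,.*/snap.*,/dev/loop.*,.*nvidia.*,.*/glances.conf,"
    ".*/resolv.conf,.*/hostname,.*/hosts,/usr/lib.*"
).split(",")


def is_hidden(name: str, patterns=HIDE) -> bool:
    """Assumption: a name is hidden when any pattern matches from its start."""
    return any(re.match(pattern, name) for pattern in patterns)


# Hypothetical mount points, for illustration only.
mounts = ["/", "/boot/efi", "/dev/loop3", "/etc/hosts", "/mnt/media"]
visible = [m for m in mounts if not is_hidden(m)]
print(visible)  # ['/', '/mnt/media']
```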

ynazar1 commented 5 months ago

Fantastic. I filtered out all the obvious docker things (since I'm running glances on root os). Let's see if that helps. Leaving it here if anyone else needs it:

glances.conf

[fs]
hide=/snap.*,

[diskio]
hide=loop\d+

[network]
hide=br-.*,docker\d+,veth.*

IIIdefconIII commented 5 months ago

For some reason mine doesn't connect at all anymore...

http://10.3.10.10:61208/

It does work locally, without a password, in a Docker container on a Synology NAS.

IIIdefconIII commented 5 months ago

All I'm getting is:

image

Which worked before

image

I'm basically on :latest for everything.

IIIdefconIII commented 5 months ago

image

ynazar1 commented 5 months ago

@IIIdefconIII You've got an entirely different issue. Why are you using localhost as the host string for a remote server?

IIIdefconIII commented 5 months ago

Ah OK, sorry. I may test it tonight. And no, I'm using the local address where it's located.

wittypluck commented 5 months ago

This PR could help, it adds checks to avoid crashes in the HA integration when a key goes missing from Glances server side:

#114628
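
As a generic illustration of that kind of check (not the PR's actual code), here is a Python sketch that reads a nested value and returns `None` instead of raising when a key has disappeared. The payload layout and the function name `read_fs_used` are hypothetical.

```python
from typing import Optional


def read_fs_used(data: dict, mount: str) -> Optional[int]:
    """Return the 'used' value for a mount point, or None when any key is missing."""
    fs = data.get("fs")
    if not isinstance(fs, dict):
        return None
    entry = fs.get(mount)
    if not isinstance(entry, dict):
        return None
    return entry.get("used")


# Hypothetical payloads: the second one no longer reports the /data mount.
before = {"fs": {"/data": {"used": 1024}}}
after = {"fs": {}}

print(read_fs_used(before, "/data"))  # 1024
print(read_fs_used(after, "/data"))   # None
```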

zollak commented 5 months ago

Hi, I have almost the same issue, except that reloading the integration did not solve it.

Screenshot 2024-05-22 at 10 55 19

I have found this issue in the glances repo as well: https://github.com/nicolargo/glances/issues/2453

According to the v4 issue (Glances v4 does not work with the current HA integration), I downgraded Glances to 3.4.0.5. I'm running it on a Proxmox server, started through systemd in web server mode. Here is the config that has the connection issue (the original came from the Glances wiki; I modified it to use fewer plugins):

[Unit]
Description=Glances
After=network.target

[Service]
ExecStart=/usr/local/bin/glances -w -u glances --disable-plugin all --enable-plugin cpu,mem,load,gpu,sensors,uptime
Restart=on-abort
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

According to the systemd.service man page, RemainAfterExit= "Takes a boolean value that specifies whether the service shall be considered active even when all its processes exited. Defaults to no."

These systemd settings solved the connection issue for me:

[Unit]
Description=Glances
After=network.target

[Service]
ExecStart=/usr/local/bin/glances -w -u glances --disable-plugin all --enable-plugin cpu,mem,load,gpu,sensors,uptime
Restart=always
RemainAfterExit=no

[Install]
WantedBy=multi-user.target

I think it should be updated on the wiki page as well.

ynazar1 commented 5 months ago

Cycling back to this. Excluding frequently changing devices via glances.conf seems to support the group's finding that glances breaks when keys change on the server. I've not had the data-stop issue for a while now, since the config changes. Before, it was happening regularly, every other week or so.

Once the PR https://github.com/home-assistant/core/pull/114628 is accepted I'm hoping this will stop being an issue.

wittypluck commented 2 months ago

Hello @spikeygg @ynazar1, this issue may be solved by #114628 in version 2024.8.2. If you can confirm this, we can close this issue. Thanks!