Closed issue (opened by x652001, closed 4 years ago)
Are you sure you're running 3.10.0-123? This is the original CentOS 7 kernel from 2014.
This feels like a kernel bug.
Assuming this is a kernel bug and closing since there was no response
I'm seeing this as well with a current kernel and node-exporter v0.18.1. It's also happening with /home on my system, and it seems to be getting a value from a tmpfs as in the original report.
$ uname -a
Linux server 5.3.7-arch1-1-ARCH #1 SMP PREEMPT Fri Oct 18 00:17:03 UTC 2019 x86_64 GNU/Linux
Data from node-exporter:
node_filesystem_avail_bytes{device="run",fstype="tmpfs",mountpoint="/run"} 8.388939776e+09
node_filesystem_device_error{device="run",fstype="tmpfs",mountpoint="/run"} 0
node_filesystem_files{device="run",fstype="tmpfs",mountpoint="/run"} 2.048422e+06
node_filesystem_files_free{device="run",fstype="tmpfs",mountpoint="/run"} 2.047586e+06
node_filesystem_free_bytes{device="run",fstype="tmpfs",mountpoint="/run"} 8.388939776e+09
node_filesystem_readonly{device="run",fstype="tmpfs",mountpoint="/run"} 0
node_filesystem_size_bytes{device="run",fstype="tmpfs",mountpoint="/run"} 8.390336512e+09
node_filesystem_avail_bytes{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 8.388939776e+09
node_filesystem_device_error{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 0
node_filesystem_files{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 2.048422e+06
node_filesystem_files_free{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 2.047586e+06
node_filesystem_free_bytes{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 8.388939776e+09
node_filesystem_readonly{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 0
node_filesystem_size_bytes{device="/dev/nvme0n1p3",fstype="ext4",mountpoint="/home"} 8.390336512e+09
Output from df -h:
run 7.9G 1.4M 7.9G 1% /run
/dev/nvme0n1p3 147G 98M 140G 1% /home
Just checked with master and it seems to be working correctly. I didn't try to track down the fix, but it looks like it will probably be good with the next release.
@justinfenn Thanks for confirming!
Sorry to bump an old issue, but I think I know what happened, and maybe it will be useful to someone else who encounters this issue. I was using node_exporter from the Arch package, and that was setting ProtectHome=yes in the unit file. This bug report sounds like basically the same issue as this one, and it was recently fixed.
In my case, when I ran from the master branch to test, I just started node_exporter directly and didn't run it as a service, so I avoided the issue and saw the correct sizes. I just assumed that there had been some code change that fixed the issue, but it was actually a configuration problem the whole time.
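If anyone else hits this, it's easy to check whether the packaged unit sets ProtectHome (a sketch; the Arch unit name is assumed to be prometheus-node-exporter.service):
$ systemctl cat prometheus-node-exporter.service | grep -i protect
$ systemctl show -p ProtectHome prometheus-node-exporter.service
With ProtectHome=yes, a statfs() on /home inside the service's mount namespace is answered by the sandbox mount rather than the real filesystem, which would explain the tmpfs-looking numbers above.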
Update: I didn't check whether I'm running the latest release. I'm not; I'll update and check again.
I'm seeing this issue with node_exporter, version 1.5.0 (branch: HEAD, revision: 1b48970ffcf5630534fb00bb0687d73c66d1c959), also running with ProtectHome=yes (and ProtectSystem=full). It seems to get the value for e.g. /home from ... the tmpfs.
I'll investigate further and have a look at the code.
It's fixed by ProtectHome=read-only; updating the Ansible Galaxy role takes care of that, if you use it. I didn't check if node_exporter 1.6.1 fixed a possible regression.
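For anyone who needs real /home numbers but wants to keep the rest of the sandboxing, a drop-in override is probably the cleanest way to apply that fix (a sketch; adjust the unit name to your distro's):
$ sudo systemctl edit node_exporter.service
# in the editor, add:
#   [Service]
#   ProtectHome=read-only
$ sudo systemctl restart node_exporter.service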
Why I would call it a bug if it still exists: with ProtectHome=yes, node_exporter can't access /home. This is expected. It should, however, not report a value in this case. To understand the problem better I checked with df in a ProtectHome=yes service: it does not list /home at all - correctly! - in e.g. df -h; when asked directly for the mount, e.g. df -h /home, it also reports the same incorrect value instead of an error (which might be how it is intended/set up).
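To reproduce the df behavior without setting up a service, a transient unit should show the same thing (untested sketch):
$ sudo systemd-run --wait --pipe -p ProtectHome=yes /usr/bin/df -h
$ sudo systemd-run --wait --pipe -p ProtectHome=yes /usr/bin/df -h /home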
Anyway just reporting back in case someone else encounters this.
@mikegerber It would be interesting to see why df -h is not listing filesystems with ProtectHome=yes.
It does list filesystems, but not the inaccessible /home. I think this is the best behavior in this configuration.
I ran some commands under sh -x; the output below should illustrate it well:
❯ sudo journalctl -u test-df-protecthome.service | cat
Nov 03 13:53:14 leguin systemd[1]: Starting test-df-protecthome.service - Test df vs. ProtectHome and ProtectSystem...
Nov 03 13:53:14 leguin sh[1089098]: + /usr/bin/df -h
Nov 03 13:53:14 leguin sh[1089098]: Filesystem Size Used Avail Use% Mounted on
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-root 45G 37G 5.2G 88% /
Nov 03 13:53:14 leguin sh[1089098]: tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
Nov 03 13:53:14 leguin sh[1089098]: efivarfs 154K 39K 111K 26% /sys/firmware/efi/efivars
Nov 03 13:53:14 leguin sh[1089098]: devtmpfs 4.0M 0 4.0M 0% /dev
Nov 03 13:53:14 leguin sh[1089098]: tmpfs 7.8G 0 7.8G 0% /dev/shm
Nov 03 13:53:14 leguin sh[1089098]: tmpfs 3.1G 2.0M 3.1G 1% /run
Nov 03 13:53:14 leguin sh[1089098]: tmpfs 7.8G 3.7M 7.8G 1% /tmp
Nov 03 13:53:14 leguin sh[1089098]: /dev/nvme0n1p3 474M 264M 182M 60% /boot
Nov 03 13:53:14 leguin sh[1089098]: /dev/nvme0n1p1 256M 20M 237M 8% /boot/efi
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-halde--tmp 50G 9.2G 40G 19% /halde-tmp
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-srv_backup--archiv 49G 1.5G 45G 4% /srv/backup-archiv
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-var_lib_docker 59G 11G 46G 20% /var/lib/docker
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-var_lib_flatpak 20G 5.8G 13G 32% /var/lib/flatpak
Nov 03 13:53:14 leguin sh[1089098]: /dev/mapper/vg_leguin-var_lib_libvirt_images 99G 44G 51G 47% /var/lib/libvirt/images
Nov 03 13:53:14 leguin sh[1089099]: + ls -d /home
Nov 03 13:53:14 leguin sh[1089100]: /home
Nov 03 13:53:14 leguin sh[1089099]: + ls /home
Nov 03 13:53:14 leguin sh[1089101]: ls: cannot open directory '/home': Permission denied
Nov 03 13:53:14 leguin sh[1089099]: + df -h /home
Nov 03 13:53:14 leguin sh[1089099]: Filesystem Size Used Avail Use% Mounted on
Nov 03 13:53:14 leguin sh[1089099]: tmpfs 3.1G 2.0M 3.1G 1% /home
Nov 03 13:53:14 leguin systemd[1]: test-df-protecthome.service: Deactivated successfully.
Nov 03 13:53:14 leguin systemd[1]: Finished test-df-protecthome.service - Test df vs. ProtectHome and ProtectSystem.
(df here is /usr/bin/df; I wasn't being very consistent with this in the test, but I checked, so as not to confuse things.)
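For reference, a unit roughly like this should produce similar output (a sketch, not necessarily the exact unit used above):
[Unit]
Description=Test df vs. ProtectHome and ProtectSystem

[Service]
Type=oneshot
ProtectHome=yes
ProtectSystem=full
ExecStart=/usr/bin/sh -x -c '/usr/bin/df -h; ls -d /home; ls /home; df -h /home'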
I'll check with node_exporter 1.6.1 again next week (I have a vacation day today 😃), although my immediate problem is solved by the Ansible Galaxy role now configuring ProtectHome=read-only (which is correct in my case - I specifically needed correct values for /home).
With node_exporter 1.6.1 running with ProtectHome=yes:
% curl -s http://localhost:9100/metrics | egrep '^node_exporter_build_info|^node_filesystem_avail_bytes.*home'
node_exporter_build_info{branch="HEAD",goarch="amd64",goos="linux",goversion="go1.20.6",revision="4a1b77600c1873a8233f3ffb55afcedbb63b8d84",tags="netgo osusergo static_build",version="1.6.1"} 1
node_filesystem_avail_bytes{device="/dev/mapper/san-data0",fstype="ext4",mountpoint="/home"} 2.64077406208e+11
That value of ~264 GB is the same value the /run tmpfs on the system reports.
With ProtectHome=read-only the value is correct as expected (~204 GB):
% curl -s http://localhost:9100/metrics | egrep '^node_exporter_build_info|^node_filesystem_avail_bytes.*home'
node_exporter_build_info{branch="HEAD",goarch="amd64",goos="linux",goversion="go1.20.6",revision="4a1b77600c1873a8233f3ffb55afcedbb63b8d84",tags="netgo osusergo static_build",version="1.6.1"} 1
node_filesystem_avail_bytes{device="/dev/mapper/san-data0",fstype="ext4",mountpoint="/home"} 2.0448495616e+11
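A quick way to cross-check the exporter against the kernel's own numbers (sketch, assumes GNU df):
% curl -s http://localhost:9100/metrics | grep -E 'node_filesystem_avail_bytes.*mountpoint="/(home|run)"'
% df -B1 --output=avail /home /run
With ProtectHome=yes the exporter's /home line matches the tmpfs; with read-only it matches df.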
(Note: the df output two comments above is from a different system (Fedora 37), while the node_exporter output is from the elderly CentOS system I first encountered the problem on. I don't think it matters; the behavior is the same.)
Host operating system: output of uname -a
Linux us-cdn 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.7.1908 (Core)
node_exporter version: output of node_exporter --version
node_exporter, version 0.18.1 (branch: HEAD, revision: 3db77732e925c08f675d7404a8c46466b2ece83e) build user: root@b50852a1acba build date: 20190604-16:41:18 go version: go1.12.5
node_exporter command line flags
/usr/local/bin/node_exporter --collector.systemd --collector.textfile --collector.textfile.directory=/var/lib/node_exporter --web.listen-address=0.0.0.0:9100
Are you running node_exporter in Docker?
No
What did you do that produced an error?
curl localhost:9100/metrics | grep node_filesystem_avail_bytes
df -h
The values of {mountpoint="/home"} and {mountpoint="/run"} are the same, but those values are different from df -h.
What did you expect to see?
The value of node_filesystem_avail_bytes{device="/dev/mapper/centos-home",fstype="xfs",mountpoint="/home"} should be the same as in df -h.
What did you see instead?
The correct value only from df -h (node_exporter reports the same value for /home and /run).