Open Baughn opened 5 years ago
I think this is basically a duplicate of the closed (FreeBSD) issue #1287.
Having just been bitten by this bug, I found that the problem is a bit more subtle than implied by the initial description: node_filesystem_avail_bytes and node_filesystem_size_bytes are being correctly calculated by the node_exporter code but are using stale (cached) values from the kernel.
In more detail, the unix.Getfsstat call specifies MNT_NOWAIT, and getfsstat(2) states:
Normally mode should be specified as MNT_WAIT. If mode is set to MNT_NOWAIT, getfsstat() will return the information it has available without requesting an update from each file system.
And, having studied changes in node_filesystem_avail_bytes over time, as well as rummaging around in the FreeBSD kernel sources, it seems that the cached data is basically never updated under normal operations. This means that, unless something else (like df) invokes getfsstat with MNT_WAIT or statfs(2), the reported data will reflect the information from when the filesystem was created or mounted, rendering it useless for Prometheus alerting.
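To make the distinction concrete, here is a minimal Go sketch of a direct per-path query. It assumes the standard library's syscall.Statfs wrapper around statfs(2), which per the description above refreshes the figures instead of returning the cached ones. The helper name fsUsage and the path "/" are illustrative only and are not taken from node_exporter.

```go
package main

import (
	"fmt"
	"syscall"
)

// fsUsage is a hypothetical helper (not part of node_exporter). It calls
// statfs(2) on path, so the figures come from a direct query of that
// filesystem rather than from a MNT_NOWAIT getfsstat() snapshot.
func fsUsage(path string) (size, avail uint64, err error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	// Field types differ between platforms (signed on some, unsigned on
	// others), so convert explicitly and clamp a negative Bavail to zero.
	bsize := uint64(st.Bsize)
	free := int64(st.Bavail)
	if free < 0 {
		free = 0
	}
	return uint64(st.Blocks) * bsize, uint64(free) * bsize, nil
}

func main() {
	size, avail, err := fsUsage("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("size=%d bytes, avail=%d bytes\n", size, avail)
}
```

Comparing the output of such a call against the exported metric for the same mount point is one way to check whether the exporter is reporting stale values.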
As for fixing the issue:
- The most obvious "fix" is to use MNT_WAIT instead of MNT_NOWAIT, but this runs the risk of blocking indefinitely if (e.g.) an NFS server becomes non-responsive.
- Keep using MNT_NOWAIT but explicitly call statfs(2) on each non-NFS (or other "unsafe") filesystem.

The most obvious "fix" is to use MNT_WAIT instead of MNT_NOWAIT but this runs the risk of blocking indefinitely if (e.g.) a NFS server becomes non-responsive.
Feature parity with Linux :). I'd say we go with this and consider using the stale mount handling implemented in #997 for Linux.
Now I'm confused though. @Baughn seems to run into this on Linux, right? Or is there a similar bug in both?
Host operating system: output of uname -a
Linux backup-target.atelieraphelion.com 5.0.0-29-generic #31~18.04.1-Ubuntu SMP Thu Sep 12 18:29:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
/nix/store/svm3ypaq5dyznfxr7lhqvk6ymyy0cs9n-node_exporter-0.17.0-bin/bin/node_exporter --version
node_exporter, version (branch: , revision: )
build user:
build date:
go version: go1.12.7
node_exporter command line flags
/nix/store/svm3ypaq5dyznfxr7lhqvk6ymyy0cs9n-node_exporter-0.17.0-bin/bin/node_exporter --web.listen-address 0.0.0.0:9100
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
Created a disk-space alert using
node_filesystem_avail_bytes{fstype=~"ext4|zfs|xfs"} / node_filesystem_size_bytes < 0.1
What did you expect to see?
The alert should fire if available space is below 10%, as _avail_bytes should be <10% of size_bytes.
What did you see instead?
The alert never fires for ZFS filesystems, because _avail_bytes and _size_bytes are always equal.