treydock opened this issue 4 years ago
That is odd. What does `df` say about /tmp on these systems?
[root@o0297 ~]# df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg0-lv_tmp 858896636 1176452 857720184 1% /tmp
The metric for total size remained accurate; it was the avail/free metric that exceeded the total size. These are HPC compute nodes, so it's possible this happened while /tmp was full because a user was doing something they shouldn't, but it's hard to say for sure since the monitoring numbers we rely on were the ones that were incorrect.
Would be useful to get the raw output of statfs from here: https://github.com/prometheus/node_exporter/blob/master/collector/filesystem_linux.go#L78
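Something like this standalone sketch (not part of node_exporter; the mountpoint /tmp is hardcoded here just for illustration) would dump the raw fields from the same statfs(2) call the filesystem collector reads:

```go
// dump_statfs.go: print the raw statfs(2) fields for /tmp, i.e. the
// values the filesystem collector starts from before any conversion.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var buf syscall.Statfs_t
	if err := syscall.Statfs("/tmp", &buf); err != nil {
		panic(err)
	}
	// Block counts are in units of Bsize; no conversion is applied here.
	fmt.Printf("bsize=%d blocks=%d bfree=%d bavail=%d files=%d ffree=%d\n",
		buf.Bsize, buf.Blocks, buf.Bfree, buf.Bavail, buf.Files, buf.Ffree)
}
```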
Do you see any errors in the node_exporter log? Maybe the mountpoint got stuck, leading to this miscalculation. But the code is pretty straightforward, so I'm not sure what is going on here. Maybe some float overflow (https://github.com/prometheus/node_exporter/blob/master/collector/filesystem_linux.go#L109), but I doubt that.
I've looked at the code and also can't imagine how this would become a problem as the code is essentially taking values returned by the kernel and doing simple math to get bytes from blocks.
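For what it's worth, the conversion in question is just block counts multiplied by the block size, carried out in float64. A quick sketch with numbers matching the df output above shows how far that is from any overflow:

```go
// Sketch only: reproduce the blocks-to-bytes arithmetic with the values
// from the df output above (858896636 1K-blocks on /tmp).
package main

import "fmt"

func main() {
	var blocks, bsize uint64 = 858896636, 1024
	size := float64(blocks) * float64(bsize)
	fmt.Printf("size_bytes=%.0f\n", size) // 879510155264, matching the reported size metric
	// float64 handles values up to roughly 1.8e308, so overflowing it with
	// any plausible block count is out of the question.
}
```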
There are no relevant errors in the logs. The only node_exporter log messages are about generating mountinfo, but that's an issue with procfs (https://github.com/prometheus/procfs/pull/282):
Apr 9 03:23:58 o0297 node_exporter: level=error ts=2020-04-09T07:23:58.801Z caller=collector.go:161 msg="collector failed" name=mountstats duration_seconds=0.007737361 err="failed to parse mountinfo: couldn't find enough fields in mount string: 108 53 0:34 / /var/lib/nfs/rpc_pipefs rw,relatime - rpc_pipefs sunrpc rw"
Host operating system: output of `uname -a`
node_exporter version: output of `node_exporter --version`
node_exporter command line flags
This is an NFS root setup, which produces lots of bind mounts; that is why we have so many filesystem ignores.
Are you running node_exporter in Docker?
Not via Docker.
What did you do that produced an error?
Looked at a graph in Grafana that uses these metrics. The filesystem avail bytes value is an extremely large number, much larger than the size bytes value.
What did you expect to see?
I would never expect avail or free bytes for a filesystem to exceed the size.
What did you see instead?
The orange line is the avail bytes, and the green line that appears to be near zero is the size in bytes. The size in bytes is 879510155264, which is accurate, but the avail bytes value is so much larger that the scale makes the size in bytes look near zero.