prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

node_filesystem collector unable to deduplicate data from multihomed mounts like nfs #2514

Open · zeronewb opened this issue 1 year ago

zeronewb commented 1 year ago

Host operating system: output of uname -a

3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.4.0 (branch: HEAD, revision: 7da1321761b3b8dfc9e496e1a60e6a476fec6018)

node_exporter command line flags

/usr/sbin/node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$

node_exporter log output

Oct 21 19:34:07 node_exporter: level=error ts=2022-10-21T23:34:07.079Z caller=stdlib.go:105 msg="error gathering metrics: 7 error(s) occurred:
* [from Gatherer #2] collected metric "node_filesystem_device_error" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge: } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge:<value:5.36870912e+11 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge:<value:5.2514783232e+11 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_avail_bytes" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge:<value:5.2514783232e+11 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge:<value:1.048576e+09 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files_free" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge:<value:1.048568733e+09 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_readonly" { label:<name:"device" value:"storagesystem:/exports/home" > label:<name:"fstype" value:"nfs" > label:<name:"mountpoint" value:"/home" > gauge: } was collected before with the same name and label values"

Are you running node_exporter in Docker?

No

What did you do that produced an error?

The SAN has multiple DNS A records for load balancing, so /proc/self/mounts contains two entries for the same NFS mount, differing only in the mountaddr= and addr= options:

storagesystem:/exports/home /home nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1.1.1.1,mountvers=3,mountport=300,mountproto=tcp,local_lock=all,addr=1.1.1.1 0 0
storagesystem:/exports/home /home nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1.1.1.2,mountvers=3,mountport=300,mountproto=tcp,local_lock=all,addr=1.1.1.2 0 0
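Multihomed here means the storage hostname resolves to more than one address, and each address can end up as its own mount entry. A minimal Go sketch to confirm the multihoming (storagesystem is the placeholder hostname from the mount entries above):

package main

import (
	"fmt"
	"net"
)

func main() {
	// Each A record returned here can appear as a separate
	// /proc/self/mounts entry with a different addr= option.
	addrs, err := net.LookupHost("storagesystem")
	if err != nil {
		panic(err)
	}
	for _, addr := range addrs {
		fmt.Println(addr) // e.g. 1.1.1.1, 1.1.1.2
	}
}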

What did you expect to see?

The filesystem collector should deduplicate the entries it reads from /proc/self/mounts, so that each unique mount is iterated over and exported exactly once.
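For illustration, a minimal sketch of that deduplication, keyed on the three fields that become the device, mountpoint, and fstype labels. This is not node_exporter's actual collector code, just one possible approach:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountKey identifies a mount the way node_exporter labels it.
// Two /proc/self/mounts entries that differ only in options
// (e.g. addr=1.1.1.1 vs addr=1.1.1.2) collapse to the same key.
type mountKey struct {
	device, mountpoint, fstype string
}

func main() {
	f, err := os.Open("/proc/self/mounts")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	seen := map[mountKey]bool{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 {
			continue
		}
		key := mountKey{fields[0], fields[1], fields[2]}
		if seen[key] {
			continue // duplicate multihomed entry: skip it
		}
		seen[key] = true
		fmt.Println("would collect stats for", key.mountpoint)
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}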

What did you see instead?

Log errors above.
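For reference, the rejection itself comes from the client library: any collector that emits the same metric name and label values twice in one scrape fails at Gather time. A minimal sketch with client_golang that reproduces the error (the dupCollector here is hypothetical, not node_exporter code):

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// dupCollector emits node_filesystem_size_bytes twice with identical
// labels, mimicking what the filesystem collector does when
// /proc/self/mounts lists the same multihomed NFS mount twice.
type dupCollector struct{ desc *prometheus.Desc }

func (c *dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *dupCollector) Collect(ch chan<- prometheus.Metric) {
	for i := 0; i < 2; i++ { // two identical mount entries
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 5.36870912e+11,
			"storagesystem:/exports/home", "nfs", "/home")
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&dupCollector{desc: prometheus.NewDesc(
		"node_filesystem_size_bytes", "Filesystem size in bytes.",
		[]string{"device", "fstype", "mountpoint"}, nil,
	)})
	if _, err := reg.Gather(); err != nil {
		// Prints the same "was collected before with the same name
		// and label values" error as in the log output above.
		fmt.Println(err)
	}
}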

discordianfish commented 1 year ago

Ugh yeah that's annoying. Also feels like this might be a regression. I think we had an issue (and fix) for this before? @SuperQ?

rexagod commented 6 months ago

Not sure if there's a fix for this, but I can take this up if there isn't. :)