Open keyolk opened 7 years ago
Can you attach a copy of /proc/mounts? This is where the exporter gets the filesystem list.
@SuperQ
/proc/mounts here
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=24708776k,nr_inodes=6177194,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sda1 / ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /cgroup/pids cgroup rw,relatime,pids 0 0
To me that looks like the mount options would be no help in this case. There is no way to tell the difference between {device="/dev/sda3",mountpoint="/home1"} and {device="/dev/sda3",mountpoint="/home"}.
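A small sketch of why the mount options do not help here: grouping the /proc/mounts entries quoted above by device shows that the two /dev/sda3 lines differ only in mount point. The helper names (parse_mounts, group_by_device) are illustrative and are not part of node_exporter.

```python
from collections import defaultdict

# Three lines copied from the /proc/mounts output posted above.
SAMPLE = """\
/dev/sda1 / ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
"""


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype, options)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entries.append(tuple(fields[:4]))
    return entries


def group_by_device(entries):
    """Map each device to the list of (mountpoint, fstype, options) using it."""
    groups = defaultdict(list)
    for device, mountpoint, fstype, options in entries:
        groups[device].append((mountpoint, fstype, options))
    return dict(groups)


groups = group_by_device(parse_mounts(SAMPLE))
for mountpoint, fstype, options in groups["/dev/sda3"]:
    print(mountpoint, fstype, options)
```

Both /dev/sda3 entries carry the same filesystem type and option string, so nothing in this file marks /home as the bind mount.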
@SuperQ Actually it is mounted like below
$ cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Tue Jun 21 16:50:34 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=e4ebf103-b5b9-4620-a532-ccc7205f9eb2 / ext4 defaults,noatime,nodev,nobarrier 1 1
UUID=f860799f-1af0-4e16-ac4f-42a07cac8173 /home1 ext4 defaults,noatime,nodev,nobarrier 1 2
UUID=b76e0523-3bae-412d-a06c-1ad53572aba4 swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/home1 /home none default,bind 0 0
In terms of monitoring storage, distinguishing those two mount points is not very useful to me :(
The node_exporter does not read from /etc/fstab, as it is not the authoritative source of information about what is mounted. Many systems use automatic mount management, hence the only source of what is mounted is /proc/mounts, generated by the kernel.
Duplicate bind mounts are indistinguishable from the kernel's perspective, similar to a hard link.
There are two options:
/etc/fstab and expose them with the textfile interface.
There is a better source of information than /proc/mounts: /proc/self/mountinfo. It carries additional data about which subdirectory of the device is mounted at the destination. For a bind mount of /data/shared/www into /var/www/shared, it looks like this:
34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar). Then downstream rules can just choose to ignore any timeseries that don't have mountroot="/".
This won't help OP since they're bind-mounting the root of the filesystem (which truly is indistinguishable), but it will help those of us who bind-mount subtrees, which is very common (and having many random subtrees mounted is more common than having the root mounted many times).
Note that symlinks are usually an option for bind-mounting the root, but not for subtrees: one of the nice things about bind-mounting subtrees is that it lets you bypass permission checks on the parent directories at the source, which enables some interesting use cases that symlinks cannot provide.
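As a rough illustration of the mountinfo idea above, the following sketch parses the two sample lines and pulls out the mount root (the fourth field). The function name parse_mountinfo_line and the dictionary keys are made up for this example; this is not how node_exporter is implemented.

```python
# The two /proc/self/mountinfo lines quoted above.
SAMPLE = """\
34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
"""


def parse_mountinfo_line(line):
    """Parse one mountinfo line into a dict (hypothetical helper).

    The first six fields are fixed. Zero or more optional fields follow,
    terminated by a single "-"; after it come fstype, source, super options.
    """
    fields = line.split()
    sep = fields.index("-", 6)  # the separator cannot appear before index 6
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "major_minor": fields[2],
        "root": fields[3],          # subdirectory of the filesystem that is mounted
        "mount_point": fields[4],
        "mount_options": fields[5],
        "fstype": fields[sep + 1],
        "source": fields[sep + 2],
    }


mounts = [parse_mountinfo_line(l) for l in SAMPLE.splitlines() if l.strip()]

# Keep only mounts that expose the whole filesystem (root == "/").
whole_fs = [m["mount_point"] for m in mounts if m["root"] == "/"]
print(whole_fs)
```

With a root value available, a downstream filter equivalent to mountroot="/" drops the /var/www/shared bind mount while keeping /data.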
@marcan That's a good idea. I think it's something we can implement.
Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar).
I think we should be dropping such filesystems, as we already have the usage information from the actual filesystem mount. I'm not sure it's a good idea to add another label onto a key metric which already has more labels than it technically needs.
@brian-brazil I agree, we don't need them in the use metrics. We could include the bind mounting as a separate node_filesystem_mount_info mapping.
The tricky bit is that it's possible to unmount the bare-root filesystem and leave the bind mount. At that point you'd have to implement deduplication in the mount list to make sure you don't drop any useful data. Perhaps this algorithm: for a given mounted device, prefer the mount with the least number of components in the mountroot, then among those prefer the oldest one (coming earlier in mountinfo). This approach would fix OP's problem.
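The preference order described above could be sketched roughly as follows. The input is assumed to be a list of (major:minor, root, mount point) tuples already in mountinfo order; the function names and the 8:3 device number used for the /home1 case are assumptions for illustration only.

```python
def root_depth(root):
    """Number of path components in a mount root ("/" has zero)."""
    return len([part for part in root.split("/") if part])


def pick_preferred_mounts(mounts):
    """For each device keep one mount: shallowest root first, then earliest listed.

    `mounts` is an iterable of (major_minor, root, mount_point) tuples in
    mountinfo order. Returns {major_minor: mount_point}.
    """
    best = {}
    for index, (dev, root, mount_point) in enumerate(mounts):
        key = (root_depth(root), index)
        if dev not in best or key < best[dev][0]:
            best[dev] = (key, mount_point)
    return {dev: mount_point for dev, (_, mount_point) in best.items()}


# Subtree bind mount listed before the bare-root mount: the root mount wins.
print(pick_preferred_mounts([
    ("253:4", "/shared/www", "/var/www/shared"),
    ("253:4", "/", "/data"),
]))

# Root bind-mounted twice (hypothetical device number): the earlier entry wins.
print(pick_preferred_mounts([
    ("8:3", "/", "/home1"),
    ("8:3", "/", "/home"),
]))
```

If the bare-root mount is later unmounted, the remaining subtree mount is still reported, because it becomes the only candidate for that device.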
@marcan I was considering deduplication by "first listed" in the mountinfo. This means that it's possible for labeling to shift. But I'm guessing the kernel data structure that holds mountinfo is populated in order by time. So "first" is original.
There's nothing saying you can't normally mount a filesystem twice, and I think in that case we'd want to expose both.
We could include the bind mounting as a separate node_filesystem_mount_info mapping.
I can imagine that getting high cardinality and high churn, and I'm not sure what it's gaining us.
There's no way to distinguish a filesystem mounted twice from a filesystem mounted and then its root bindmounted elsewhere. As far as I know both of those result in identical kernel state.
Ultimately I think the options are: either show the first mount in mountinfo order, or show root mounts only (but what if a filesystem is only mounted from a subdirectory? then show that instead? what if it's mounted multiple times but never at the root?), or implement some kind of priority order and show the first mount only.
No strong preference, but showing the first mount in mountinfo order seems to be what you want in most cases. So let's go with this, unless someone has objections?
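A minimal sketch of the "first mount in mountinfo order" policy, assuming the device is identified by the major:minor field. The function name is hypothetical and the sample reuses the two mountinfo lines quoted earlier in the thread.

```python
SAMPLE = """\
34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
"""


def first_mount_per_device(mountinfo_text):
    """Return {major_minor: mount_point}, keeping only the first line per device."""
    seen = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # a valid line has 6 fixed fields, "-", and 3 trailing fields
        major_minor, mount_point = fields[2], fields[4]
        # setdefault leaves an existing entry untouched, so the earliest line wins.
        seen.setdefault(major_minor, mount_point)
    return seen


print(first_mount_per_device(SAMPLE))
```

As noted earlier in the thread, the reported mount point can shift if the first-listed mount is unmounted and a later one takes its place.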
good
I have the impression that just reading /proc/self/mountinfo is sufficient here. Why didn't we take this approach?
I have the impression that just reading /proc/self/mountinfo is sufficient here. Why didn't we take this approach?
Replying to myself: it seems the plan is to add a new metric, node_filesystem_mount_info, that can be joined on the existing metric to deduplicate things. I asked in the PR (#2970) how that might help here, but it's unclear to me whether it's an actual fix or not.
Host operating system:
node_exporter version:
Are you running node_exporter in Docker?
yes
What did you do that produced an error?
With given query below
Result is
Actually, the second record is a bind-mounted point. If I could get the mount options, it would help me exclude that record.