prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

Mountstats collector ignores NFS mount even if metrics are different #993

Open 0xArsen opened 6 years ago

0xArsen commented 6 years ago

Host operating system: output of uname -a

Linux debian 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.16.0 (branch: master, revision: ac5a98176129e86c69f664e632ff273eee6f67bd)
build user: root@debian
build date: 20180706-16:39:05
go version: go1.10.3

node_exporter command line flags

--collector.mountstats --log.level=debug

Are you running node_exporter in Docker?

No

What did you do that produced an error?

I mounted the same device twice on the same mountpoint, first with NFS version 3 and then with NFS version 4:
mount -t nfs -o vers=3 127.0.0.1:/var/nfs /mnt/nfs
mount -t nfs -o vers=4 127.0.0.1:/var/nfs /mnt/nfs

What did you expect to see?

The mounts differ in several ways (e.g. age, port, event stats). Since these metrics differ, I expected to see a separate line for each NFS mount in the node_exporter metrics.

What did you see instead?

I saw metrics for only one mount.
node_mountstats_nfs_age_seconds_total{export="127.0.0.1:/var/nfs"} 499
# HELP node_mountstats_nfs_event_jukebox_delay_total Number of times the NFS server indicated EJUKEBOX; retrieving data from offline storage.
# HELP node_mountstats_nfs_event_vfs_read_page_total Number of pages read directly via mmap()'d files.
# TYPE node_mountstats_nfs_event_vfs_read_page_total counter
node_mountstats_nfs_event_vfs_read_page_total{export="127.0.0.1:/var/nfs"} 0
# HELP node_mountstats_nfs_event_vfs_read_pages_total Number of times a group of pages have been read.
# TYPE node_mountstats_nfs_event_vfs_read_pages_total counter
node_mountstats_nfs_event_vfs_read_pages_total{export="127.0.0.1:/var/nfs"} 0
# HELP node_mountstats_nfs_event_vfs_update_page_total Number of updates (and potential writes) to pages.
# TYPE node_mountstats_nfs_event_vfs_update_page_total counter
node_mountstats_nfs_event_vfs_update_page_total{export="127.0.0.1:/var/nfs"} 0

My current approach to solving this is to have mountstats_linux.go compare the device name, mountpoint, and port. If all three values match, the collector skips the mount as a duplicate; otherwise it exposes metrics for it. Do you think this is a good solution? I can make a pull request if the approach seems fine.
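
To make the proposed comparison concrete, here is a minimal sketch in Python (not the collector's Go code). The `Mount` type, the `dedupe` helper, and the single `port` field are hypothetical names for illustration. Which mount option should count as "the port" is also an assumption: the sample values are the `mountport` of the vers=3 mount and the `port` of the vers=4 mount from /proc/self/mountstats on this host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    device: str
    mountpoint: str
    port: str  # hypothetical field: whichever port option the collector extracts


def dedupe(mounts):
    """Keep the first mount seen for each (device, mountpoint, port) key."""
    seen = set()
    kept = []
    for m in mounts:
        key = (m.device, m.mountpoint, m.port)
        if key in seen:
            continue  # identical on all three fields: treat as a duplicate
        seen.add(key)
        kept.append(m)
    return kept


mounts = [
    Mount("127.0.0.1:/var/nfs", "/mnt/nfs", "48894"),  # vers=3 mount
    Mount("127.0.0.1:/var/nfs", "/mnt/nfs", "0"),      # vers=4 mount
    Mount("127.0.0.1:/var/nfs", "/mnt/nfs", "0"),      # true duplicate, skipped
]
kept = dedupe(mounts)
```

With this key the two NFS mounts are both kept, and only an entry that matches on all three fields is dropped.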

SuperQ commented 6 years ago

Please attach a copy of /proc/self/mountstats.

0xArsen commented 6 years ago

@SuperQ Sorry it took a while; /proc/self/mountstats is pasted below. The two NFS mounts are similar, but there are small differences.

device sysfs mounted on /sys with fstype sysfs
device proc mounted on /proc with fstype proc
device udev mounted on /dev with fstype devtmpfs
device devpts mounted on /dev/pts with fstype devpts
device tmpfs mounted on /run with fstype tmpfs
device /dev/sda1 mounted on / with fstype ext4
device securityfs mounted on /sys/kernel/security with fstype securityfs
device tmpfs mounted on /dev/shm with fstype tmpfs
device tmpfs mounted on /run/lock with fstype tmpfs
device tmpfs mounted on /sys/fs/cgroup with fstype tmpfs
device cgroup mounted on /sys/fs/cgroup/systemd with fstype cgroup
device pstore mounted on /sys/fs/pstore with fstype pstore
device cgroup mounted on /sys/fs/cgroup/devices with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/net_cls,net_prio with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/cpu,cpuacct with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/memory with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/perf_event with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/blkio with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/cpuset with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/freezer with fstype cgroup
device cgroup mounted on /sys/fs/cgroup/pids with fstype cgroup
device systemd-1 mounted on /proc/sys/fs/binfmt_misc with fstype autofs
device debugfs mounted on /sys/kernel/debug with fstype debugfs
device hugetlbfs mounted on /dev/hugepages with fstype hugetlbfs
device mqueue mounted on /dev/mqueue with fstype mqueue
device sunrpc mounted on /run/rpc_pipefs with fstype rpc_pipefs
device nfsd mounted on /proc/fs/nfsd with fstype nfsd
device 127.0.0.1:/var/nfs mounted on /mnt/nfs with fstype nfs statvers=1.1
    opts:   ro,vers=3,rsize=524288,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=48894,mountproto=udp,local_lock=none
    age:    781
    caps:   caps=0x3fcf,wtmult=4096,dtsize=4096,bsize=0,namlen=255
    sec:    flavor=1,pseudoflavor=1
    events: 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    bytes:  0 0 0 0 0 0 0 0 
    RPC iostats version: 1.0  p/v: 100003/3 (nfs)
    xprt:   tcp 822 1 1 0 0 39 39 0 39 0 2 0 0
    per-op statistics
            NULL: 0 0 0 0 0 0 0 0
         GETATTR: 5 5 0 620 560 0 0 0
         SETATTR: 0 0 0 0 0 0 0 0
          LOOKUP: 0 0 0 0 0 0 0 0
          ACCESS: 0 0 0 0 0 0 0 0
        READLINK: 0 0 0 0 0 0 0 0
            READ: 0 0 0 0 0 0 0 0
           WRITE: 0 0 0 0 0 0 0 0
          CREATE: 0 0 0 0 0 0 0 0
           MKDIR: 0 0 0 0 0 0 0 0
         SYMLINK: 0 0 0 0 0 0 0 0
           MKNOD: 0 0 0 0 0 0 0 0
          REMOVE: 0 0 0 0 0 0 0 0
           RMDIR: 0 0 0 0 0 0 0 0
          RENAME: 0 0 0 0 0 0 0 0
            LINK: 0 0 0 0 0 0 0 0
         READDIR: 0 0 0 0 0 0 0 0
     READDIRPLUS: 0 0 0 0 0 0 0 0
          FSSTAT: 7 7 0 888 588 0 0 1
          FSINFO: 2 2 0 208 160 0 1 1
        PATHCONF: 1 1 0 104 56 0 0 0
          COMMIT: 0 0 0 0 0 0 0 0

device tmpfs mounted on /run/user/117 with fstype tmpfs
device tmpfs mounted on /run/user/1000 with fstype tmpfs
device gvfsd-fuse mounted on /run/user/1000/gvfs with fstype fuse.gvfsd-fuse
device fusectl mounted on /sys/fs/fuse/connections with fstype fusectl
device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
device 127.0.0.1:/var/nfs mounted on /mnt/nfs with fstype nfs4 statvers=1.1
    opts:   rw,vers=4.0,rsize=524288,wsize=524288,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none
    age:    519
    caps:   caps=0xffdf,wtmult=512,dtsize=32768,bsize=0,namlen=255
    nfsv4:  bm0=0xfdffbfff,bm1=0xf9be3e,bm2=0x0,acl=0x3,pnfs=not configured
    sec:    flavor=1,pseudoflavor=1
    events: 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    bytes:  0 0 0 0 0 0 0 0 
    RPC iostats version: 1.0  p/v: 100003/4 (nfs)
    xprt:   tcp 866 0 1 0 28 29 29 0 29 0 2 0 0
    per-op statistics
            NULL: 0 0 0 0 0 0 0 0
            READ: 0 0 0 0 0 0 0 0
           WRITE: 0 0 0 0 0 0 0 0
          COMMIT: 0 0 0 0 0 0 0 0
            OPEN: 0 0 0 0 0 0 0 0
    OPEN_CONFIRM: 0 0 0 0 0 0 0 0
     OPEN_NOATTR: 0 0 0 0 0 0 0 0
    OPEN_DOWNGRADE: 0 0 0 0 0 0 0 0
           CLOSE: 0 0 0 0 0 0 0 0
         SETATTR: 0 0 0 0 0 0 0 0
          FSINFO: 1 1 0 140 108 0 0 0
           RENEW: 0 0 0 0 0 0 0 0
     SETCLIENTID: 0 0 0 0 0 0 0 0
    SETCLIENTID_CONFIRM: 0 0 0 0 0 0 0 0
            LOCK: 0 0 0 0 0 0 0 0
           LOCKT: 0 0 0 0 0 0 0 0
           LOCKU: 0 0 0 0 0 0 0 0
          ACCESS: 1 1 0 148 124 0 0 0
         GETATTR: 1 1 0 140 196 0 0 0
          LOOKUP: 1 1 0 156 252 0 0 0
     LOOKUP_ROOT: 0 0 0 0 0 0 0 0
          REMOVE: 0 0 0 0 0 0 0 0
          RENAME: 0 0 0 0 0 0 0 0
            LINK: 0 0 0 0 0 0 0 0
         SYMLINK: 0 0 0 0 0 0 0 0
          CREATE: 0 0 0 0 0 0 0 0
        PATHCONF: 1 1 0 136 72 0 0 0
          STATFS: 0 0 0 0 0 0 0 0
        READLINK: 0 0 0 0 0 0 0 0
         READDIR: 0 0 0 0 0 0 0 0
     SERVER_CAPS: 2 2 0 272 184 0 0 0
     DELEGRETURN: 0 0 0 0 0 0 0 0
          GETACL: 0 0 0 0 0 0 0 0
          SETACL: 0 0 0 0 0 0 0 0
    FS_LOCATIONS: 0 0 0 0 0 0 0 0
    RELEASE_LOCKOWNER: 0 0 0 0 0 0 0 0
         SECINFO: 0 0 0 0 0 0 0 0
    FSID_PRESENT: 0 0 0 0 0 0 0 0
     EXCHANGE_ID: 0 0 0 0 0 0 0 0
    CREATE_SESSION: 0 0 0 0 0 0 0 0
    DESTROY_SESSION: 0 0 0 0 0 0 0 0
        SEQUENCE: 0 0 0 0 0 0 0 0
    GET_LEASE_TIME: 0 0 0 0 0 0 0 0
    RECLAIM_COMPLETE: 0 0 0 0 0 0 0 0
       LAYOUTGET: 0 0 0 0 0 0 0 0
    GETDEVICEINFO: 0 0 0 0 0 0 0 0
    LAYOUTCOMMIT: 0 0 0 0 0 0 0 0
    LAYOUTRETURN: 0 0 0 0 0 0 0 0
    SECINFO_NO_NAME: 0 0 0 0 0 0 0 0
    TEST_STATEID: 0 0 0 0 0 0 0 0
    FREE_STATEID: 0 0 0 0 0 0 0 0
    GETDEVICELIST: 0 0 0 0 0 0 0 0
    BIND_CONN_TO_SESSION: 0 0 0 0 0 0 0 0
    DESTROY_CLIENTID: 0 0 0 0 0 0 0 0
            SEEK: 0 0 0 0 0 0 0 0
        ALLOCATE: 0 0 0 0 0 0 0 0
      DEALLOCATE: 0 0 0 0 0 0 0 0
     LAYOUTSTATS: 0 0 0 0 0 0 0 0
           CLONE: 0 0 0 0 0 0 0 0
            COPY: 0 0 0 0 0 0 0 0
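
As a rough illustration of where the two NFS entries above differ, the following Python sketch (not the collector's Go code, and not the prometheus/procfs parser) pulls the `fstype` and the `opts` key/value pairs out of mountstats-style text. The function name `nfs_mounts` is hypothetical, and the `opts` lines in the sample are abbreviated from the output above.

```python
import re

# Matches the header line that starts each entry in /proc/self/mountstats.
HEADER = re.compile(r"^device (\S+) mounted on (\S+) with fstype (\S+)")

SAMPLE = """
device nfsd mounted on /proc/fs/nfsd with fstype nfsd
device 127.0.0.1:/var/nfs mounted on /mnt/nfs with fstype nfs statvers=1.1
    opts:   ro,vers=3,proto=tcp,mountport=48894,mountproto=udp
    age:    781
device 127.0.0.1:/var/nfs mounted on /mnt/nfs with fstype nfs4 statvers=1.1
    opts:   rw,vers=4.0,proto=tcp,port=0
    age:    519
"""


def nfs_mounts(text):
    """Return one dict per NFS entry with its device, mountpoint, fstype and opts."""
    mounts = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            device, mountpoint, fstype = header.groups()
            current = None  # a new entry starts; forget the previous one
            if fstype in ("nfs", "nfs4"):
                current = {"device": device, "mountpoint": mountpoint,
                           "fstype": fstype, "opts": {}}
                mounts.append(current)
            continue
        stripped = line.strip()
        if current is not None and stripped.startswith("opts:"):
            for item in stripped[len("opts:"):].strip().split(","):
                key, _, value = item.partition("=")
                current["opts"][key] = value  # flags such as "ro" get an empty value
    return mounts


parsed = nfs_mounts(SAMPLE)
```

Both entries share the same device and mountpoint; they differ in `fstype` (`nfs` vs `nfs4`) and in the `vers` option (`3` vs `4.0`).
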
SuperQ commented 6 years ago

One thing to note, last I tested (it's been a while), the last mount on a mount point will "win" in terms of which will be used for access. With this example, the vers=3 mount will be there, but will not get any traffic as soon as vers=4 is mounted over the top of it.

Either way, we should include the NFS version as a label; that would be useful for monitoring.

brian-brazil commented 6 years ago

Hmm, might the version make more sense as a metric rather than a label? It doesn't sound like an identifier.

SuperQ commented 6 years ago

@brian-brazil It's kinda like fstype. But as demonstrated by this issue, we need it to differentiate between two instances on the same mountpoint (yay Linux).
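
A small Python sketch of why the label is needed to tell the two instances apart. It only checks label-set uniqueness and is not exporter code; the label name `nfs_version` and the helper `colliding` are hypothetical, while `export` is the label shown in the metrics output in this issue.

```python
from collections import Counter


def colliding(series, label_names):
    """Return the label-value tuples that more than one series would share."""
    counts = Counter(tuple(s[name] for name in label_names) for s in series)
    return [labels for labels, n in counts.items() if n > 1]


# The two mounts from this issue, described by their would-be label values.
series = [
    {"export": "127.0.0.1:/var/nfs", "nfs_version": "3"},
    {"export": "127.0.0.1:/var/nfs", "nfs_version": "4.0"},
]

export_only = colliding(series, ["export"])
with_version = colliding(series, ["export", "nfs_version"])
```

With `export` alone, both mounts map to the same label set, so only one of them can be exposed. Adding the version makes the two label sets distinct.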

discordianfish commented 5 years ago

Has this been fixed by #998?

0xArsen commented 5 years ago

#998 accounted for whether a mount uses TCP or UDP. The fix for this issue will also need to distinguish a version 3 mount from a version 4 mount.

discordianfish commented 5 years ago

Ah got it, thanks. So yeah still open to look into a PR to address that.

0xArsen commented 5 years ago

Another PR would need to be made in prometheus/procfs before node_exporter can use the NFS version as a label. I previously opened an issue in procfs (#102) about the mount version. If possible, let's continue the conversation there.