it-novum / openitcockpit-agent-go

Cross-Platform Monitoring Agent for openITCOCKPIT written in Go
https://openitcockpit.io/download_agent/
Apache License 2.0

Disk usage check generates runtime error #51

Closed exa-mk closed 3 years ago

exa-mk commented 3 years ago

Agent Mode:

Versions

Operating system Debian 10 (buster), Kernel 5.3.18-2-pve (Proxmox 6.3-3)

Describe the bug On one of our Proxmox clusters (8 nodes), the disk usage check produces a runtime error (on all nodes). On our other Proxmox clusters it works fine.

To Reproduce Steps to reproduce the behavior: ?

Expected behavior Check working without runtime error


Additional context Debug output from the agent:

[...]
DEBU[1080] Begin Check:  agent
DEBU[1080] Finish Check:  agent
DEBU[1080] Begin Check:  swap
DEBU[1080] Finish Check:  swap
DEBU[1080] Begin Check:  users
DEBU[1080] Finish Check:  users
DEBU[1080] Begin Check:  disks
ERRO[1080] Check  disks : !!PANIC!!  runtime error: invalid memory address or nil pointer dereference
DEBU[1080] Finish Check:  disks
DEBU[1080] Begin Check:  disk_io
DEBU[1080] Finish Check:  disk_io
DEBU[1080] Begin Check:  system_load
DEBU[1080] Finish Check:  system_load
[...]
nook24 commented 3 years ago

Hi @exa-mk, which file system are you using? Are both Proxmox clusters using the same file system for /?

exa-mk commented 3 years ago

Good point, didn't realize the colleague installed this one differently. The erroneous cluster uses XFS as root file system while the others use default EXT4.

nook24 commented 3 years ago

I'm monitoring XFS, ZFS and ext4 through the Agent, so XFS itself should not be the issue. I quickly installed Proxmox VE 6.4-7 (kernel 5.4.114-1-pve) to check whether the PVE installer creates the XFS file system with some unusual flags, but the agent runs fine on this system.

I assume it probably has nothing to do with XFS as the root file system, but rather with some other mountpoint or device. Could you please paste the content of /proc/self/mountinfo? Mine looks like this:

24 29 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
25 29 0:5 / /proc rw,relatime shared:14 - proc proc rw
26 29 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=1987316k,nr_inodes=496829,mode=755
27 26 0:23 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
28 29 0:24 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=403064k,mode=755
29 1 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/pve-root rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
30 24 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
31 26 0:25 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
32 28 0:26 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
33 24 0:27 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
34 33 0:28 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw
35 33 0:29 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,xattr,name=systemd
36 24 0:30 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:12 - pstore pstore rw
37 24 0:31 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:13 - bpf none rw,mode=700
38 33 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,cpu,cpuacct
39 33 0:33 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,cpuset
40 33 0:34 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,net_cls,net_prio
41 33 0:35 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,freezer
42 33 0:36 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,pids
43 33 0:37 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,memory
44 33 0:38 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,blkio
45 33 0:39 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,perf_event
46 33 0:40 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,rdma
47 33 0:41 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,devices
48 33 0:42 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,hugetlb
49 25 0:43 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14811
50 24 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
51 26 0:20 / /dev/mqueue rw,relatime shared:28 - mqueue mqueue rw
52 26 0:44 / /dev/hugepages rw,relatime shared:29 - hugetlbfs hugetlbfs rw,pagesize=2M
53 28 0:45 / /run/rpc_pipefs rw,relatime shared:30 - rpc_pipefs sunrpc rw
116 24 0:21 / /sys/kernel/config rw,relatime shared:61 - configfs configfs rw
119 24 0:47 / /sys/fs/fuse/connections rw,relatime shared:63 - fusectl fusectl rw
208 29 0:51 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:117 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
213 29 0:52 / /etc/pve rw,nosuid,nodev,relatime shared:120 - fuse /dev/fuse rw,user_id=0,group_id=0,default_permissions,allow_other
218 28 0:53 / /run/user/0 rw,nosuid,nodev,relatime shared:123 - tmpfs tmpfs rw,size=403060k,mode=700
exa-mk commented 3 years ago
root@c01-dev01 [kvm]: ~ # cat /proc/self/mountinfo 
23 28 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
24 28 0:5 / /proc rw,relatime shared:14 - proc proc rw
25 28 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=396262192k,nr_inodes=99065548,mode=755
26 25 0:23 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
27 28 0:24 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=79257380k,mode=755
28 1 253:69 / / rw,relatime shared:1 - xfs /dev/mapper/pve-root rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
29 23 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
30 25 0:25 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
31 27 0:26 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
32 23 0:27 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
33 32 0:28 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw
34 32 0:29 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,xattr,name=systemd
35 23 0:30 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:12 - pstore pstore rw
36 23 0:31 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:13 - bpf bpf rw,mode=700
37 32 0:32 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,pids
38 32 0:33 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls,net_prio
39 32 0:34 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,perf_event
40 32 0:35 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,cpuset
41 32 0:36 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,cpu,cpuacct
42 32 0:37 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,memory
43 32 0:38 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,blkio
44 32 0:39 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,hugetlb
45 32 0:40 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,rdma
46 32 0:41 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,devices
47 32 0:42 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,freezer
48 24 0:43 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=35391
49 23 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
50 25 0:44 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,pagesize=2M
51 25 0:20 / /dev/mqueue rw,relatime shared:29 - mqueue mqueue rw
52 27 0:45 / /run/rpc_pipefs rw,relatime shared:30 - rpc_pipefs sunrpc rw
115 23 0:21 / /sys/kernel/config rw,relatime shared:61 - configfs configfs rw
118 23 0:47 / /sys/fs/fuse/connections rw,relatime shared:63 - fusectl fusectl rw
207 28 0:50 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:117 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
384 28 0:53 / /mnt/pve/proxmox_share rw,relatime shared:200 - nfs fs02.srv.exasol.com:/nfs/data/proxmox rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.44.100.12,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.44.100.12
390 48 0:54 / /proc/sys/fs/binfmt_misc rw,relatime shared:203 - binfmt_misc binfmt_misc rw
90 28 0:52 / /etc/pve rw,nosuid,nodev,relatime shared:111 - fuse /dev/fuse rw,user_id=0,group_id=0,default_permissions,allow_other
443 27 0:51 / /run/user/0 rw,nosuid,nodev,relatime shared:247 - tmpfs tmpfs rw,size=79257376k,mode=700
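
When comparing two mountinfo dumps like the ones above, it can help to reduce each line to its mountpoint, file system type, and source. The following is a minimal Go sketch of that, assuming the field layout documented in proc(5); the `MountEntry` type and `parseMountinfoLine` function are hypothetical names for illustration and are not taken from the agent's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// MountEntry holds the mountinfo fields that matter for a disk check.
// Hypothetical type for illustration, not the agent's own struct.
type MountEntry struct {
	MountPoint string
	FSType     string
	Source     string
}

// parseMountinfoLine splits one /proc/self/mountinfo line.
// The " - " separator ends the variable-length optional fields, so the
// file system type and source are read from the part after it.
func parseMountinfoLine(line string) (MountEntry, error) {
	parts := strings.SplitN(line, " - ", 2)
	if len(parts) != 2 {
		return MountEntry{}, errors.New("mountinfo: missing \" - \" separator")
	}
	pre := strings.Fields(parts[0])  // ID, parent ID, major:minor, root, mountpoint, options, ...
	post := strings.Fields(parts[1]) // fstype, source, super options
	if len(pre) < 6 || len(post) < 2 {
		return MountEntry{}, errors.New("mountinfo: too few fields")
	}
	return MountEntry{MountPoint: pre[4], FSType: post[0], Source: post[1]}, nil
}

func main() {
	// Sample line taken from the output above, super options shortened.
	line := "384 28 0:53 / /mnt/pve/proxmox_share rw,relatime shared:200 - nfs fs02.srv.exasol.com:/nfs/data/proxmox rw,vers=3"
	entry, err := parseMountinfoLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s (%s) from %s\n", entry.MountPoint, entry.FSType, entry.Source)
}
```

Running this over every line of both dumps gives two short lists that are easy to diff by mountpoint and file system type.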
nook24 commented 3 years ago

I fixed the bug: if the system cannot determine the disk usage of a given mountpoint, the check no longer crashes with a panic.

New Version: https://github.com/it-novum/openitcockpit-agent-go/releases/tag/3.0.3 (also available through the website and repositories)

Please let me know if this resolves the issue.
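
The general pattern behind such a fix can be sketched as follows: check the error (and the returned pointer) for each mountpoint before using the result, and keep going with the remaining mountpoints. This is only an illustration under stated assumptions, not the actual change in release 3.0.3; `DiskUsage`, `statfsUsage`, and `collectUsage` are hypothetical names, and the sketch uses `syscall.Statfs` from the Go standard library, so it targets Linux and other Unix-like systems.

```go
package main

import (
	"fmt"
	"syscall"
)

// DiskUsage is a hypothetical result type, not the agent's actual struct.
type DiskUsage struct {
	MountPoint string
	TotalBytes uint64
	FreeBytes  uint64
}

// usageFunc returns the usage of a mountpoint, or an error on failure.
type usageFunc func(mountPoint string) (*DiskUsage, error)

// statfsUsage queries the kernel through statfs(2).
func statfsUsage(mountPoint string) (*DiskUsage, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountPoint, &st); err != nil {
		return nil, err
	}
	blockSize := uint64(st.Bsize)
	return &DiskUsage{
		MountPoint: mountPoint,
		TotalBytes: st.Blocks * blockSize,
		FreeBytes:  st.Bfree * blockSize,
	}, nil
}

// collectUsage never dereferences a result without checking it first.
// A failing mountpoint yields an error entry; the others are still reported.
func collectUsage(mountPoints []string, usage usageFunc) ([]DiskUsage, []error) {
	var results []DiskUsage
	var errs []error
	for _, mp := range mountPoints {
		u, err := usage(mp)
		if err != nil {
			errs = append(errs, fmt.Errorf("DiskCheck: error for %s: %w", mp, err))
			continue
		}
		if u == nil {
			errs = append(errs, fmt.Errorf("DiskCheck: no usage data for %s", mp))
			continue
		}
		results = append(results, *u)
	}
	return results, errs
}

func main() {
	results, errs := collectUsage([]string{"/", "/mnt/pve/proxmox_share"}, statfsUsage)
	for _, r := range results {
		fmt.Printf("%s: %d of %d bytes free\n", r.MountPoint, r.FreeBytes, r.TotalBytes)
	}
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

Passing the lookup in as a `usageFunc` is a design choice made for this sketch: it lets the error path be exercised with a stub, without needing a broken mount.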

exa-mk commented 3 years ago

Yes, it's working now... and I get a proper error message for the corrupt mountpoint as well:

[...]
DEBU[0000] Begin Check:  disks                          
ERRO[0000] DiskCheck: Error for  /mnt/pve/proxmox_share stale NFS file handle 
DEBU[0000] Finish Check:  disks                         
[...]
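
An error like the one in this log can also be recognised programmatically. The sketch below assumes the failure surfaces as the `ESTALE` errno from `statfs(2)`, which is an assumption about this setup and not something confirmed by the agent's source; `wrapMountError`, `isStaleHandle`, and `errStaleHandle` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// errStaleHandle names the errno for readability (hypothetical alias).
var errStaleHandle error = syscall.ESTALE

// wrapMountError adds the mountpoint to the error while keeping the
// original errno reachable through errors.Is.
func wrapMountError(mountPoint string, err error) error {
	return fmt.Errorf("DiskCheck: error for %s: %w", mountPoint, err)
}

// isStaleHandle reports whether err is, or wraps, ESTALE.
func isStaleHandle(err error) bool {
	return errors.Is(err, syscall.ESTALE)
}

func main() {
	mountPoint := "/mnt/pve/proxmox_share"
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountPoint, &st); err != nil {
		wrapped := wrapMountError(mountPoint, err)
		if isStaleHandle(wrapped) {
			fmt.Println(wrapped, "(stale handle: check the NFS export or remount)")
		} else {
			fmt.Println(wrapped)
		}
		return
	}
	fmt.Println(mountPoint, "is reachable")
}
```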
nook24 commented 3 years ago

Perfect, I'll mark the issue as resolved.