sensu / check-disk-usage

MIT License

Throws "invalid memory address or nil pointer dereference" error after a valid check when run as the sensu user #3

Closed: espenaf closed this issue 3 years ago

espenaf commented 3 years ago

Whether the plugin runs via the Sensu Go backend or from the console via "sudo -u sensu check-disk-usage ...", it throws the error shown below. Running as root does not produce the error. The disk check itself seems to work fine, but something goes wrong just as the plugin finishes.

Tested with similar results on CentOS 8 and Arch, using either Sensu Go 6.1.3 or 6.2.

check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xba8d7d]

goroutine 1 [running]:
main.executeCheck(0x0, 0x0, 0x0, 0x0)
    /home/runner/work/check-disk-usage/check-disk-usage/main.go:126 +0x14d
github.com/sensu-community/sensu-plugin-sdk/sensu.(*GoCheck).goCheckWorkflow(0xc0001f8580, 0xc000270340, 0x0, 0x2, 0xc00026aa00, 0xc00027dcb0, 0xb93223)
    /home/runner/go/pkg/mod/github.com/sensu-community/sensu-plugin-sdk@v0.11.0/sensu/gocheck.go:59 +0xed
github.com/sensu-community/sensu-plugin-sdk/sensu.(*basePlugin).cobraExecuteFunction(0xc0001f8580, 0xc000270340, 0x0, 0x2, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/sensu-community/sensu-plugin-sdk@v0.11.0/sensu/goplugin.go:234 +0x65
github.com/sensu-community/sensu-plugin-sdk/sensu.(*basePlugin).initPlugin.func1(0xc000230dc0, 0xc000270340, 0x0, 0x2, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/sensu-community/sensu-plugin-sdk@v0.11.0/sensu/goplugin.go:119 +0xad
github.com/spf13/cobra.(*Command).execute(0xc000230dc0, 0xc000190010, 0x2, 0x2, 0xc000230dc0, 0xc000190010)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0xc000230dc0, 0xc0001f8580, 0x0, 0x0)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
    /home/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
github.com/sensu-community/sensu-plugin-sdk/sensu.(*basePlugin).Execute(0xc0001f8580)
    /home/runner/go/pkg/mod/github.com/sensu-community/sensu-plugin-sdk@v0.11.0/sensu/goplugin.go:247 +0x3e
main.main()
    /home/runner/work/check-disk-usage/check-disk-usage/main.go:97 +0x7e
nixwiz commented 3 years ago

Works fine for me on a vanilla CentOS 8 system:

sudo -u sensu ./check-disk-usage
check-disk-usage       OK: /dev 0.00% - Total: 8.2 GB, Used: 0 B, Free: 8.2 GB
check-disk-usage       OK: /dev/shm 0.00% - Total: 8.2 GB, Used: 0 B, Free: 8.2 GB
check-disk-usage       OK: /run 1.47% - Total: 8.2 GB, Used: 121 MB, Free: 8.1 GB
check-disk-usage       OK: /sys/fs/cgroup 0.00% - Total: 8.2 GB, Used: 0 B, Free: 8.2 GB
check-disk-usage       OK: / 17.83% - Total: 54 GB, Used: 9.6 GB, Free: 44 GB
check-disk-usage       OK: /home 14.77% - Total: 54 GB, Used: 7.9 GB, Free: 46 GB
check-disk-usage       OK: /boot 30.28% - Total: 1.0 GB, Used: 288 MB, Free: 664 MB
check-disk-usage       OK: /boot/efi 1.15% - Total: 628 MB, Used: 7.2 MB, Free: 621 MB
check-disk-usage       OK: /run/user/1000 0.00% - Total: 1.6 GB, Used: 0 B, Free: 1.6 GB

I'm curious what kind of output you receive from sudo -u sensu df -TPh.

espenaf commented 3 years ago
Filesystem                     Type      Size  Used Avail Use% Mounted on
devtmpfs                       devtmpfs  7.8G     0  7.8G   0% /dev
tmpfs                          tmpfs     7.9G   84K  7.9G   1% /dev/shm
tmpfs                          tmpfs     7.9G  2.0M  7.9G   1% /run
tmpfs                          tmpfs     7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/nvme0n1p2                 btrfs     476G   64G  412G  14% /
/dev/sdb                       btrfs     117G   27G   86G  25% /store/raid1
/dev/sda1                      ext4      976M  404M  506M  45% /boot
tmpfs                          tmpfs     1.6G  4.0K  1.6G   1% /run/user/1001
//10.0.0.15/raid2              cifs       51T   33T   19T  64% /store/raid2
tmpfs                          tmpfs     1.6G  4.0K  1.6G   1% /run/user/1000

I have tried checking only ext4, btrfs, or cifs separately, but the problem remains the same.

nixwiz commented 3 years ago

I have released version 0.1.1, which adds some error checking (I had left it out while prototyping and forgot to add it before releasing 😑). This won't fix the issue, but it should help us track down the cause.

espenaf commented 3 years ago
sudo -u sensu ./check-disk-usage -I /boot
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
Usage:
  check-disk-usage [flags]
  check-disk-usage [command]

Available Commands:
  help        Help about any command
  version     Print the version number of this plugin

Flags:
  -c, --critical float            Critical threshold for file system usage (default 95)
  -E, --exclude-fs-path strings   Comma separated list of file system paths to exclude from checking
  -e, --exclude-fs-type strings   Comma separated list of file system types to exclude from checking
  -h, --help                      help for check-disk-usage
  -I, --include-fs-path strings   Comma separated list of file system paths to check
  -i, --include-fs-type strings   Comma separated list of file system types to check
  -w, --warning float             Warning threshold for file system usage (default 85)

Use "check-disk-usage [command] --help" for more information about a command.

Error executing check-disk-usage: error executing check: Failed to get disk usage for /var/lib/containers/storage/overlay, error: permission denied

sudo -u sensu ./check-disk-usage -E /var/lib/containers/storage/overlay
check-disk-usage       OK: /dev 0.00% - Total: 8.4 GB, Used: 0 B, Free: 8.4 GB
check-disk-usage       OK: /dev/shm 0.00% - Total: 8.4 GB, Used: 86 kB, Free: 8.4 GB
check-disk-usage       OK: /run 0.02% - Total: 8.4 GB, Used: 2.1 MB, Free: 8.4 GB
check-disk-usage       OK: /sys/fs/cgroup 0.00% - Total: 8.4 GB, Used: 0 B, Free: 8.4 GB
check-disk-usage       OK: / 13.35% - Total: 511 GB, Used: 68 GB, Free: 442 GB
check-disk-usage       OK: /store/raid2 63.71% - Total: 56 TB, Used: 36 TB, Free: 20 TB
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
check-disk-usage       OK: /run/user/1001 0.00% - Total: 1.7 GB, Used: 4.1 kB, Free: 1.7 GB
Usage:
  check-disk-usage [flags]
  check-disk-usage [command]

Available Commands:
  help        Help about any command
  version     Print the version number of this plugin

Flags:
  -c, --critical float            Critical threshold for file system usage (default 95)
  -E, --exclude-fs-path strings   Comma separated list of file system paths to exclude from checking
  -e, --exclude-fs-type strings   Comma separated list of file system types to exclude from checking
  -h, --help                      help for check-disk-usage
  -I, --include-fs-path strings   Comma separated list of file system paths to check
  -i, --include-fs-type strings   Comma separated list of file system types to check
  -w, --warning float             Warning threshold for file system usage (default 85)

Use "check-disk-usage [command] --help" for more information about a command.

Error executing check-disk-usage: error executing check: Failed to get disk usage for /var/lib/containers/storage/overlay, error: permission denied
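The usage text printed above lists the two thresholds each file system is compared against: -w/--warning (default 85) and -c/--critical (default 95). A minimal sketch of that comparison, assuming the usual Sensu/Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and assuming a value equal to a threshold trips it; the thread does not show how the plugin handles the boundary, and evaluate is a hypothetical name.

```go
package main

import "fmt"

// Exit codes following the Nagios/Sensu plugin convention.
const (
	OK       = 0
	Warning  = 1
	Critical = 2
	Unknown  = 3
)

var stateNames = map[int]string{OK: "OK", Warning: "WARNING", Critical: "CRITICAL", Unknown: "UNKNOWN"}

// evaluate maps a usage percentage to a state.
// Assumption: reaching a threshold (>=) trips it.
func evaluate(usedPercent, warning, critical float64) int {
	switch {
	case usedPercent >= critical:
		return Critical
	case usedPercent >= warning:
		return Warning
	default:
		return OK
	}
}

func main() {
	const warning, critical = 85.0, 95.0 // defaults shown in the usage text
	for _, p := range []float64{44.40, 90.0, 97.5} {
		fmt.Printf("%.2f%% -> %s\n", p, stateNames[evaluate(p, warning, critical)])
	}
}
```

The critical case is tested first so that a value above both thresholds reports the more severe state.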
nixwiz commented 3 years ago

I'm curious how version 0.2.0 will work for you.

espenaf commented 3 years ago

Works better now:

check-disk-usage       OK: / 13.37% - Total: 511 GB, Used: 68 GB, Free: 442 GB
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
check-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay - error: permission denied

The last strange thing is that /var/lib/containers/storage/overlay is part of the output even when I try to exclude it by path or type, like this:

sudo -u sensu ./check-disk-usage -e tmpfs,overlay,devtmpfs,cifs -E /var/lib/containers/storage/overlay
check-disk-usage       OK: / 13.37% - Total: 511 GB, Used: 68 GB, Free: 442 GB
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
check-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay - error: permission denied

The check is all green, but it would be better if this path were never checked at all. Or is this some other file system?

nixwiz commented 3 years ago

I've added a linux amd64 binary named check-disk-usage-debug to the assets for the 0.2.0 release. It adds a line of debug output showing what information is actually returned for each file system. Can you download it and give it a try?

espenaf commented 3 years ago

It seems /var/lib/containers/storage/overlay is also a btrfs device. Even if I add btrfs to the excluded types, it is still marked as UNKNOWN with the same error:

sudo -u sensu ./check-disk-usage-debug -e tmpfs,overlay,devtmpfs,cifs,btrfs -E /var/lib/containers/storage/overlay
DEBUG: Device=/dev/nvme0n1p2, Mountpoint=/, Fstype=btrfs
DEBUG: Device=/dev/sda1, Mountpoint=/boot, Fstype=ext4
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
DEBUG: Device=/dev/nvme0n1p2, Mountpoint=/var/lib/containers/storage/overlay, Fstype=btrfs
check-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay - error: permission denied

The /var/lib/containers/storage/overlay folder has the following permissions:

drwx------. 1 root root 60452 Dec 27 11:27 overlay
nixwiz commented 3 years ago

Okay, I think I'll re-organize and have the include/exclude checks happen before attempting to get the file system usage. Hang on, another release is coming.

nixwiz commented 3 years ago

0.3.0 is released for your testing/review.

espenaf commented 3 years ago

Everything works now. Thanks for the great work.

I found an unrelated bug when running with -p:

sudo -u sensu ./check-disk-usage -p
check-disk-usage       OK: /dev 0.00% - Total: 8.4 GB, Used: 0 B, Free: 8.4 GB
check-disk-usage       OK: /dev/shm 0.00% - Total: 8.4 GB, Used: 86 kB, Free: 8.4 GB
check-disk-usage       OK: /run 0.02% - Total: 8.4 GB, Used: 2.1 MB, Free: 8.4 GB
check-disk-usage       OK: /sys/fs/cgroup 0.00% - Total: 8.4 GB, Used: 0 B, Free: 8.4 GB
check-disk-usage       OK: / 13.37% - Total: 511 GB, Used: 68 GB, Free: 442 GB
check-disk-usage       OK: /boot 44.40% - Total: 1.0 GB, Used: 423 MB, Free: 530 MB
check-disk-usage       OK: /run/user/1001 0.00% - Total: 1.7 GB, Used: 4.1 kB, Free: 1.7 GB
check-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay - error: permission deniedcheck-disk-usage       OK: /store/raid2 63.73% - Total: 56 TB, Used: 36 TB, Free: 20 TB
check-disk-usage       OK: /run/netns 0.02% - Total: 8.4 GB, Used: 2.1 MB, Free: 8.4 GB
check-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay-containers/92d3550feb153836ef7e41d2758d8b9a6f2f981c9eb13604a191cf49de4139ce/userdata/shm - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay/5c49f37ec1443612a8298673e4d227229e90754f21b46dc41321b0c426f4b619/merged - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay-containers/0612a9b621fb4419adf942b2529772b15670b8e5311d2c21d0e2b8d3e6559c1b/userdata/shm - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay/28f0f04a7ca2db8b630a447c867ff959c39e1ff2a0d5afb3d1b3451175095463/merged - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay-containers/91dcf2e7c03c3c7226a1a1414e2660d249ee2d37cf61a46f9e7d055199f16b32/userdata/shm - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay/6d9cafa74ac9a0d6726c1975eec048447c042d2ff44feda6e9f1675d308b81db/merged - error: permission deniedcheck-disk-usage  UNKNOWN: /var/lib/containers/storage/overlay-containers/874ffc5a0817c6bcd5d24e077356227267bc4f697e44b85d63291c2231d76be8/userdata/shm - error: permission deniedcheck-disk

The issue seems to be a missing newline after a file system that returns an error. The /store/raid2 result is concatenated onto the same line as the /var/lib/containers/storage/overlay result, which has an error, instead of starting on a new line.

nixwiz commented 3 years ago

Thanks for pointing that out (releasing 0.3.1 to fix) and for all of your great feedback.

espenaf commented 3 years ago

I have been looking for a replacement for the Ruby-based disk check, so this is great. I just tested 0.3.1 on all my devices, and it works great now.