sensu-plugins / sensu-plugins-disk-checks

This plugin provides native disk instrumentation for monitoring and metrics collection, including: health, usage, and various metrics.
http://sensu-plugins.io
MIT License

check-disk.rb docker issue with permissions #77

Open jcputter opened 7 years ago

jcputter commented 7 years ago

Hi,

I'm running the check-disk.rb plugin on a Docker host. If I run /opt/sensu/embedded/bin/check-disk-usage.rb -x cgroup -p /var/lib/docker -c95 -w90 I get the expected output "OK".

However, when it's run via Sensu, I get this:

{"timestamp":"2017-08-04T23:42:04.965151+0200","level":"info","message":"publishing check result","payload":{"client":"docker","check":{"command":"/opt/sensu/embedded/bin/check-disk-usage.rb -x cgroup,proc,shm,none,nsfs -p /run/lxcfs,/var/lib/docker -c95 -w90","subscribers":["docker"],"standalone":true,"handlers":["slack","pagerduty"],"interval":30,"occurrences":2,"refresh":1800,"name":"check_disk","issued":1501882924,"executed":1501882924,"duration":0.067,"output":"CheckDisk WARNING: /var/lib/docker/aufs/mnt/0c545df7098b189bf1b11c6efb0c2c0677f12db9636e8b19a726a5775d861f68 Unable to read., /var/lib/docker/containers/ccea5f270de5ac5274314e21b05e87077742ccccf05d6765cf0a10c54830996b/shm Unable to read., /var/lib/docker/aufs/mnt/b87f8b0b649c482c430285f73bfc1a4245bb967b3143c131ee9a0d8bc4aebd37 Unable to read., /var/lib/docker/containers/d130b3529c2e500ec607424c7c26ca6c4371beabd4154c0796e04c6d0d689127/shm Unable to read., /var/lib/docker/aufs/mnt/1115a010f5184a6163414428776e5fec80f90aadeb0c589907cdb8bdeb94185d Unable to read., /var/lib/docker/containers/822aa8148f35280d3463ff11c870c42ff89e312b80cb04362edfdb7c9d2c3204/shm Unable to read., /var/lib/docker/aufs/mnt/d7f1a80ae7293d680a5cf7dd7eae6812d0fc02138a32f85a5d5dadb5352b6c44 Unable to read., /var/lib/docker/containers/cfb46d35c2dfa815c9214b0795359f6c7c8350947cc79751d0cb94e21e06b069/shm Unable to read.\n","status":1}}}

OS: Ubuntu 16.04 LTS
Sensu: 1.0.2
Check-Disk: 2.4.0

majormoses commented 7 years ago

We can do a rescue and unknown but this is certainly a permission thing.

majormoses commented 7 years ago

Actually taking a look at the code that is exactly what it is doing: https://github.com/sensu-plugins/sensu-plugins-disk-checks/blob/2.4.0/bin/check-disk-usage.rb#L149

> I have the expected output "OK"

It would, but it can't read. Why would "OK" be expected?

Running the same command:

$ ./bin/check-disk-usage.rb -x cgroup -p /var/lib/docker -c95 -W90
CheckDisk WARNING: /sys/kernel/debug/tracing Unable to read., /run/docker/netns/39eec869e2b1 Unable to read., /run/docker/netns/70778fb53238 Unable to read., /run/docker/netns/abc810a2d2f4 Unable to read.

Verifying that it is a permissions issue:

$ ls -l /sys/kernel/debug/tracing
ls: cannot access '/sys/kernel/debug/tracing': Permission denied
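
The behaviour in the code linked above (note the mount point as unreadable and carry on) can be sketched in plain Ruby. This is only an illustration, not the plugin's code: `partition_mounts` is a hypothetical helper, and it uses `File.stat` from the standard library rather than whatever the plugin calls internally.

```ruby
# Hypothetical helper, not part of check-disk-usage.rb: split paths into
# those the current user can stat and those that raise a system error
# (Errno::EACCES for permission denied, Errno::ENOENT for a missing path).
def partition_mounts(paths)
  readable = []
  unreadable = []
  paths.each do |path|
    begin
      File.stat(path)
      readable << path
    rescue SystemCallError => e
      unreadable << "#{path} Unable to read. (#{e.class})"
    end
  end
  [readable, unreadable]
end

ok, failed = partition_mounts(['/', '/sys/kernel/debug/tracing'])
puts "readable: #{ok.join(', ')}"
puts "unreadable: #{failed.join(', ')}" unless failed.empty?
```

Running a sketch like this once as your own user and once as the user the Sensu client runs as (often `sensu`, but that is an assumption about your setup) should show whether the two see different results for the same paths.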

I think what we need is a flag to ignore not just mount points but also dirs inside them. Does that sound good?

sysboss commented 7 years ago

I have the same issue.

Solved by adding -d flag: check-disk-usage.rb -x proc -x cgroup -x tmpfs -p sys

snadorp commented 7 years ago

To get around this problem we added these flags: -x tmpfs,overlay,nsfs

majormoses commented 7 years ago

It sounds like we should also write up something in the README with common gotchas and how to solve them.

majormoses commented 7 years ago

Anyone want to attempt a PR to ignore the dirs inside of them and not just the mount points?

mostafahussein commented 6 years ago

@majormoses I would like to help with this if it is still applicable. As I understand it, the issue could be solved by ignoring the file system itself. Could you explain this statement a bit more: "ignore the dirs inside of them and not just the mount points"? Do you mean that the check-disk script should have the ability to ignore these directories, or something else?

majormoses commented 6 years ago

After re-reading my comment and playing around with the options again, it seems like you might be able to do it without modification. This should probably work for most use cases with some minor tweaking:

Single capture group matching multiple paths (best for simple situations):

$ bundle exec ./bin/check-disk-usage.rb -x cgroup -p "(\/var\/lib\/docker|\/run\/docker)"
CheckDisk WARNING: / 87.4% bytes usage (193.34 GiB/221.23 GiB)

Multiple capture groups that are optional (can accommodate more complex scenarios):

$ bundle exec ./bin/check-disk-usage.rb -x cgroup,nsfs -p "(/var/lib/docker)?(/run/docker)?"
CheckDisk OK: All disk usage under 85% and inode usage under 85%
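
One caveat worth checking before relying on the second pattern: a regex made up only of optional groups can match the empty string, so it matches every input. The sketch below assumes the plugin skips any mount point for which an unanchored `Regexp#match` against the `-p` pattern succeeds (an assumption based on the option's description, not verified against every version). The mount points are taken from this thread, and `still_checked` is a hypothetical helper.

```ruby
# Assumption: -p is compiled with Regexp.new and a mount point is ignored
# when the pattern matches anywhere in it.
alternation = Regexp.new('(\/var\/lib\/docker|\/run\/docker)')
all_optional = Regexp.new('(/var/lib/docker)?(/run/docker)?')

mount_points = [
  '/',
  '/var/lib/docker/aufs',
  '/run/docker/netns/7a985efacbe2',
  '/sys/kernel/debug/tracing'
]

# Return the mount points that would still be checked under a given pattern.
def still_checked(pattern, mount_points)
  mount_points.reject { |mp| pattern.match(mp) }
end

puts "alternation keeps:  #{still_checked(alternation, mount_points).inspect}"
puts "all-optional keeps: #{still_checked(all_optional, mount_points).inspect}"
```

Under that assumption the alternation keeps `/` and `/sys/kernel/debug/tracing`, while the all-optional pattern keeps nothing, because it also matches `/` with an empty match. That would be consistent with the first command still warning about `/` and the second reporting OK, so the OK may simply mean that every mount was skipped.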

Also, as others pointed out, many of the errors/warnings can be avoided by not checking certain file systems; other times it just makes sense to have Sensu discard the output. At least in the Docker case, ignoring the appropriate mounts looks very doable with the existing options, from what I see:

$ mount | grep docker
/dev/sdc2 on /var/lib/docker/aufs type ext4 (rw,noatime,errors=remount-ro,data=ordered)
none on /var/lib/docker/aufs/mnt/935f42ea862f12177c30949cf71ed7dca72b4e379b8abc1a26c6cbf901f42301 type aufs (rw,relatime,si=968d4b9c7ae823ac,dio,dirperm1)
shm on /var/lib/docker/containers/12b6ec1b70879c0bcceedb032db77e0954cc4b8c3b8cdc51d0b1d0208a623e24/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/7a985efacbe2 type nsfs (rw)
none on /var/lib/docker/aufs/mnt/4a6fe3ade8018ce62ca7ce0e1aab8cfc9a4b924a6fae43fa720420ff4e8a91b0 type aufs (rw,relatime,si=968d4b9fb6cd53ac,dio,dirperm1)
shm on /var/lib/docker/containers/dd7dd22b4e2a34493c771a5efaa26c57f359846852d1d84c2b5840fb76eab96b/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/b9d1529da8f1 type nsfs (rw)
none on /var/lib/docker/aufs/mnt/98d9ed132f26331e98ec669939c75154b0733374b74112c9bd80d5fb4b76e5e0 type aufs (rw,relatime,si=968d4b9c8bf013ac,dio,dirperm1)
shm on /var/lib/docker/containers/bd6679f4f9aa2d489a498ece1a2c195b70bea747e0e2ae3da33bf85ca9d2ce18/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/257da5746791 type nsfs (rw)
none on /var/lib/docker/aufs/mnt/fed37bac4ce561913cd64f1b7c7a523de859b8954324e0822c9a80453371cab1 type aufs (rw,relatime,si=968d4b9bfe5f93ac,dio,dirperm1)
shm on /var/lib/docker/containers/4ded1ee1f58bab4c703028bacc7c215daa95b7575bd162139bc2580917b8b41b/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/4a1ca194458d type nsfs (rw)
none on /var/lib/docker/aufs/mnt/b7868ed02b3baccee33726ffaf1cb4358fc1cf355a54092c57b6c6092367ce5c type aufs (rw,relatime,si=968d4b9913f873ac,dio,dirperm1)
shm on /var/lib/docker/containers/51e9cf1f8173fa7c56390c0f9a97af32fd8edf0351a7693a8aa93c7923541a04/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/6bba08fe2ed1 type nsfs (rw)
none on /var/lib/docker/aufs/mnt/b53c6bdc4d5b7d1f58d0acf3ef80fec5c2930133464569d7b7cb40fb5d139f03 type aufs (rw,relatime,si=968d4b99185263ac,dio,dirperm1)
shm on /var/lib/docker/containers/6c2fbc7823f80b659a42b0a5f968767e5437d8500e683109b3f88f4a4c0aa525/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
nsfs on /run/docker/netns/6e4612011155 type nsfs (rw)
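
To decide which filesystem types to pass to `-x`, it can help to group this output by type. A minimal sketch, assuming the Linux `mount` line format shown above (`DEVICE on MOUNT_POINT type FSTYPE (OPTIONS)`) and mount points without spaces; `group_by_fstype` and `MOUNT_LINE` are hypothetical names, and the sample lines are copied from the output above.

```ruby
# Hypothetical helper: group mount points by filesystem type.
MOUNT_LINE = /\A(?<device>\S+) on (?<mount_point>\S+) type (?<fstype>\S+) \((?<options>[^)]*)\)\z/

def group_by_fstype(mount_output)
  groups = Hash.new { |hash, key| hash[key] = [] }
  mount_output.each_line do |line|
    match = MOUNT_LINE.match(line.strip)
    next unless match # skip lines that do not fit the assumed format
    groups[match[:fstype]] << match[:mount_point]
  end
  groups
end

sample = <<~MOUNTS
  /dev/sdc2 on /var/lib/docker/aufs type ext4 (rw,noatime,errors=remount-ro,data=ordered)
  none on /var/lib/docker/aufs/mnt/935f42ea862f12177c30949cf71ed7dca72b4e379b8abc1a26c6cbf901f42301 type aufs (rw,relatime,si=968d4b9c7ae823ac,dio,dirperm1)
  shm on /var/lib/docker/containers/12b6ec1b70879c0bcceedb032db77e0954cc4b8c3b8cdc51d0b1d0208a623e24/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
  nsfs on /run/docker/netns/7a985efacbe2 type nsfs (rw)
MOUNTS

group_by_fstype(sample).each do |fstype, mount_points|
  puts "#{fstype}: #{mount_points.length} mount(s)"
end
```

On a live host the input could come from `` group_by_fstype(`mount`) `` instead of the sample. Note that the type column here reads `aufs`, `tmpfs` and `nsfs`, whereas `none` and `shm` are in the device column; if `-x` compares against the type (which is my reading of the option, not something confirmed in this thread), then `-x shm,none` would not exclude these mounts.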

I am not sure if there are other use cases that are eluding me, but I think some simple README updates might be in order.

majormoses commented 6 years ago

One thing to point out about the option (https://github.com/sensu-plugins/sensu-plugins-disk-checks/blob/2.5.1/bin/check-disk-usage.rb#L56-L59): it is a regex, so -p thing1,thing2 will not work, while (thing1|thing2) or (thing1)?(thing2)? would work just fine, as these are native regex constructs. The pattern can also match a substring, and with just the 2 paths I ignored above, all the mount points Docker uses are covered, at least on my local machine.
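
The comma case is easy to confirm in isolation. A small sketch, assuming only that the `-p` value is handed to `Regexp.new` unchanged; the paths are examples from this thread plus a made-up `/home`.

```ruby
paths = ['/var/lib/docker/aufs', '/run/docker/netns/257da5746791', '/home']

# In a regex a comma is a literal character, so this looks for the exact
# text "/var/lib/docker,/run/docker" inside each path.
comma = Regexp.new('/var/lib/docker,/run/docker')
# Alternation matches either prefix, anywhere in the path (substring match).
alternation = Regexp.new('(/var/lib/docker|/run/docker)')

paths.each do |path|
  comma_hit = !comma.match(path).nil?
  alt_hit = !alternation.match(path).nil?
  puts format('%-32s comma=%-5s alternation=%s', path, comma_hit, alt_hit)
end
```

The comma pattern matches none of the three paths, while the alternation matches the two Docker paths and leaves `/home` alone.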

majormoses commented 6 years ago

Sorry about the accidental close.