google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Why not use statfs for ZFS? #1884

Open Random-Liu opened 6 years ago

Random-Liu commented 6 years ago

In cadvisor, we use a special function, getZfsStats, for ZFS (see here), but statfs for all other filesystems.

ZFS does support statfs, so why don't we use it? Is there a particular reason?

See https://github.com/zfsonlinux/zfs/blob/1b66810bad0a893031c6d49613aa83dc359bf034/module/zfs/zfs_vfsops.c#L1302-L1303
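For reference, here is a minimal sketch of what a statfs-based usage lookup could look like in Go. It is not cadvisor's actual code: the `FsUsage` type and both function names are hypothetical, and it only assumes the standard library's `syscall.Statfs` on a Unix-like system.

```go
package main

import (
	"fmt"
	"syscall"
)

// FsUsage is a hypothetical type for this sketch, not a cadvisor type.
type FsUsage struct {
	Capacity  uint64 // total size of the filesystem in bytes
	Free      uint64 // free bytes, including blocks reserved for root
	Available uint64 // free bytes usable by unprivileged processes
}

// usageFromStatfs converts the block counts reported by statfs into bytes.
func usageFromStatfs(s *syscall.Statfs_t) FsUsage {
	bsize := uint64(s.Bsize)
	return FsUsage{
		Capacity:  uint64(s.Blocks) * bsize,
		Free:      uint64(s.Bfree) * bsize,
		Available: uint64(s.Bavail) * bsize,
	}
}

// statfsUsage calls statfs(2) on path and returns the derived byte counts.
func statfsUsage(path string) (FsUsage, error) {
	var s syscall.Statfs_t
	if err := syscall.Statfs(path, &s); err != nil {
		return FsUsage{}, fmt.Errorf("statfs %q: %w", path, err)
	}
	return usageFromStatfs(&s), nil
}
```

Calling `statfsUsage` on a dataset's mountpoint would be enough to compare its numbers with what `zfs list` reports for the same dataset.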

dashpole commented 6 years ago

https://github.com/google/cadvisor/pull/1555#issuecomment-286864135 is all the context I have

dashpole commented 6 years ago

cc @bakins in case he has any insight

bakins commented 6 years ago

Statfs used to not work correctly with ZFS. I can test again to see if it does now.

nightah commented 6 years ago

Not sure if this is related, so I'm happy to file it as a separate issue if advised, but I'm seeing an error whenever a Docker container that cadvisor is monitoring is removed:

E0213 22:55:53.446365       1 fs.go:418] Stat fs failed. Error: exit status 1: "/usr/sbin/zfs zfs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset nerv/ROOT/arch/13d5ce6a00b5331dfcf92c9b3cd50c7b5eefb570bb6d471b002489eb79d8fb61" => cannot open 'nerv/ROOT/arch/13d5ce6a00b5331dfcf92c9b3cd50c7b5eefb570bb6d471b002489eb79d8fb61': dataset does not exist

This behaviour can be reproduced by starting cadvisor with a number of Docker containers running on the ZFS storage driver, then removing any of those containers. When a container is removed, cadvisor logs the error above. I guess this is because the ZFS dataset belonging to the removed container has also been removed.

Restarting cadvisor does stop the error from appearing, but I suspect that's because of the datasets that are discovered on startup.
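As a rough illustration of how a caller could tell a removed dataset apart from a genuine failure, the sketch below parses one line of `zfs list -Hp` output (tab-separated fields, exact byte values) and checks the error text for the "dataset does not exist" message seen in the log above. The type and function names are hypothetical, the column list is shortened to `name,used,available`, and matching on the message string is an assumption rather than a documented interface.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// datasetUsage is a hypothetical type for this sketch, not a cadvisor type.
type datasetUsage struct {
	Name      string
	Used      uint64
	Available uint64
}

// parseZfsListLine parses one line of `zfs list -Hp -o name,used,available`.
// With -H the fields are separated by single tabs; with -p sizes are exact numbers.
func parseZfsListLine(line string) (datasetUsage, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) != 3 {
		return datasetUsage{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	used, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return datasetUsage{}, fmt.Errorf("parsing used: %w", err)
	}
	avail, err := strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return datasetUsage{}, fmt.Errorf("parsing available: %w", err)
	}
	return datasetUsage{Name: fields[0], Used: used, Available: avail}, nil
}

// isDatasetGone reports whether the zfs error text says the dataset was removed.
// Assumption: the message wording matches the log line quoted in this thread.
func isDatasetGone(stderr string) bool {
	return strings.Contains(stderr, "dataset does not exist")
}
```

A caller that sees `isDatasetGone` return true could stop tracking that dataset instead of logging the same error on every collection cycle.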

dashpole commented 6 years ago

@nightah, you are correct. We have a similar issue for devicemapper: https://github.com/google/cadvisor/issues/1772. My current thinking is that watching the mounts and updating them would solve this, but it is a fairly complicated problem.
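The "watch the mounts and update them" idea can be sketched as a periodic reconciliation step: compare the mountpoints currently tracked with the ones present now, then add or drop entries accordingly. This is only an illustration of the core diff, assuming a hypothetical `diffMounts` helper. It leaves out how the current list is obtained and the concurrency concerns that make the real problem complicated.

```go
package main

import "sort"

// diffMounts compares the tracked mountpoints with the ones currently present.
// It returns the mountpoints to start tracking and the ones to drop, both sorted.
// Hypothetical helper for illustration; not part of cadvisor.
func diffMounts(tracked map[string]bool, current []string) (added, removed []string) {
	seen := make(map[string]bool, len(current))
	for _, m := range current {
		if seen[m] {
			continue // ignore duplicate entries in the current list
		}
		seen[m] = true
		if !tracked[m] {
			added = append(added, m)
		}
	}
	for m := range tracked {
		if !seen[m] {
			removed = append(removed, m)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}
```

Stats collection would then skip everything in `removed`, which avoids querying a dataset or device that no longer exists.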

nightah commented 6 years ago

@dashpole thanks for the heads up. I looked around for anything relating to ZFS but didn't think to check devicemapper too. I'll follow both of these issues to see how they progress. Thanks.