sflow / host-sflow

host-sflow agent
http://sflow.net

No VM disk statistics #11

Open fumok opened 7 years ago

fumok commented 7 years ago

Hi,

I'm testing a setup to monitor KVM with Ganglia. Everything seems fine except for the VM disk statistics (the graph is empty). Some details:

Can you help me? Thanks a lot.

sflow commented 7 years ago

This is the relevant code: https://github.com/sflow/host-sflow/blob/master/src/Linux/mod_kvm.c#L114-L174

Is it missing the capacity/allocation/available data from virDomainGetBlockInfo(), or the reads/writes/errors counter data from virDomainBlockStats(), or both?

The binary is compiled with "-g -O2" so you should be able to set a breakpoint in gdb and debug like this:

sudo yum install gdb
sudo service hsflowd stop
sudo gdb hsflowd
gdb> set args -ddd
gdb> b mod_kvm.c:169
(say yes, it will be loaded later)
gdb> r

If you'd rather build from sources and add print statements then you'll need gcc and something like this:

sudo yum install libvirt-devel libxml2-devel
make FEATURES="kvm ovs"

fumok commented 7 years ago

Thanks for the support. After further digging, I think the disk statistics are populated only if the VM uses the default storage pool.

I've created a new test VM with a qcow image in the default storage pool, and now, dumping data with sflowtool, I can see these counters populated: vdsk_capacity, vdsk_allocation, vdsk_available, vdsk_rd_req, vdsk_rd_bytes, vdsk_wr_req, vdsk_wr_bytes, vdsk_errs.
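For anyone repeating this check, a minimal Python sketch for pulling just the vdsk_* fields out of a sflowtool dump. It assumes sflowtool's line-oriented text output, where each field is printed as a name followed by a value; the function name and the sample values are made up for illustration.

```python
def vdsk_counters(sflowtool_text):
    """Collect vdsk_* fields from sflowtool's line-oriented text output."""
    counters = {}
    for line in sflowtool_text.splitlines():
        parts = line.split()
        # Assumption: each field is printed as "<name> <value>".
        if len(parts) == 2 and parts[0].startswith("vdsk_"):
            counters[parts[0]] = int(parts[1])
    return counters


# Made-up sample values, for illustration only.
sample = """\
vdsk_capacity 10737418240
vdsk_allocation 1400832000
vdsk_rd_req 5120
vdsk_wr_bytes 819200
vdsk_errs 0
"""

for name, value in sorted(vdsk_counters(sample).items()):
    print(name, value)
```

If the dump contains samples from several VMs, later values overwrite earlier ones here, so this is only good for a quick "are the counters non-zero" check on a single-VM host.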

Incidentally, the breakpoint now works (b mod_kvm.c:169). Before, with the VM in a non-default storage pool, it was never reached.

EDIT: Perhaps I was too hasty in my diagnosis. I think the deciding factor is the type of storage pool. With qcow images the counters are fine; with LVM they are not.

EDIT2: I can confirm, disk counters work only with qcow backing storage.

sflow commented 7 years ago

Sounds like it's not recognizing the lvm storage when it parses the XML. If you could send the output of "virsh dumpxml " that would be helpful.

fumok commented 7 years ago

hsflowd in debug mode returns:

dbg1: attribute dev
dbg1: disk.dev=hda
dbg1: ignoring readonly device
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vda
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdb
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdc
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdd
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vde
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdf
dbg1: attribute dev
dbg1: attribute dev
dbg1: disk.dev=vdg

and this is right: one virtual CD-ROM (hda) and seven virtual LVM disks (vda to vdg).

The XML of the test VM follows:

test.txt

sflow commented 7 years ago

It looks like we might be able to pick up the disk stats a different way...

It depends on how a qemu VM is treated with respect to Linux cgroups. For example, I have a KVM system running Ubuntu 14.04 with a VM called "test6" and it looks like I can get disk stats like this:

cat /sys/fs/cgroup/blkio/machine/test6.libvirt-qemu/blkio.throttle.*

Can you substitute "test6" with "test" on your system and get numbers this way? Do the numbers look as though they are specific to that VM?

We already pick up cgroup stats this way in hsflowd's mod_docker, so this could be quite straightforward to add. Just need to understand how it appears in different versions of KVM. I'll try Fedora 24 next and see what appears there.

fumok commented 7 years ago

I think cgroups are the right way. On RHEL 7.2 (kernel 3.10.0-327.el7.x86_64) the relevant counters can be found in:

/sys/fs/cgroup/blkio/machine.slice/{weird_name}/

{weird_name} is something like this (in my case the hostname contains a hyphen):

machine-qemu\x2d{host_hostname}\x2dgo\x2d{guest_name}\x2da.scope
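The \x2d sequences are hex escapes for the hyphen character (0x2d). A small Python sketch of decoding such a directory name back to readable form; the function name and the example name with "myhost" and "guest1" are hypothetical placeholders, not taken from a real system.

```python
import re


def unescape_unit_name(name):
    """Turn \\xNN escapes in a unit name back into the characters they encode."""
    return re.sub(r"\\x([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  name)


# Hypothetical directory name; "myhost" and "guest1" are placeholders.
escaped = r"machine-qemu\x2dmyhost\x2dguest1.scope"
print(unescape_unit_name(escaped))  # machine-qemu-myhost-guest1.scope
```

Going the other way (finding the directory for a given guest) is probably easier done by listing machine.slice and matching on the decoded names than by trying to rebuild the escaped name.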

Most of the blkio.throttle.* files are not populated; only blkio.throttle.io_service_bytes and blkio.throttle.io_serviced are.
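A rough Python sketch of reading those two files, assuming the cgroup v1 blkio layout where each per-device line is "major:minor Operation value" and the last line is a grand total. The function name and the sample numbers are illustrative only, not output from a real host.

```python
def parse_blkio(text):
    """Collect Read/Write values per device from a blkio.throttle.* file body."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-device lines have three fields: "<major>:<minor> <Operation> <value>".
        # The trailing grand-total line ("Total <value>") has only two, so skip it.
        if len(parts) != 3:
            continue
        dev, op, value = parts
        if op in ("Read", "Write"):
            stats.setdefault(dev, {})[op] = int(value)
    return stats


# Made-up sample in the cgroup v1 format, for illustration only.
sample = """\
253:0 Read 4096000
253:0 Write 819200
253:0 Sync 819200
253:0 Async 4096000
253:0 Total 4915200
Total 4915200
"""

print(parse_blkio(sample))
```

On a real host the text would come from opening blkio.throttle.io_service_bytes (bytes) or blkio.throttle.io_serviced (request counts) inside the VM's cgroup directory; the same parser should work for both since they share the format.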

I have the same situation on CentOS 7.2, with much newer libvirtd and qemu-kvm, so I suppose only the host's kernel version matters. The cgroup approach is also used by other interesting projects. Take a look, for example, at https://github.com/firehol/netdata/

P.S.

Systemd systems use different names, see https://libvirt.org/cgroups.html

Thanks.