tcpcloud / Zabbix-Template-Linux-Collectd_libvirt

A Zabbix template for monitoring libvirt stats over collectd

discovery not detecting everything. #1

Closed ghost closed 10 years ago

ghost commented 10 years ago

Hello.

First of all, thanks for putting this to github so i'm able to play with it.

Here is my problem. I installed everything as you suggested, but discovery doesn't seem to work correctly. Here is what I get:

    zabbix ~ # zabbix_get -s domainname -k collectd-libvirt.cpu.discovery
    {
     "data":[
     { "{#NAME}":"one-frontend-virt_cpu_total"},
     ]
    }
    zabbix ~ #

But there are 4 more VMs on this host. Do you have an idea what's going wrong?

My system: libvirt-1.2.6, collectd-5.4.1 and zabbix-2.2.5

ghost commented 10 years ago

Hello.

I was able to get around the "only one vm discovered" problem. There was a wrong filter in action. But there is still no data collected by the perl script you provided. Do i need specific filters.conf and thresholds.conf?

cheers t.

ghost commented 10 years ago

... a little more output:

    zabbix ~ # zabbix_get -s domainname -k collectd-libvirt.cpu.discovery
    {
     "data":[
     { "{#NAME}":"one-frontend-virt_cpu_total"},
     { "{#NAME}":"one-node1-virt_cpu_total"},
     { "{#NAME}":"one-node2-virt_cpu_total"},
     { "{#NAME}":"windows7-virt_cpu_total"},
     ]
    }
    zabbix ~ # zabbix_get -s domainname -k collectd-libvirt.cpu["one-frontend-virt_cpu_total"]

This command returns nothing. On the Zabbix agent side (where collectd is running) I get:

    Invalid id "one-frontend-virt_cpu_total".
    ERROR: Command failed!

What is the correct id?

cheers t.

czhujer commented 10 years ago

Hi Thomas, thanks for the praise.

So discovery already works fine? If not, please send me the output of "virsh list" (collectd returns records only for active guests).

If the items are the problem (they don't work), you can run collect-libvirt-handler.pl manually.

ghost commented 10 years ago

Hello Patrik.

Thanks for your answer. Discovery works fine, yes. The collect-libvirt-handler.pl script runs without problems:

    hn2 ~ # /var/lib/zabbix/scripts/collect-libvirt-handler.pl
    Use of uninitialized value $val in string eq at /var/lib/zabbix/scripts/collect-libvirt-handler.pl line 15.
    {
     "data":[
     { "{#NAME}":"one-frontend-libvirt-disk_octets-vda"},
     { "{#NAME}":"one-frontend-libvirt-disk_ops-vda"},
     { "{#NAME}":"one-frontend-libvirt-if_dropped-vnet0"},
     { "{#NAME}":"one-frontend-libvirt-if_errors-vnet0"},
     { "{#NAME}":"one-frontend-libvirt-if_octets-vnet0"},
     { ... and so on.

How can i run collect-libvirt-handler.pl manually?

cheers t.

czhujer commented 10 years ago

Ok, I see.

If you want to debug the script with a specific item, run it the way it is defined in /configs/zabbix-collectd.conf. For example:

    sudo /etc/zabbix/scripts/collectd-libvirt/collect-libvirt-handler.pl /var/run/collectd-unixsock GETVAL one-frontend-if-vnet0 NET-PACKETS-RX

Or check what the script lists in discovery mode (collect-libvirt-handler.pl /var/run/collectd-unixsock LISTVAL LIBVIRT-NET).

I am not sure right now how the script transforms names from "libvirt list" into "zabbix item names" :)

ghost commented 10 years ago

Hello Patrik.

That's exactly my problem. I get nothing back from that command:

    hn2 ~ # /var/lib/zabbix/scripts/collect-libvirt-handler.pl /var/run/collectd/collectd-unixsock GETVAL one-frontend-if-vnet0 NET-PACKETS-RX
    Invalid id "one-frontend-if-vnet0".
    ERROR: Command failed!

    hn2 ~ # /var/lib/zabbix/scripts/collect-libvirt-handler.pl /var/run/collectd/collectd-unixsock GETVAL one-frontend-libvirt-if_packets-vnet0 NET-PACKETS-RX
    Invalid id "one-frontend-libvirt-if_packets-vnet0".
    ERROR: Command failed!
    hn2 ~ #

It always says "Invalid id".

thanks again. t.

czhujer commented 10 years ago

And what does the discovery command say? (collect-libvirt-handler.pl /var/run/collectd-unixsock LISTVAL LIBVIRT-NET)

czhujer commented 10 years ago

I think the problem is this: the script expects libvirt guest names in the (default) format "instance-000001a"... it was originally written for an OpenStack cloud setup :)

ghost commented 10 years ago

I see. So the perl script has to be adjusted. Are you in the mood to do so?

czhujer commented 10 years ago

It's great :+1:

Why me..? You can change the "if/elsif $val =~" part (after line 19) to suit your needs :) Or, if you mean editing the script so that the instance name is passed in... it's not :)

ghost commented 10 years ago

I see. But i have a hard time to find the correct regular expression for this line:

if( $val =~ /^instance-[0-9a-z]{8,8}-virt_cpu_total/ ){

Do you have an idea what the regular expression for "one-frontend-libvirt-if_octets-vnet0" would be? If not i find out for myself. Thanks again Patrik.

cheers t.

czhujer commented 10 years ago

Just use the manual, or search http://stackoverflow.com/ for example...

You have to replace the part "instance-[0-9a-z]{8,8}" with a pattern/RE for your libvirt guest names, e.g. "(one-frontend|one-node[0-9])". Or you can use "print" to debug the variables and what they match :)
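The suggestion above can be tried outside the handler first. What follows is only a minimal sketch in Python (not the Perl script itself), assuming the guest names reported by discovery earlier in this thread. `GUEST_PATTERN` and `is_cpu_item` are hypothetical names, and the trailing `$` anchor is an addition that the script's original expression does not have.

```python
import re

# Guest-name alternatives, assumed from the discovery output in this thread.
GUEST_PATTERN = r"(?:one-frontend|one-node[0-9]|windows7)"

# Same shape as the script's first test, with the guest part swapped in.
# The "$" anchor is an extra strictness not present in the original regex.
CPU_ITEM = re.compile(r"^" + GUEST_PATTERN + r"-virt_cpu_total$")


def is_cpu_item(name: str) -> bool:
    """Return True if name looks like '<guest>-virt_cpu_total'."""
    return CPU_ITEM.match(name) is not None


if __name__ == "__main__":
    for name in ("one-frontend-virt_cpu_total",
                 "one-node2-virt_cpu_total",
                 "instance-0000001a-virt_cpu_total"):
        print(name, is_cpu_item(name))
```

Listing every guest by name is brittle, since each new VM needs a regex change. A looser guest pattern may be easier to live with, at the cost of matching names you did not intend.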

ghost commented 10 years ago

Hello Patrik.

Yeah. Tried that. But without success till now. Can you give me a complete example of the manual commands? Right now it looks like this:

    hn2 scripts # /var/lib/zabbix/scripts/collect-libvirt-handler.pl /var/run/collectd/collectd-unixsock LISTVAL LIBVIRT-DISK
    {
     "data":[
     { "{#NAME}":"serve.lordcritical-disk-vda"},
     { "{#NAME}":"windows7-disk-vda"},
     ]
    }
    hn2 scripts # /var/lib/zabbix/scripts/collect-libvirt-handler.pl /var/run/collectd/collectd-unixsock GETVAL windows7-vda-disk OPS-READ
    DEBUG: command: GETVAL windows7-vda-disk val: windows7-vda-disk
    Invalid id "windows7-vda-disk".
    ERROR: Command failed!
    hn2 scripts #

    hn2 scripts # ls -l /var/lib/collectd/csv/windows7/libvirt/
    total 624
    -rw-r--r-- 1 collectd collectd 11376 Oct 11 23:59 disk_octets-vda-2014-10-11
    -rw-r--r-- 1 collectd collectd 43011 Oct 12 19:20 disk_octets-vda-2014-10-12
    -rw-r--r-- 1 collectd collectd 8833 Oct 13 17:43 disk_octets-vda-2014-10-13
    -rw-r--r-- 1 collectd collectd 1518 Oct 14 09:55 disk_octets-vda-2014-10-14
    -rw-r--r-- 1 collectd collectd 8613 Oct 11 23:59 disk_ops-vda-2014-10-11
    -rw-r--r-- 1 collectd collectd 32553 Oct 12 19:20 disk_ops-vda-2014-10-12
    -rw-r--r-- 1 collectd collectd 6649 Oct 13 17:43 disk_ops-vda-2014-10-13
    -rw-r--r-- 1 collectd collectd 1115 Oct 14 09:55 disk_ops-vda-2014-10-14

Thanks again.

t.

czhujer commented 10 years ago

I think your regular expressions don't work. You have to modify $val so that $command ends up looking like "one-frontend-libvirt-disk_ops-vda".

My script is a modified version of the Perl example script from the collectd package (cussh.pl). That script works much like mine, and with it you can see the "original id".

Docs:
https://github.com/collectd/collectd/blob/master/contrib/cussh.pl
https://collectd.org/wiki/index.php/Plain_text_protocol

ghost commented 10 years ago

On 14.10.14 10:57, Patrik Majer wrote:

> I think your regular expressions don't work. You have to modify $val so that $command ends up looking like "one-frontend-libvirt-disk_ops-vda".

Hello Patrik.

I think i'm too stupid. But thanks so far. I'll keep trying.

    hn2 scripts # grep windows7 collect-libvirt-handler.pl
    if( $val =~ /(windows7-libvirt-virt_vcpu-0)-virt_cpu_total/ ){
    elsif($val =~ /(windows7-libvirt-virt_vcpu-0)-disk-/ and $val_type =~ /^OPS/){
    elsif($val =~ /(windows7-libvirt-virt_vcpu-0)-disk-/ and $val_type =~ /^OCT/){
    elsif($val =~ /(windows7-libvirt-virt_vcpu-0)-if-/ and $val_type =~ /^NET-PACKETS/){
    elsif($val =~ /(windows7-libvirt-virt_vcpu-0)-if-/ and $val_type =~ /^NET-OCTETS/){
    if( $line[0] =~ /(windows7-libvirt-virt_vcpu-0)\/libvirt\/virt_cpu_total/ ){
    elsif($line[0] =~ /(windows7-libvirt-virt_vcpu-0)\/libvirt\/disk_ops/){
    elsif($line[0] =~ /(windows7-libvirt-virt_vcpu-0)\/libvirt\/disk_octets/){
    elsif($line[0] =~ /(windows7-libvirt-virt_vcpu-0)\/libvirt\/if_packets/){
    elsif($line[0] =~ /(windows7-libvirt-virt_vcpu-0)\/libvirt\/if_octets/){

    hn2 scripts # /var/lib/zabbix/scripts/collect-libvirt-handler.pl /var/run/collectd/collectd-unixsock GETVAL windows7-libvirt-virt_vcpu-0
    DEBUG: command: GETVAL windows7-libvirt-virt_vcpu-0 val: windows7-libvirt-virt_vcpu-0
    Invalid id "windows7-libvirt-virt_vcpu-0".
    ERROR: Command failed!
    hn2 scripts #

Reply to this email directly or view it on GitHub: https://github.com/tcpcloud/Zabbix-Template-Linux-Collectd_libvirt/issues/1#issuecomment-59009892

czhujer commented 10 years ago

So, if you want to debug CPU usage, more precisely "virt_cpu_total" (other CPU items are not supported), change the first "if" (line 19). The other elsif branches are for disk and network items...

Try: if( $val =~ /^windows7-virt_cpu_total/ ) (this is the item name in Zabbix), and $val should then become windows7/libvirt/virt_cpu_total (this is what gets passed to the collectd unixsock interface; it is the result of lines 22-23).

Network and disk stats/variables are more complex :)
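The mapping described above, from the Zabbix item name to the id passed to collectd, can be sketched on its own. This is a Python illustration under stated assumptions, not the handler's actual Perl code: `cpu_item_to_collectd_id` is a hypothetical helper, and the only rule it encodes is the one given in this comment (`<guest>-virt_cpu_total` becomes `<guest>/libvirt/virt_cpu_total`).

```python
CPU_SUFFIX = "-virt_cpu_total"


def cpu_item_to_collectd_id(item_name: str) -> str:
    """Turn '<guest>-virt_cpu_total' into '<guest>/libvirt/virt_cpu_total'."""
    if not item_name.endswith(CPU_SUFFIX):
        raise ValueError(f"not a virt_cpu_total item: {item_name!r}")
    # Everything before the fixed suffix is the guest name, hyphens included.
    guest = item_name[: -len(CPU_SUFFIX)]
    if not guest:
        raise ValueError("empty guest name")
    return f"{guest}/libvirt/virt_cpu_total"


if __name__ == "__main__":
    print(cpu_item_to_collectd_id("windows7-virt_cpu_total"))
```

Stripping a fixed suffix rather than splitting on "-" keeps guest names such as one-frontend intact.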

czhujer commented 10 years ago

hi, Were you able to modify the code for your case?

ghost commented 10 years ago

On 20.10.14 10:44, Patrik Majer wrote:

> Hi, were you able to modify the code for your case?

Hi Patrik.

No. Not at all. I always ended up with "invalid id". I'm a little bit sad too. :-) I really like the idea of getting statistics this way.

cheers t.



czhujer commented 10 years ago

So, I wrote a document on how it works and how you can debug this: https://gist.github.com/czhujer/e4668e8e1459b02fcdeb#file-howto-debug-zabbix-template-linux-collectd_libvirt You can try it :)

ghost commented 10 years ago

On 22.10.14 15:53, Patrik Majer wrote:

> So, I wrote a document on how it works and how you can debug this: https://gist.github.com/czhujer/e4668e8e1459b02fcdeb#file-howto-debug-zabbix-template-linux-collectd_libvirt You can try it :)

Great. Thanks for that. I will look into it soonish.

cheers t.

ghost commented 10 years ago

On 22.10.14 15:53, Patrik Majer wrote:

> So, I wrote a document on how it works and how you can debug this: https://gist.github.com/czhujer/e4668e8e1459b02fcdeb#file-howto-debug-zabbix-template-linux-collectd_libvirt You can try it :)

I don't understand why it works now.

    zabbix ~ # zabbix_get -s domainname -k collectd-libvirt.disk-ops-read[serve.lordcritical/libvirt/disk_ops-vda]
    DEBUG: command: GETVAL serve.lordcritical/libvirt/disk_ops-vda val: serve.lordcritical/libvirt/disk_ops-vda
    read: 0 write: 0.8333304
    zabbix ~ #

It seems the regular expression in the perl script does not need adjusting at all.

cheers t.



ghost commented 10 years ago

On 22.10.14 17:14, Thomas Stein wrote:

> It seems the regular expression in the perl script does not need adjusting at all.

Ah. With the minus character it does not work:

    zabbix ~ # zabbix_get -s domainname -k collectd-libvirt.disk-ops-read[serve.lordcritical-libvirt-disk_ops-vda]
    DEBUG: command: GETVAL serve.lordcritical-libvirt-disk_ops-vda val: serve.lordcritical-libvirt-disk_ops-vda
    zabbix ~ #

So we need a regex to replace the - with a /.

cheers t.




ghost commented 10 years ago

Hi.

A quick update. I managed to adjust the second regex, but still no luck with the first one. zabbix_get with the slashes works:

    zabbix ~ # zabbix_get -s domain -k collectd-libvirt.disk-ops-write[windows7/libvirt/disk_ops-vda]
    0.01666617
    zabbix ~ #

The interesting part in the collect perl script looks like this:

        elsif($val =~ /^windows7-disk-/ and $val_type =~ /^OPS/){
            @vals = split(/-/, $val);
            $val = $vals[0] . "-" . $vals[1] . "/libvirt/" . $vals[2] . "_ops-" . $vals[3]
        }

So this is wrong i guess. I'll keep trying.
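For comparison, here is a small Python sketch of the translation this branch seems to be aiming for. It assumes the discovery name has the form `<guest>-disk-<device>` (as in the LISTVAL LIBVIRT-DISK output above) and that the id collectd accepts is `<guest>/libvirt/disk_ops-<device>` (as in the working zabbix_get call). `disk_item_to_collectd_id` is a hypothetical name, and this is not the script's real logic.

```python
def disk_item_to_collectd_id(item_name: str, val_type: str) -> str:
    """Map '<guest>-disk-<device>' plus a value type to a collectd id."""
    # Split from the right so hyphens inside the guest name survive.
    # A name with fewer than three parts raises ValueError on unpacking.
    guest, marker, device = item_name.rsplit("-", 2)
    if marker != "disk" or not guest or not device:
        raise ValueError(f"not a disk item: {item_name!r}")
    # The OPS / OCT prefixes follow the $val_type tests shown in this thread.
    if val_type.startswith("OPS"):
        kind = "disk_ops"
    elif val_type.startswith("OCT"):
        kind = "disk_octets"
    else:
        raise ValueError(f"unknown value type: {val_type!r}")
    return f"{guest}/libvirt/{kind}-{device}"


if __name__ == "__main__":
    print(disk_item_to_collectd_id("windows7-disk-vda", "OPS-WRITE"))
```

With "windows7-disk-vda" a plain split on "-" yields only three fields, so an expression that reads a fourth field and joins the first two with "-" cannot produce the slash-separated id. Note that this sketch would still mis-split a device name that itself contains a hyphen.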

ghost commented 10 years ago

Now it gets weird. I created a VM with the name instance-00000935. And guess what: it fails too with the original collect perl script.

    hn2 scripts # ./collect-libvirt-handler.pl.orig /var/run/collectd/collectd-unixsock GETVAL instance-00000935-libvirt-disk_ops-vda OPS-READ
    DEBUG: command: GETVAL instance-00000935-libvirt-disk_ops-vda val: instance-00000935-libvirt-disk_ops-vda
    Invalid id "instance-00000935-libvirt-disk_ops-vda".
    ERROR: Command failed!

And with slashes it works. Now I'm puzzled.

    hn2 scripts # ./collect-libvirt-handler.pl.orig /var/run/collectd/collectd-unixsock GETVAL instance-00000935/libvirt/disk_ops-vda OPS-WRITE
    DEBUG: command: GETVAL instance-00000935/libvirt/disk_ops-vda val: instance-00000935/libvirt/disk_ops-vda
    0.3500069
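The two results above fit the identifier layout of collectd's plain text protocol, `host/plugin[-plugin_instance]/type[-type_instance]`: the slashes separate the fields, so an all-hyphen string is not a valid id. The Python sketch below only illustrates that layout. `parse_identifier` and `Identifier` are hypothetical names, and collectd's own parser may differ in details.

```python
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    host: str
    plugin: str
    plugin_instance: Optional[str]
    type_name: str
    type_instance: Optional[str]


def parse_identifier(ident: str) -> Identifier:
    """Split 'host/plugin[-instance]/type[-instance]' into its fields."""
    parts = ident.split("/")
    if len(parts) != 3 or not all(parts):
        # Mirrors the 'Invalid id' outcome seen for hyphen-only strings.
        raise ValueError(f"Invalid id {ident!r}")
    host, plugin_part, type_part = parts
    # Only the first hyphen separates a name from its optional instance.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_name, _, type_instance = type_part.partition("-")
    return Identifier(host, plugin, plugin_instance or None,
                      type_name, type_instance or None)


if __name__ == "__main__":
    print(parse_identifier("instance-00000935/libvirt/disk_ops-vda"))
```

Because the host field ends at the first slash, hyphens in a guest name such as instance-00000935 are harmless in the slash form.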

czhujer commented 10 years ago

Hi, so, this is progress :)

First, this is the right workflow, but a strange item name:

    zabbix ~ # zabbix_get -s domain -k collectd-libvirt.disk-ops-write[windows7/libvirt/disk_ops-vda]
    0.01666617
    zabbix ~ #

Zabbix allows only "one" number as the return value (float or integer, as you see in the Zabbix server item)... :)

So the (anticipated) entries must match: the Zabbix item names, and the regular expressions that handle the item name and values (collectd returns the disk stats for read and write in one run, hence the "next" changes). I have to describe this in the "debug" docs, or in the template...

So, first it is important to debug cpu_total... this is the simplest item :+1:
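Since collectd returns read and write together while a Zabbix item holds a single number, the handler has to pick one value per item. A minimal Python sketch of that selection follows. It assumes the values arrive as `name=value` lines (the form described in the plain text protocol docs linked earlier) and that the suffix of the value type (OPS-READ, OPS-WRITE, NET-PACKETS-RX) names the wanted data source. `pick_value`, the TX suffix, and the mapping table are illustrative assumptions, not taken from the script.

```python
# Assumed mapping from value-type suffix to collectd data source name.
SOURCE_BY_SUFFIX = {"READ": "read", "WRITE": "write", "RX": "rx", "TX": "tx"}


def pick_value(value_lines, val_type):
    """Return the single number that matches val_type, e.g. 'OPS-WRITE'."""
    suffix = val_type.rsplit("-", 1)[-1]
    wanted = SOURCE_BY_SUFFIX.get(suffix)
    if wanted is None:
        raise ValueError(f"unknown value type: {val_type!r}")
    values = {}
    for line in value_lines:
        name, sep, raw = line.partition("=")
        if sep:  # skip lines that are not name=value pairs
            values[name.strip()] = float(raw)
    if wanted not in values:
        raise KeyError(wanted)
    return values[wanted]


if __name__ == "__main__":
    reply = ["read=0.000000e+00", "write=8.333304e-01"]
    print(pick_value(reply, "OPS-WRITE"))
```

Returning exactly one float keeps the Zabbix side simple: one item key per direction, each backed by the same GETVAL call.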

czhujer commented 10 years ago

Here are example results of CPU discovery:

https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/blob/master/docs-examples/example-discovery-result.md

You should see:

    .. "{#NAME}":"windows7-virt_cpu_total"},
    "{#NAME}":"serve.lordcritical-virt_cpu_total"}, ..

ghost commented 10 years ago

On Thursday 23 October 2014 00:45:46 Patrik Majer wrote:

> Here are example results of CPU discovery:
>
> https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/blob/master/docs-examples/example-discovery-result.md
>
> You should see: .. "{#NAME}":"windows7-virt_cpu_total"},
>
> "{#NAME}":"serve.lordcritical-virt_cpu_total"}, ..

Yes this works as expected. But not without slashes. See:

    hn2 scripts # ./collect-libvirt-handler.pl.orig /var/run/collectd/collectd-unixsock GETVAL serve.lordcritical-libvirt-virt_cpu_total
    DEBUG: command: GETVAL serve.lordcritical-libvirt-virt_cpu_total val: serve.lordcritical-libvirt-virt_cpu_total
    Invalid id "serve.lordcritical-libvirt-virt_cpu_total".
    ERROR: Command failed!

    hn2 scripts # ./collect-libvirt-handler.pl.orig /var/run/collectd/collectd-unixsock GETVAL serve.lordcritical/libvirt/virt_cpu_total
    DEBUG: command: GETVAL serve.lordcritical/libvirt/virt_cpu_total val: serve.lordcritical/libvirt/virt_cpu_total
    value: 19666490
    hn2 scripts #

Somehow the replacement of "-" with "/" does not work. Even with a VM named instance-00009101.

cheers t.



czhujer commented 10 years ago

I see. But "libvirt" will not exist in the Zabbix item name... or is this name in the discovery results?

So, here is the rewritten script, please try it (so far only virt_cpu_total is tested): https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/blob/devel/scripts/collect-libvirt-handler.pl

diff: https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/commit/93df0f4e4bc4599793dfc9432602e2e6260ca55c

ghost commented 10 years ago

On Thursday 23 October 2014 01:38:49 Patrik Majer wrote:

> I see. But "libvirt" will not exist in the Zabbix item name... or is this name in the discovery results?
>
> So, here is the rewritten script, please try it (so far only virt_cpu_total is tested): https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/blob/devel/scripts/collect-libvirt-handler.pl

Now i get "0" back if i use "-"

    hn2 scripts # ./collect-libvirt-handler.pl.new /var/run/collectd/collectd-unixsock GETVAL serve.lordcritical/libvirt/virt_cpu_total
    18833140

    hn2 scripts # ./collect-libvirt-handler.pl.new /var/run/collectd/collectd-unixsock GETVAL serve.lordcritical-libvirt-virt_cpu_total
    0
    hn2 scripts #

But obviously the ID is recognised now. That's good, although I don't understand the numbers. :-)

The CSV entry looks like this:

    hn2 scripts # cat /var/lib/collectd/csv/serve.lordcritical/libvirt/virt_cpu_total-2014-10-23
    epoch,value
    1414015242.994,150900000000
    1414015302.994,151990000000
    1414015362.992,153130000000
    1414015422.993,154230000000

cheers t.




czhujer commented 10 years ago

ok, great :)

So, this is interesting :))

    [root@test collectd-libvirt]# ./collect-libvirt-handler.pl /var/run/collectd-unixsock GETVAL instance-00000841/libvirt/virt_cpu_total
    1000000
    [root@test collectd-libvirt]# ./collect-libvirt-handler.pl /var/run/collectd-unixsock GETVAL instance-00000841-virt_cpu_total
    1000000

So collectd returns "differential numbers" on the unixsock, while the CSV holds counters... I'm not sure what this number is; that is a question for the collectd authors :) (But a trigger on > 0 is enough to check whether the guest is alive/running.)
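One possible reading of the "differential numbers" is a per-second rate of the counter stored in the CSV. The Python sketch below checks that reading against the four CSV rows posted earlier in this thread. It is an interpretation, not something confirmed here, and it assumes that virt_cpu_total counts CPU time in nanoseconds. `rates` is a hypothetical helper.

```python
# (epoch seconds, counter value) pairs copied from the CSV excerpt in this thread.
SAMPLES = [
    (1414015242.994, 150900000000),
    (1414015302.994, 151990000000),
    (1414015362.992, 153130000000),
    (1414015422.993, 154230000000),
]


def rates(samples):
    """Per-second increase of the counter between consecutive samples."""
    result = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        result.append((v1 - v0) / (t1 - t0))
    return result


if __name__ == "__main__":
    for rate in rates(SAMPLES):
        # Under the nanosecond assumption, rate / 1e9 is the fraction
        # of one CPU core used during the interval.
        print(round(rate), f"{rate / 1e9:.2%} of one core")
```

The computed rates fall in the range of 18 to 19 million per second, the same order of magnitude as the 18833140 and 19666490 returned over the socket for this guest, which is consistent with the rate interpretation.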

So, this is the fix for disk stats: https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/commit/dba644cd4a028eec05648cb5e2d938a053a7588a

czhujer commented 10 years ago

Plus, this is the fix for interface stats: https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/commit/39950a71216a617163f3541aade350fbcf9b136d

ghost commented 10 years ago

On Thursday 23 October 2014 02:55:01 Patrik Majer wrote:

> Plus, this is the fix for interface stats: https://github.com/czhujer/Zabbix-Template-Linux-Collectd_libvirt/commit/39950a71216a617163f3541aade350fbcf9b136d

Thanks. Works fine, leaving aside the issue with the "-" character, which is still broken. Are you planning to fix this in the near future? Thanks again so far.

cheers t.



czhujer commented 10 years ago

No problem. You're welcome :)

So, I'm not sure where the problem is. Can you send me the full results of the discovery items/commands? And add some info about your system/stack? (uname -a, package versions, etc.)

czhujer commented 10 years ago

Ideally, open a new issue...