patrickpr / trapdirector

Icingaweb2 module for receiving and handling snmp traps
GNU General Public License v3.0

Add trap handler select by hostgroup fail #25

Closed fromanm1 closed 4 years ago

fromanm1 commented 4 years ago

Describe the bug

When I try to create a new handler selected by hostgroup, it fails with "no host filter".

patrickpr commented 4 years ago

This is also related to the code rewrite. I'll take care of it.

patrickpr commented 4 years ago

Corrected. Thanks for the report

fromanm1 commented 4 years ago

Now it detects host groups, but it only shows the ping4 service or no service at all.

patrickpr commented 4 years ago

The services displayed are only the services that are common to all members of the group (same service name). If you select a hostgroup, the rule must be able to send a status to a service on any member of this group: this is why the hosts must have common services. Is this what you are seeing, or are common services missing from the list?
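A minimal Python sketch of the rule described above, for illustration only: it is not the module's own code, and the host and service names in the sample group are made up. It keeps only the service names that appear on every host of the group.

```python
def common_service_names(services_by_host):
    """Return the service names present on every host of a group.

    services_by_host maps a host name to the list of its service names.
    This mirrors the rule described in the thread; it is not the
    trapdirector implementation.
    """
    name_sets = [set(names) for names in services_by_host.values()]
    if not name_sets:
        return set()  # an empty group has no common services
    return set.intersection(*name_sets)


# Hypothetical group: the host names and the "-rtu" service are invented.
group = {
    "meter-snmp-1": ["ping4", "energy-meter", "voltage-meter"],
    "meter-snmp-2": ["ping4", "energy-meter", "voltage-meter"],
    "meter-rtu-1": ["ping4", "energy-meter-rtu"],
}

selectable = common_service_names(group)
print(sorted(selectable))
```

With this sample data, a single member whose services are named differently is enough to reduce the selectable list to ping4.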

fromanm1 commented 4 years ago

Confirmed, it doesn't work: it only shows ping4 or no service at all.

I made a dummy host by copying a random host and changing only the name, so I have 2 hosts that are exactly the same apart from the name, and the module doesn't detect the services.

patrickpr commented 4 years ago

This is weird, I can't reproduce the issue. Are you using MySQL or PostgreSQL?

fromanm1 commented 4 years ago

I'm using MySQL. Let me make a video of it.

fromanm1 commented 4 years ago

here https://www.youtube.com/watch?v=GXKkzu9LLgU&feature=youtu.be

patrickpr commented 4 years ago

Ok, to check:

1) Get a mysql prompt in the IDO db: mysql -u user -p icinga DB
If you are not sure about the database name, go to the trapdirector configuration, get the IDO database name, and find it in /etc/icingaweb2/resources.ini.

2) Get the host group id (using "energy-host" as in your video):

SELECT a.name1 AS name, a.object_id AS id, b.alias AS display_name FROM icinga_objects AS a INNER JOIN icinga_hostgroups AS b ON b.hostgroup_object_id=a.object_id WHERE ((a.name1 LIKE '%energy-host%' OR b.alias LIKE '%energy-host%') and a.is_active = 1);

You should have one result: get the id.

3) With the id:

SELECT s.host_object_id, a.hostgroup_object_id FROM icinga_hostgroup_members AS s INNER JOIN icinga_hostgroups AS a ON s.hostgroup_id=a.hostgroup_id WHERE (a.hostgroup_object_id=<PUT ID HERE>);

You should have the hosts in the group, each with its host id (host_object_id).

4) For each host, list the active services with:

SELECT s.display_name AS name, s.service_object_id AS id, a.is_active, a.name2 FROM icinga_services AS s INNER JOIN icinga_objects AS a ON s.service_object_id=a.object_id WHERE (s.host_object_id=<PUT ID HERE> AND a.is_active = 1);

The script compares the field 'name2' on each host and takes the common ones.

Please send me the output. (And yes, I will write a SQL debug mode some day!)
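Steps 3 and 4 can also be chained in a small script. The following is only a sketch: it runs against an in-memory SQLite database with a hypothetical, stripped-down copy of the IDO tables (only the columns used by the queries above), and every id and name in the toy data is invented. Against a real IDO database you would use a MySQL or PostgreSQL driver instead, with that driver's placeholder syntax.

```python
import sqlite3

# Minimal stand-in for the IDO tables: only the columns referenced by the
# queries in steps 3 and 4. The real IDO schema has many more columns.
SCHEMA = """
CREATE TABLE icinga_objects (object_id INTEGER, name1 TEXT, name2 TEXT, is_active INTEGER);
CREATE TABLE icinga_hostgroups (hostgroup_id INTEGER, hostgroup_object_id INTEGER, alias TEXT);
CREATE TABLE icinga_hostgroup_members (hostgroup_id INTEGER, host_object_id INTEGER);
CREATE TABLE icinga_services (service_object_id INTEGER, host_object_id INTEGER, display_name TEXT);
"""


def services_by_host(conn, hostgroup_object_id):
    """Map each host id in the group to the set of its active service names."""
    # Step 3: list the hosts that belong to the hostgroup.
    members = conn.execute(
        "SELECT s.host_object_id FROM icinga_hostgroup_members AS s "
        "INNER JOIN icinga_hostgroups AS a ON s.hostgroup_id = a.hostgroup_id "
        "WHERE a.hostgroup_object_id = ?",
        (hostgroup_object_id,),
    ).fetchall()
    result = {}
    for (host_id,) in members:
        # Step 4: list the active services (name2) of this host.
        rows = conn.execute(
            "SELECT a.name2 FROM icinga_services AS s "
            "INNER JOIN icinga_objects AS a ON s.service_object_id = a.object_id "
            "WHERE s.host_object_id = ? AND a.is_active = 1",
            (host_id,),
        ).fetchall()
        result[host_id] = {name for (name,) in rows}
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy data: all ids and names below are invented for the demo.
conn.execute("INSERT INTO icinga_hostgroups VALUES (1, 100, 'Demo group')")
conn.executemany(
    "INSERT INTO icinga_hostgroup_members VALUES (?, ?)", [(1, 10), (1, 20)]
)
conn.executemany(
    "INSERT INTO icinga_services VALUES (?, ?, ?)",
    [(11, 10, "ping4"), (12, 10, "demo-check"),
     (21, 20, "ping4"), (22, 20, "old-check")],
)
conn.executemany(
    "INSERT INTO icinga_objects VALUES (?, ?, ?, ?)",
    [(11, "host-a", "ping4", 1), (12, "host-a", "demo-check", 1),
     (21, "host-b", "ping4", 1), (22, "host-b", "old-check", 0)],  # inactive
)

for host_id, names in sorted(services_by_host(conn, 100).items()):
    print(host_id, sorted(names))
```

Printing the per-host sets side by side makes it easy to spot which member is missing a service name that the others share.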

fromanm1 commented 4 years ago

1) MariaDB [icinga]> SELECT a.name1 AS name, a.object_id AS id, b.alias AS display_name FROM icinga_objects AS a INNER JOIN icinga_hostgroups AS b ON b.hostgroup_object_id=a.object_id WHERE ((a.name1 LIKE '%energy-host%' OR b.alias LIKE '%energy-host%') and a.is_active = 1);
+-------------+-----+----------------------+
| name        | id  | display_name         |
+-------------+-----+----------------------+
| energy-host | 658 | Energy meter servers |
+-------------+-----+----------------------+
1 row in set (0.00 sec)

2) MariaDB [icinga]> SELECT s.host_object_id, a.hostgroup_object_id FROM icinga_hostgroup_members AS s INNER JOIN icinga_hostgroups AS a ON s.hostgroup_id=a.hostgroup_id WHERE (a.hostgroup_object_id=658);
+----------------+---------------------+
| host_object_id | hostgroup_object_id |
+----------------+---------------------+
|            599 |                 658 |
|            635 |                 658 |
|            660 |                 658 |
+----------------+---------------------+
3 rows in set (0.00 sec)

3) the first host
MariaDB [icinga]> SELECT s.display_name AS name, s.service_object_id AS id, a.is_active, a.name2 FROM icinga_services AS s INNER JOIN icinga_objects AS a ON s.service_object_id=a.object_id WHERE (s.host_object_id=599 AND a.is_active = 1);
+-----------------------------+------+-----------+-----------------------------+
| name                        | id   | is_active | name2                       |
+-----------------------------+------+-----------+-----------------------------+
| ping4                       |  600 |         1 | ping4                       |
| energy-meter                |  601 |         1 | energy-meter                |
| voltage-meter               |  602 |         1 | voltage-meter               |
| current-meter               |  604 |         1 | current-meter               |
| phase-angle-meter           |  607 |         1 | phase-angle-meter           |
| power-factor-meter          |  608 |         1 | power-factor-meter          |
| frequency-meter             |  610 |         1 | frequency-meter             |
| total-active-energy-meter   |  615 |         1 | total-active-energy-meter   |
| total-reactive-energy-meter |  616 |         1 | total-reactive-energy-meter |
| power-meter                 |  652 |         1 | power-meter                 |
+-----------------------------+------+-----------+-----------------------------+
10 rows in set (0.01 sec)

dummy host)
MariaDB [icinga]> SELECT s.host_object_id, a.hostgroup_object_id FROM icinga_hostgroup_members AS s INNER JOIN icinga_hostgroups AS a ON s.hostgroup_id=a.hostgroup_id WHERE (a.hostgroup_object_id=660);
Empty set (0.00 sec)

You were right, but why? The host is a copy of the other one: same services, same everything.

patrickpr commented 4 years ago

The SQL query for the dummy host should be:

SELECT s.display_name AS name, s.service_object_id AS id, a.is_active, a.name2 FROM icinga_services AS s INNER JOIN icinga_objects AS a ON s.service_object_id=a.object_id WHERE (s.host_object_id=660 AND a.is_active = 1);

fromanm1 commented 4 years ago

No, 660 is another host with other services: same host group, same metrics, but not the same service names, because host 660 retrieves the data using RTU over TCP while the other 2 use SNMP.

I don't have another of these machines to add right now. I'll go shopping today and let you know when I have one.

Anyway, can I have your email or WhatsApp to discuss some problems and requirements?

robdevops commented 4 years ago

The YouTube video shows the same host twice, and the final query was incorrectly based on the query from the previous step.

pwyss0 commented 4 years ago

patrickpr, do you already know roughly when the fix for bug #25 ("handler fails when hostgroup is selected") will be released?

patrickpr commented 4 years ago

@pwyss0: The first fix (here) is in the latest releases. As for the problem of services not showing up: I am not able to reproduce it, so I can't fix it. Do you have the same issue?

patrickpr commented 4 years ago

See issue #40 for a bug fix related to this.

patrickpr commented 4 years ago

To all, thanks for your feedback on the following points:

1) Bug on handler creation: fixed with commit f94f6fe and working OK.
2) Service selection on hostgroup when adding/editing a handler: cannot reproduce. Does anybody still have the issue?
3) Problem related to issue #40: fixed in 1.0.4b.

If everyone is OK with points 1 and 3 and nobody has problems with point 2, I will close the case.

Patrick

pwyss0 commented 4 years ago

@patrickpr: In fact, the trap handler works perfectly fine for individual hosts but not for hostgroups. If a hostgroup is selected, traps are not shown in Traps/Received, and therefore the "Has matched" counter is not incremented either. Here is an example of my configuration. The hostgroup is called "ENM ILO", the service "cmd_check_trap" and the service template "trapdirector_main_template". In the debug log, only the service "ping4" can be found, although the service "cmd_check_trap" is selected in the trap handler. Many thanks for your help. Peter

zones.d/director-global/hostgroups.conf

object HostGroup "ENM ILO" {
    display_name = "ENM ILO"
    assign where host.vars.server_role == "ILO"
}

zones.d/director-global/service_apply.conf

apply Service "cmd_check_trap" {
    import "trapdirector_main_template"
    assign where host.vars.server_role == "ILO"
    import DirectorOverrideTemplate
}

zones.d/director-global/service_templates.conf

template Service "trapdirector_main_template" {
    check_command = "dummy"
    max_check_attempts = "1"
    check_interval = 15m
    retry_interval = 15m
    check_timeout = 20s
    enable_notifications = true
    enable_active_checks = true
    enable_passive_checks = true
    enable_event_handler = true
    enable_perfdata = true
    command_endpoint = null
}

MariaDB [icinga2]> SELECT a.name1 AS name, a.object_id AS id, b.alias AS display_name FROM icinga_objects AS a INNER JOIN icinga_hostgroups AS b ON b.hostgroup_object_id=a.object_id WHERE ((a.name1 LIKE 'ENM ILO' OR b.alias LIKE 'ENM ILO') and a.is_active = 1);
+---------+------+--------------+
| name    | id   | display_name |
+---------+------+--------------+
| ENM ILO | 2053 | ENM ILO      |
+---------+------+--------------+
1 row in set (0.00 sec)

MariaDB [icinga2]> SELECT s.host_object_id, a.hostgroup_object_id FROM icinga_hostgroup_members AS s INNER JOIN icinga_hostgroups AS a ON s.hostgroup_id=a.hostgroup_id WHERE (a.hostgroup_object_id=2053);
+----------------+---------------------+
| host_object_id | hostgroup_object_id |
+----------------+---------------------+
|           1547 |                2053 |
|           1548 |                2053 |
|           1551 |                2053 |
|           1552 |                2053 |
|           1553 |                2053 |
|           1554 |                2053 |
.
.
+----------------+---------------------+
34 rows in set (0.00 sec)

MariaDB [icinga2]> SELECT s.display_name AS name, s.service_object_id AS id, a.is_active, a.name2 FROM icinga_services AS s INNER JOIN icinga_objects AS a ON s.service_object_id=a.object_id WHERE (s.host_object_id=1547 AND a.is_active = 1);
+----------------+------+-----------+----------------+
| name           | id   | is_active | name2          |
+----------------+------+-----------+----------------+
| ping4          | 1550 |         1 | ping4          |
| cmd_check_trap | 2021 |         1 | cmd_check_trap |
+----------------+------+-----------+----------------+
2 rows in set (0.00 sec)

MariaDB [icinga2]>

Debug Log:
[2020-07-06 15:51:15 +0200] debug/CheckerComponent: Executing check for 'server.xxx.net!ping4'
[2020-07-06 15:51:15 +0200] debug/Checkable: Update checkable 'server.xxx.net!ping4' with check interval '60' from last check time at 2020-07-06 15:50:15 +0200 (1.59404e+09) to next check time at 2020-07-06 15:52:14 +0200 (1.59404e+09).

patrickpr commented 4 years ago

Do you have the trapdirector log from when a trap that should trigger this hostgroup rule is received? Also, did you try the latest version (1.0.4c)? It could help a lot.

pwyss0 commented 4 years ago

@patrickpr Thanks for the tip. I will have the latest version 1.0.4c installed (my version is still 1.0.3). Furthermore, I found the same error messages in /var/log/messages that were reported by lazyfrosch on July 6th. This error message appears in the log only if "hostgroup" is selected.

php: [TrapDirector] [Error]: Connection failed to IDO : invalid data source name
php: [TrapDirector] [Warning]: Exception : [TrapDirector] [Error]: Connection failed to IDO : invalid data source name

If I understand it correctly, this issue should be resolved with the last version (1.0.4c), right?

patrickpr commented 4 years ago

Yes: there was a problem with the IDO database setup when the Icinga API was configured in the module. It has been corrected in 1.0.4c. Tell me if it's OK for you with this version.

pwyss0 commented 4 years ago

@patrickpr, I'm very happy to inform you that the hostgroup failure is resolved with the latest version 1.0.4c. Thank you very much for your great support. Peter

patrickpr commented 4 years ago

Closing this issue. If you have related problems, please open a NEW issue, as this one has too many subjects mixed together.