naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

notification_interval in escalation is not used #303

Open lgrn opened 6 years ago

lgrn commented 6 years ago

The service now has two overlapping escalations, each with unique contacts, as well as its own service-specific contact and notification_interval. Now make the service go CRITICAL.

Expected behavior: Since escalations are active, the service's own contacts and notification interval should presumably be overridden by what the escalation(s) dictate. Each contact should get notifications at the interval of the escalation it belongs to (10 or 15 minutes). The service contact and service notification interval should not be used.

What happens instead: Contacts are correctly taken from the escalations (the service contact is not notified), but the notification_interval is taken from the service and applied to both contacts, instead of the notification_interval specified in each escalation.

Log example, where "CONTACT1" and "CONTACT2" come from the two escalations respectively. Service notifications for both contacts are sent and logged at the same 600-second interval, which I attribute to the service object rather than to either escalation.

 [1542029872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542029872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542030472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542030472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031072] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031072] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031672] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542031672] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032272] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032272] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.010ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542032872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.012ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542033472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.020ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
 [1542033472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;CRITICAL - 127.0.0.1: rta 0.020ms, lost 0% :: 1.3.3.7: rta nan, lost 100%
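
The spacing between notifications can be checked directly from the epoch timestamps in the log. The following is a minimal sketch that does so; the helper name `notification_gaps` is hypothetical, and the sample lines are abbreviated copies of the first entries above with the plugin output omitted.

```python
import re
from collections import defaultdict

# Abbreviated copies of the first log lines from the report
# (plugin output after "service-notify;" is omitted here).
LOG = """\
[1542029872] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;
[1542029872] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;
[1542030472] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;
[1542030472] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;
[1542031072] SERVICE NOTIFICATION: CONTACT1;apanarenapa;ping;CRITICAL;service-notify;
[1542031072] SERVICE NOTIFICATION: CONTACT2;apanarenapa;ping;CRITICAL;service-notify;
"""

# Captures the epoch timestamp and the contact name of a notification line.
LINE_RE = re.compile(r"^\[(\d+)\] SERVICE NOTIFICATION: ([^;]+);")


def notification_gaps(log_text):
    """Return {contact: [seconds between consecutive notifications]}."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            times[match.group(2)].append(int(match.group(1)))
    return {
        contact: [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        for contact, stamps in times.items()
    }


gaps = notification_gaps(LOG)
print(gaps)
```

Both contacts show identical gaps of 600 seconds. Converting that figure to configured interval units depends on `interval_length`, which the report does not state; with the usual default of 60 seconds per unit, 600 seconds corresponds to 10 units.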

Escalations:

define serviceescalation{
     host_name                      apanarenapa
     service_description            ping
     contacts                       CONTACT2
     first_notification             1
     last_notification              0
     notification_interval          10
     escalation_options             c,r,u,w
     }

define serviceescalation{
     host_name                      apanarenapa
     service_description            ping
     contacts                       CONTACT1
     first_notification             1
     last_notification              0
     notification_interval          15
     escalation_options             c,r,u,w
     }

Service object:

define service{
     use                            default-service
     host_name                      apanarenapa
     service_description            ping
     check_command                  check_ping!500,90%!800,100%
     check_interval                 1
     notification_interval          5
     contacts                       never_used
     }

Template:

define service{
     is_volatile                    0
     max_check_attempts             3
     check_interval                 5
     retry_interval                 1
     active_checks_enabled          1
     passive_checks_enabled         1
     check_period                   24x7
     parallelize_check              0
     obsess                         0
     check_freshness                0
     event_handler_enabled          1
     flap_detection_enabled         1
     process_perf_data              1
     retain_status_information      1
     retain_nonstatus_information   1
     notification_interval          0
     notification_period            24x7
     notification_options           c,f,r,s,u,w
     notifications_enabled          1
     contacts                       never_used
     register                       0
     name                           default-service
     }
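
Because the service object uses the template through `use`, its effective settings are the template values with the service's own directives layered on top. A rough sketch of that resolution follows, assuming a single level of inheritance and treating `name`, `use` and `register` as inheritance-only directives that are not copied to the resolved object. The function `resolve` and the dictionaries are hypothetical, not Naemon's own parser, and the template is shortened to a few directives.

```python
# Subset of the template and service definitions from the report.
TEMPLATES = {
    "default-service": {
        "name": "default-service",
        "register": "0",
        "max_check_attempts": "3",
        "check_interval": "5",
        "retry_interval": "1",
        "notification_interval": "0",
        "contacts": "never_used",
    }
}

SERVICE = {
    "use": "default-service",
    "host_name": "apanarenapa",
    "service_description": "ping",
    "check_interval": "1",
    "notification_interval": "5",
    "contacts": "never_used",
}

# Directives that steer inheritance itself (assumption: they are not
# carried over into the resolved object).
INHERITANCE_ONLY = {"name", "use", "register"}


def resolve(service, templates):
    """Merge one template into a service; the service's own values win."""
    template = templates[service["use"]]
    merged = {k: v for k, v in template.items() if k not in INHERITANCE_ONLY}
    merged.update({k: v for k, v in service.items() if k not in INHERITANCE_ONLY})
    return merged


resolved = resolve(SERVICE, TEMPLATES)
print(resolved["notification_interval"], resolved["check_interval"])
```

Under these assumptions the service's own notification_interval of 5 replaces the template's 0, so 5 is the value that competes with the escalation intervals.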

OP5 Jira: https://jira.op5.com/browse/MON-11356

vpber commented 3 years ago

This is somewhat by design. The Naemon documentation (https://www.naemon.org/documentation/usersguide/objectdefinitions.html#serviceescalation) says:

Note: If multiple escalation entries for a host overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used.

So the expected behavior is actually: since escalations are active, the service's own contacts and notification interval should presumably be overridden by what the escalation(s) dictate. Because the escalations overlap, all escalation contacts should get notifications at the lowest defined escalation interval (10 min), regardless of which escalation they belong to (10 or 15 minutes). The service contact and service notification interval should not be used.
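
The documented rule can be illustrated with a small sketch. This is not Naemon's C implementation; the `Escalation` class and the functions `is_active` and `effective_interval` are hypothetical, and the sketch ignores escalation_options, escalation periods and any special handling of an interval of 0. It assumes that a last_notification of 0 means the escalation has no upper bound.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    contact: str
    first_notification: int
    last_notification: int  # 0 is treated as "no upper bound"
    notification_interval: int


def is_active(esc, notification_number):
    """True if the notification number falls inside the escalation's range."""
    if notification_number < esc.first_notification:
        return False
    return esc.last_notification == 0 or notification_number <= esc.last_notification


def effective_interval(escalations, notification_number, service_interval):
    """Smallest interval among active escalations, else the service's own."""
    active = [e for e in escalations if is_active(e, notification_number)]
    if not active:
        return service_interval
    return min(e.notification_interval for e in active)


# Values taken from the escalation and service definitions in the report.
ESCALATIONS = [
    Escalation("CONTACT2", first_notification=1, last_notification=0, notification_interval=10),
    Escalation("CONTACT1", first_notification=1, last_notification=0, notification_interval=15),
]

print(effective_interval(ESCALATIONS, notification_number=1, service_interval=5))
```

With the two overlapping escalations from the report, the sketch yields 10 for every notification number from 1 onward; the service's interval of 5 would only apply if no escalation were active.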

The error in your case is that Naemon does not use the lowest defined escalation interval (10 min) but uses the "original" service notification interval (5 min) instead.

Also, I guess the documentation should say "multiple escalation entries for a service" instead of "host" here.