shinken-solutions / shinken

Flexible and scalable monitoring framework
http://www.shinken-monitoring.org
GNU Affero General Public License v3.0

Escalations don't work (git master) #1457

Closed N-Mi closed 9 years ago

N-Mi commented 9 years ago

On Debian Wheezy, running the latest git master.

Here is a minimal config (adapted from #1455) that should first notify the user "me", and then the user "me2".

interval_length=1

retention_update_interval=60
max_service_check_spread=5
max_host_check_spread=5
service_check_timeout=60
timeout_exit_status=2
flap_history=20
max_plugins_output_length=65536
enable_problem_impacts_states_change=1
disable_old_nagios_parameters_whining=0
enable_environment_macros=0
log_initial_states=0
no_event_handlers_during_downtimes=1
pack_distribution_file=/var/lib/shinken/pack_distribution.dat
workdir=/var/lib/shinken/
lock_file=/var/run/shinken/arbiterd.pid
local_log=/var/log/shinken/arbiterd.log
shinken_user=shinken
shinken_group=shinken
modules_dir=/var/lib/shinken/modules
daemon_enabled=1
use_ssl=0
ca_cert=/etc/shinken/certs/ca.pem
server_cert=/etc/shinken/certs/server.cert
server_key=/etc/shinken/certs/server.key
hard_ssl_name_check=0
http_backend=auto

define command {
    command_name check_host
    command_line /usr/local/bin/check_host.sh
}

define command {
    command_name notify
    command_line /usr/local/bin/notify.sh $CONTACTEMAIL$
}

define timeperiod {
    timeperiod_name H24
    sunday          00:00-24:00
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
}

define contact {
    contact_name me
    alias me
    email me@domain.tld
    password me
    address1 0166666666
    is_admin 1
    can_submit_commands 1
    host_notification_period H24
    host_notification_options d,u,r,f
    host_notification_commands notify
    service_notification_period H24
    service_notification_options w,u,c,r,f
    service_notification_commands notify
    min_business_impact 2
}

define contact {
    contact_name me2
    alias me2
    email me2@domain.tld
    password me2
    address1 0166666666
    is_admin 1
    can_submit_commands 1
    host_notification_period H24
    host_notification_options d,u,r,f
    host_notification_commands notify
    service_notification_period H24
    service_notification_options w,u,c,r,f
    service_notification_commands notify
    min_business_impact 2
}

define contactgroup{
    contactgroup_name       to_me2
    alias                   Me2
    members                 me2
}

define host {
    host_name google
    address www.goolge.fr
    contacts me
    active_checks_enabled 1
    notifications_enabled 1
    check_interval 10
    retry_interval 1
    max_check_attempts 3
    notification_interval 20
    check_period H24
    notification_period H24
    flap_detection_enabled 0
    notification_options d,u,r,f
    check_command check_host
    business_impact 3
    escalations esc1
}

define escalation {
        escalation_name                 esc1
        contact_groups                  to_me2
        first_notification              60
        last_notification               0
        notification_interval           10
        escalation_options              d,u,r
        #register                        1
}

# Config for everything related to modules, realms,
# and the other Shinken daemons; for these we stick
# with the default configuration provided after
# installing Shinken.
cfg_dir=modules
cfg_dir=arbiters
cfg_dir=schedulers
cfg_dir=pollers
cfg_dir=reactionners
cfg_dir=brokers
cfg_dir=receivers
cfg_dir=realms

What I observe is that only the contact "me" receives notifications.

Here is schedulerd.log:

[1421257022] WARNING: [Shinken] Received a SIGNAL 15
[1421257022] INFO: [Shinken] [scheduler] Stopping all network connections
[1421257023] INFO: [Shinken] Trying to initialize additional groups for the daemon
[1421257023] INFO: [Shinken] Stale pidfile exists at invalid literal for int() with base 10: '' (/var/run/shinken/schedulerd.pid). Reusing it.
[1421257023] INFO: [Shinken] Opening HTTP socket at http://0.0.0.0:7768
[1421257023] INFO: [Shinken] Initializing a wsgiref backend with 8 threads
[1421257023] INFO: [Shinken] Using the local log file '/var/log/shinken/schedulerd.log'
[1421257023] INFO: [Shinken] Printing stored debug messages prior to our daemonization
[1421257023] INFO: [Shinken] Successfully changed to workdir: /var/run/shinken
[1421257023] INFO: [Shinken] Opening pid file: /var/run/shinken/schedulerd.pid
[1421257023] INFO: [Shinken] Redirecting stdout and stderr as necessary..
[1421257023] INFO: [Shinken] We are now fully daemonized :) pid=27644
[1421257023] INFO: [Shinken] Starting HTTP daemon
[1421257023] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
[1421257023] INFO: [Shinken] Using a 8 http pool size
[1421257023] INFO: [Shinken] Modules directory: /var/lib/shinken/modules
[1421257023] INFO: [Shinken] [scheduler] General interface is at: http://0.0.0.0:7768
[1421257023] INFO: [Shinken] Waiting for initial configuration
[1421257024] INFO: [Shinken] New configuration received
[1421257024] INFO: [Shinken] I correctly loaded the modules: []
[1421257024] INFO: [Shinken] Loading configuration.
[1421257024] INFO: [Shinken] New configuration loaded
[1421257024] INFO: [Shinken] [scheduler-master] First scheduling launched
[1421257024] INFO: [Shinken] [scheduler-master] First scheduling done
[1421257035] INFO: [Shinken] A new broker just connected : broker-master
[1421257035] INFO: [Shinken] [scheduler-master] Created 13 initial Broks for broker broker-master
[1421257035] INFO: [Shinken] Waiting for initial configuration
[1421257035] INFO: [Shinken] New configuration received
[1421257035] INFO: [Shinken] I correctly loaded the modules: []
[1421257035] INFO: [Shinken] Loading configuration.
[1421257035] INFO: [Shinken] New configuration loaded
[1421257035] INFO: [Shinken] [scheduler-master] First scheduling launched
[1421257035] INFO: [Shinken] [scheduler-master] First scheduling done
[1421257036] INFO: [Shinken] A new broker just connected : broker-master
[1421257036] INFO: [Shinken] [scheduler-master] Created 13 initial Broks for broker broker-master
[1421257040] HOST ALERT: google;DOWN;SOFT;1;exit code is 2
[1421257044] HOST ALERT: google;DOWN;SOFT;2;exit code is 2
[1421257048] HOST ALERT: google;DOWN;HARD;3;exit code is 2
[1421257048] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257068] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257088] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257108] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257128] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257148] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257168] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257188] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257208] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257228] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257248] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257268] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257288] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2
[1421257308] HOST NOTIFICATION: me;google;DOWN;notify;exit code is 2

Is this a bug, or am I doing something wrong in my conf?

Seb-Solon commented 9 years ago

Hi,

I notice that the conf says:

        first_notification              60
        last_notification               0

These values refer to the notification number, not to a time. Are you sure this is what you want?

In your case, the escalation triggers at the 60th notification.

Ref: https://shinken.readthedocs.org/en/latest/08_configobjects/hostescalation.html

N-Mi commented 9 years ago

My bad, I misread the documentation and confused "first_notification" with "first_notification_time".

I took another look at the log and can see that the escalation worked as described (= 20 times later than I expected, given what I had understood).
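
For reference, here is a rough sketch of what a time-based variant of the escalation above might look like. It is not taken from the thread: it assumes that "first_notification_time" and "last_notification_time" are counted in interval_length units (so 60 would mean 60 seconds with interval_length=1), and that 0 for "last_notification_time" means no upper bound, by analogy with "last_notification 0". Check both assumptions against the escalation documentation before relying on it.

```
define escalation {
        escalation_name                 esc1
        contact_groups                  to_me2
        # Assumption: time-based bounds, expressed in interval_length units,
        # replacing the count-based first_notification / last_notification.
        first_notification_time         60
        # Assumption: 0 is taken to mean "no end", as with last_notification 0.
        last_notification_time          0
        notification_interval           10
        escalation_options              d,u,r
}
```

With the original count-based settings, notifications to "me" go out every 20 seconds (notification_interval 20, interval_length=1), so the 60th notification falls 59 x 20 = 1180 seconds after the first one. That is roughly 20 times later than the 60 seconds expected, and beyond the end of the log excerpt above.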