shinken-solutions / shinken

Flexible and scalable monitoring framework
http://www.shinken-monitoring.org
GNU Affero General Public License v3.0

error using additive inheritance #1672

Closed: N-Mi closed this issue 9 years ago

N-Mi commented 9 years ago

Hi,

I'm setting up a new 2.4 server to replace my old 1.4.2 one.

I get the following error message when running a config check:

[1436349124] ERROR: [Shinken] [host::server1] The contact group '+' defined on the host 'server1' do not exist
[1436349124] ERROR: [Shinken] [items] In server1 is incorrect ; from /etc/shinken/hosts/prod/server1.cfg:1

Here are the relevant config snippets:

define host{
    use                 squeeze, dns
    host_name           server1
    address             1.2.3.4
    alias               My first server

    _fs                 /

    _DNSHOSTNAME        $HOSTNAME$
    _DNSEXPECTEDRESULT  $HOSTADDRESS$
}

# Debian Squeeze
define host{
    name        squeeze
    register    0
    use     generic-host
    icon_set    server
    use     linux-snmp,ssh
    _SNMPCOMMUNITY  aaaaaaaaaaaaaaaaaaaaaa
    _MAILTAG    [SYS]
}

define host{
   name           dns
   use            generic-host
   register       0

   _DNSHOSTNAME        $HOSTNAME$
   _DNSEXPECTEDRESULT  $HOSTADDRESS$
}

# Generic host definition template - This is NOT a real host, just a template!
# Most hosts should inherit from this one
define host{
    name                generic-host

    # Checking part
    check_command           check_host_alive
    max_check_attempts      2
    check_interval          5

    # Check every time
    active_checks_enabled       1
    check_period            24x7

    # Notification part
    # One notification each day (1440 = 60min* 24h)
    # every time, and for all 'errors'
    # notify the admins contactgroups by default
    contact_groups          +admins
    notification_interval       1440
    notification_period     24x7
    notification_options        d,u,r,f
    notifications_enabled       1

    # Advanced option. Look at the wiki for more informations
    event_handler_enabled       0
    flap_detection_enabled      1
    process_perf_data       1

    # Maintenance period
    #maintenance_period     workhours

    # Dispatching
    #poller_tag          DMZ
    #realm               All

    # For the WebUI
    #icon_set            server ; can be database, disk, network_service, server

    # This said that it's a template
    register            0
}

The same host and template definitions work fine in 1.4.2.

If I remove the "+" from the contact_groups parameter in generic-host, the config check works fine. If I add "contact_groups developers" to server1, it also works (and I guess both admins and developers will be notified, but I have not tried that yet).

So it seems there is a problem when doing additive inheritance to an unset variable.

olivierHa commented 9 years ago

I will try to add a testcase, reproduce and fix it :)

Corbyn commented 9 years ago

You have two "use" attributes in your "squeeze" host definition. Perhaps this contributes somehow to the problem...

define host{
    name        squeeze
    register    0
    use     generic-host
    icon_set    server
    use     linux-snmp,ssh
    _SNMPCOMMUNITY  aaaaaaaaaaaaaaaaaaaaaa
    _MAILTAG    [SYS]
}

I have no problem with the same constellation of +contact_groups in our Shinken 2.4 config.

N-Mi commented 9 years ago

I've just modified the squeeze template like this :

# Debian Squeeze
define host{
    name        squeeze
    register    0
    icon_set    server
    use     linux-snmp,ssh,generic-host
    _SNMPCOMMUNITY   aaaaaaaaaaaaaaaaaaaaaa
    _MAILTAG    [SYS]
}

And I still get the error message.

Corbyn commented 9 years ago

I could reproduce this error. It occurs if an additive field (+attribute) is referenced by more than one template. So I guess one of your templates "linux-snmp" or "ssh" also has a "use" attribute that contains "generic-host"?

According to the Shinken docs, the first value found should be used:

Important

If you use a field twice using several templates, the value of the field will be the first one found! In the example above, fields values in all-servers won’t we be replaced. Be careful with overlaping field!

But this seems to be a case of an overlapping field where the value somehow gets lost because of additive inheritance chaining. Anyway, I think the problem goes away if you make sure that the "generic-host" template is used only once in the chain...
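To see where a template is pulled in more than once, the `use` chains can be walked and each ancestor counted. The following is a minimal Python sketch, not Shinken code. The `TEMPLATES` mapping and both function names are hypothetical. The entry for "linux-snmp" follows the guess above, and the entry for "ssh" is an assumption because that template is not shown in this thread. The sketch only lists candidates to review; it does not establish which repeated paths actually trigger the error.

```python
from collections import Counter

# Hypothetical map of each object to the templates named in its "use" line.
TEMPLATES = {
    "server1": ["squeeze", "dns"],
    "squeeze": ["linux-snmp", "ssh", "generic-host"],
    "dns": ["generic-host"],
    "linux-snmp": ["generic-host"],  # assumption: follows the guess above
    "ssh": [],                       # assumption: not shown in the thread
    "generic-host": [],
}


def count_paths(graph, start):
    """Count how many inheritance paths lead from start to each ancestor.

    Assumes the graph has no cycles; a cycle would recurse forever.
    """
    counts = Counter()

    def visit(name):
        for parent in graph.get(name, []):
            counts[parent] += 1
            visit(parent)

    visit(start)
    return counts


def repeated_templates(graph, start):
    """Return the ancestors reached through more than one path, sorted."""
    return sorted(n for n, c in count_paths(graph, start).items() if c > 1)


print(repeated_templates(TEMPLATES, "server1"))  # ['generic-host']
```

With this mapping, "generic-host" is reached three times from server1: directly from squeeze, through linux-snmp, and through dns.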

N-Mi commented 9 years ago

Indeed, the "linux-snmp" pack has a "use generic-host" directive in its template.

I removed "generic-host" from the squeeze template, and now it works.

My guess is that it expands contact_groups to "+admins,+admins", then removes the second entry because it is a duplicate, and keeps "+admins,+" instead of "+admins", which leads to the checkconfig error.

(This is just a guess at what happens; I haven't looked at the code to validate it.)
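The guess above can be illustrated with a small Python sketch. This is not Shinken's actual code: `naive_merge` and `safe_merge` are hypothetical functions written only to show how removing a duplicate name without its "+" prefix could leave a bare "+" behind, and how normalizing each token before deduplicating avoids that.

```python
def naive_merge(values):
    """Hypothetical buggy merge: drops a repeated name but keeps its prefix."""
    seen = set()
    merged = []
    for token in ",".join(values).split(","):
        name = token.lstrip("+")
        if name in seen:
            # Bug: only the name is removed, so a lone "+" is left behind.
            merged.append(token.replace(name, ""))
        else:
            seen.add(name)
            merged.append(token)
    return ",".join(merged)


def safe_merge(values):
    """Strip the "+" from every token first, then deduplicate in order."""
    names = []
    for value in values:
        for token in value.split(","):
            name = token.strip().lstrip("+")
            if name and name not in names:
                names.append(name)
    return names


# The same template value inherited through two different paths.
inherited = ["+admins", "+admins"]
print(naive_merge(inherited))  # +admins,+
print(safe_merge(inherited))   # ['admins']
```

The bare "+" in the first result would then be read as a contact group name, which matches the wording of the error message.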

olivierHa commented 9 years ago

Cool! I managed to reproduce it. I'll try to fix it :)

olivierHa commented 9 years ago

Hello,

Could you test with the master version?

Regards

Corbyn commented 9 years ago

Hi Olivier, I installed the current master version and ran the same tests to try to reproduce this error... and everything works fine now! Nice fix! :)

N-Mi commented 9 years ago

I can confirm this fixes the issue.

Thanks !