shawn-sterling / graphios

A program to send nagios perf data to graphite (carbon) / statsd / librato / influxdb

Cannot use Nagios macros in _graphiteprefix/_graphitepostfix #64

Closed druchoo closed 9 years ago

druchoo commented 9 years ago

I'd like to do this:

define host {
    host_name                   myhost
    check_command               check_host_alive
    _graphiteprefix             $HOSTGROUPALIAS$
}

I tried $$HOSTGROUPALIAS$$ and \$HOSTGROUPALIAS\$ but '\' and '$' are translated to '_'.

shawn-sterling commented 9 years ago

As far as I know, you can't use a macro inside a custom variable.

When you do this, do the files in the spool directory have what the hostgroupalias is actually set to? If so, I'm mistaken and I should be able to fix that.

If not, then you can work around this by changing the graphios nagios command to use the nagios environment variable. So instead of:

command_line            /bin/mv /var/spool/nagios/graphios/host-perfdata /var/spool/nagios/graphios/host-perfdata.$TIMET$

you would put something like

command_line            /bin/mv /var/spool/nagios/graphios/host-perfdata /var/spool/nagios/graphios/host-perfdata.$TIMET$ && sed -i -e 's/\$HOSTGROUPALIAS\$/'"$NAGIOS_HOSTGROUPALIAS"'/g' /var/spool/nagios/graphios/host-perfdata.$TIMET$

which would replace any instances of the string '$HOSTGROUPALIAS$' with the nagios environment variable $NAGIOS_HOSTGROUPALIAS (which should be set properly).

Make sense?
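For anyone who would rather do the substitution in Python than in sed, here is a minimal sketch of the same idea. It assumes Nagios is exporting environment macros so that NAGIOS_HOSTGROUPALIAS is set for the command; the function name and the `environ` parameter are made up for illustration and are not part of graphios.

```python
import os

# Literal text that Nagios left unexpanded in the custom variable.
PLACEHOLDER = "$HOSTGROUPALIAS$"


def expand_placeholder(line, environ=os.environ):
    """Replace the literal placeholder with the value of the Nagios env var.

    `environ` is injectable so the function can be tried without Nagios.
    If the variable is not set, the line is returned unchanged.
    """
    alias = environ.get("NAGIOS_HOSTGROUPALIAS")
    if alias is None:
        return line
    return line.replace(PLACEHOLDER, alias)


if __name__ == "__main__":
    fake_env = {"NAGIOS_HOSTGROUPALIAS": "app-group"}
    print(expand_placeholder("GRAPHITEPREFIX::$HOSTGROUPALIAS$", fake_env))
```

Like the sed version, this only swaps text in the perfdata line; it does nothing about characters in the alias (spaces, dots) that may not be wanted in a metric path.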

druchoo commented 9 years ago

You're probably correct. Hopefully will have some time soon to verify and get back to you.

druchoo commented 9 years ago

Maybe I'm misunderstanding but here's what's in the spool file.

# ls -l; tail -1 host-perfdata.*
total 24K
-rw-r--r-- 1 nagios nagios 316 Feb  5 19:01 host-perfdata.1423162864
-rw-r--r-- 1 nagios nagios 18K Feb  5 19:01 service-perfdata.1423162864
DATATYPE::HOSTPERFDATA  TIMET::1423162852       HOSTNAME::some.hostname.net      HOSTPERFDATA::rta=0.712000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0    HOSTCHECKCOMMAND::check-host-alive HOSTSTATE::UP   HOSTSTATETYPE::HARD     GRAPHITEPREFIX::$_HOSTGRAPHITEPREFIX$   GRAPHITEPOSTFIX::$_HOSTGRAPHITEPOSTFIX$

host.cfg

define host{
        host_name       some.hostname.net
        alias           some.hostname
        use             generic-host
        _graphiteprefix $HOSTGROUPNAME$
        }

hostgroup.cfg

define hostgroup {
        alias           Some important app
        hostgroup_name  app-group
        members         some.hostname.net
        }

Anyway it may be a moot point since I just did the following workaround as I think it suits my use case better. It also gives more flexibility since our naming scheme is not 100% uniform. In general we want to extract the beginning part of the hostname and use that as the first level in the tree.

# diff /usr/lib/python2.6/site-packages/graphios_backends.py ~/graphios_backends.py
312,318c312
< #             pre = ""
<             # Strip suffixes from hostnames
<             pre = re.sub(r"^(.+)-foo.*_mycompany_net$", r"\1", m.HOSTNAME)
<             pre = re.sub(r"^(.+)_site_mycompany_net$", r"\1", pre)
<             pre = re.sub(r"^(.+)-\d+.+_mycompany_net$", r"\1", pre)
<             pre = re.sub(r"^(.+?)\d+.+_mycompany_net$", r"\1", pre)
<             pre = pre + "."
---
>             pre = ""

So the directory structure will look like:

/var/lib/carbon/whisper/app1
/var/lib/carbon/whisper/app1/host1
/var/lib/carbon/whisper/app1/host1/Process-Tomcat
/var/lib/carbon/whisper/app1/host1/Process-Tomcat/procs.wsp
/var/lib/carbon/whisper/app1/host1/CPUUsage
/var/lib/carbon/whisper/app1/host1/CPUUsage/cpu_iowait.wsp
/var/lib/carbon/whisper/app1/host1/CPUUsage/cpu_total.wsp
/var/lib/carbon/whisper/app1/host1/CPUUsage/cpu_user.wsp
/var/lib/carbon/whisper/app1/host1/CPUUsage/cpu_system.wsp

Can this be achieved another way?

shawn-sterling commented 9 years ago

What you pasted from the spool file shows that nagios is ignoring the macro (so you can't use a macro inside a custom variable). The workaround I posted above would still work, though.
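A quick way to check this on your own spool files is to parse a line and look for values that are still a bare `$...$` macro. This is only a sketch: it assumes the fields are tab-separated `KEY::VALUE` pairs, as in the spool line pasted above, and the helper names are invented here rather than taken from graphios.

```python
import re

# A value that is nothing but "$SOMETHING$" was never expanded by Nagios.
UNEXPANDED = re.compile(r"^\$\w+\$$")


def parse_spool_line(line):
    """Split one perfdata line into a dict, assuming tab-separated KEY::VALUE."""
    fields = {}
    for chunk in line.rstrip("\n").split("\t"):
        key, sep, value = chunk.partition("::")
        if sep:  # skip chunks that contain no "::" at all
            fields[key.strip()] = value.strip()
    return fields


def unexpanded_fields(fields):
    """Return the names of fields whose value is still a literal macro."""
    return sorted(k for k, v in fields.items() if UNEXPANDED.match(v))


if __name__ == "__main__":
    sample = (
        "DATATYPE::HOSTPERFDATA\tTIMET::1423162852\t"
        "HOSTNAME::some.hostname.net\t"
        "GRAPHITEPREFIX::$_HOSTGRAPHITEPREFIX$"
    )
    print(unexpanded_fields(parse_spool_line(sample)))
```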

I'm a bit confused about what you are trying to do here. From what I gather, you want to take

somehostname.mycompany.net

and translate that to a carbon metric that looks like this:

somehostname.CPUUsage.cpu_iowait

If so it would be pretty simple to add an option to chomp the hostname and you could do something like

myhostname = m.HOSTNAME.split('.', 1)[0]

Or am I way off?
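Spelled out, the chomp idea might look like the sketch below. `short_host` and `metric_path` are hypothetical helpers, not existing graphios functions, and the path layout (host, then service, then label) is taken from the example metric above.

```python
def short_host(hostname):
    """Keep only the part of the hostname before the first dot."""
    # split() returns a list, so take element 0 to get a string back.
    return hostname.split(".", 1)[0]


def metric_path(hostname, service, label):
    """Join the shortened host, service description and label with dots."""
    return ".".join([short_host(hostname), service, label])


if __name__ == "__main__":
    print(metric_path("somehostname.mycompany.net", "CPUUsage", "cpu_iowait"))
```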

Take it easy.

-Shawn

druchoo commented 9 years ago

You got it but I should have given more examples of the hostnames we have. For all cases it's not as simple as chomping the hostname. The general format in regex:

some-type-of-description(-vm)?-[0-9]+.environmentOrLocation.[a-z].mycompany.net

However this is not set in stone. Here are some examples with parentheses indicating what needs to be extracted.

(app-name)-vm-01.dev.m.mycompany.net
(this-is-a-long-app-name)01.prod.mycompany.net
(randomserver)-999.asia.mycompany.net

I guess the point is I'd need a way to do more than one type of operation to extract the part of the hostname I want. I tried to do it in one regex but it was getting really messy. Is it possible to implement such a translation/extraction mechanism in the config for prefixes/suffixes?

So I guess this is actually a feature request now and not a bug :-)
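To make the "more than one type of operation" idea concrete, here is a rough sketch of an ordered rule list where the first matching pattern wins. The two patterns are guesses derived only from the three example hostnames above, the names `PREFIX_RULES` and `extract_prefix` are invented, and it works on the raw dotted hostname (graphios may already have replaced dots by the time a backend sees it).

```python
import re

# Ordered rules: the first pattern that matches decides the prefix.
# Both patterns are illustrative and cover only the examples in this thread.
PREFIX_RULES = [
    re.compile(r"^(.+)-vm-\d+\."),  # app-name-vm-01.dev.m.mycompany.net
    re.compile(r"^(.+?)-?\d+\."),   # name01.prod... or name-999.asia...
]


def extract_prefix(hostname):
    """Return the captured prefix from the first rule that matches."""
    for rule in PREFIX_RULES:
        m = rule.match(hostname)
        if m:
            return m.group(1)
    # No rule matched: fall back to everything before the first dot.
    return hostname.split(".", 1)[0]


if __name__ == "__main__":
    for host in (
        "app-name-vm-01.dev.m.mycompany.net",
        "this-is-a-long-app-name01.prod.mycompany.net",
        "randomserver-999.asia.mycompany.net",
    ):
        print(host, "->", extract_prefix(host))
```

Keeping each case as its own short pattern avoids the single messy regex, at the cost of having to care about rule order.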

shawn-sterling commented 9 years ago

What do you think the config part would look like? If we ignore the code on how to get there, what options do you put in the config file to describe what you want to accomplish?

druchoo commented 9 years ago

This is what I do in PHP for another graphing system we use that's currently running in parallel with graphios. Kudos to you if you know this system ;-).

/*
 * This is the config file for the bulkProcessPerformanceData.php script. Syntax of this config file is PHP code.
 * It follows these rules:
 *
 * + Three main arrays:
 *   - 'api' is API settings
 *   - 'perfdata' is the format of the perfdata file
 *   - 'templates' are used to create the objects and indicators
 *
 * + 'templates' consist of 3 arrays: filter, object, indicators and a name key that is used for identification
 *   - 'filter' is used to restrict processing of lines for that template to lines that match the specified criteria
 *   - Possible values for keys of 'filter' are those values that were specified in 'fields' array.
 *   - The values for 'filter' keys are evaluated as Perl case insensitive regexes.
 *
 *   - Capture groups can be used 1 or more times to create a token to use in 'object' and 'indicators' values.
 *   - The token name is %<name of key>[# of capture group] i.e. %host[1].
 *   - Capture groups start at 1.
 *   - If the token is specified in all uppercase the value of the token will be converted to all uppercase otherwise it
 *     is unchanged from what was captured.
 *
 *   - Possible keys for 'object' are the following. All 3 are required.
 *     - 'type'
 *     - 'name'
 *     - 'description'
 *
 *   - Possible keys for 'indicators' are:
 *     - required:
 *       - 'name'
 *       - 'value'
 *     - optional:
 *       - 'type'
 *       - 'dataUnits'
 *       - 'displayUnits'
 *       - 'maxValue'
 *       - 'description'
 */
        array(
            'name'       => 'TCP Connections',
            'filter'     => array(
                'service'  => '^TCP (\d+) Connections$',
                'perfdata' => '^connections=(\d+)$',
            ),
            'object'     => array(
                'type'        => 'Network Connections',
                'name'        => 'Network Connections',
                'description' => 'Network Connections',
            ),
            'indicators' => array(
                array(
                    'name'         => 'TCP %service[1] - Established',
                    'value'        => '%perfdata[1]',
                    'type'         => 'GAUGE',
                    'dataUnits'    => 'Number',
                    'displayUnits' => 'Number',
                )
            ),
        ),

Given that example and your data structure of:

        self.LABEL = ''                 # The name in the perfdata from nagios
        self.VALUE = ''                 # The measured value of that metric
        self.UOM = ''                   # The unit of measure for the metric
        self.DATATYPE = ''              # HOSTPERFDATA|SERVICEPERFDATA
        self.METRICTYPE = 'gauge'       # gauge|counter|timer etc..
        self.TIMET = ''                 # Epoch time the measurement was taken
        self.HOSTNAME = ''              # name of the host measured
        self.SERVICEDESC = ''           # nagios configured service description
        self.PERFDATA = ''              # the space-delimited raw perfdata
        self.SERVICECHECKCOMMAND = ''   # literal check command syntax
        self.HOSTCHECKCOMMAND = ''      # literal check command syntax
        self.HOSTSTATE = ''             # current state afa nagios is concerned
        self.HOSTSTATETYPE = ''         # HARD|SOFT
        self.SERVICESTATE = ''          # current state afa nagios is concerned
        self.SERVICESTATETYPE = ''      # HARD|SOFT
        self.GRAPHITEPREFIX = ''        # graphios prefix
        self.GRAPHITEPOSTFIX = ''       # graphios suffix
        self.VALID = False              # if this metric is valid

A possible, very robust solution could be the following.

global_prefix = 'nagios_perfdata'

prefix_filter1 = {
    'match' : {
        'SERVICEDESC' : '^TCP (\d+) Connections$',
        'PERFDATA'    : '^connections=(\d+)$',
    },
    'prefix'          : 'TCP_%SERVICEDESC[1]_-_Established',
    'value'           : '%PERFDATA[1] * 16',
}
prefix_filter2 = {
    'match' : {
        'HOSTNAME' : '^(app-name)-vm-01.dev.m.mycompany.net$',
    },
    'prefix'        : '%HOSTNAME[1]'
}

For INI I guess this would have to be:

global_prefix = 'nagios_perfdata'
prefix_filter1 = { 'match' : { 'SERVICEDESC' : '^TCP (\d+) Connections$', 'PERFDATA' : '^connections=(\d+)$', }, 'prefix' : 'TCP_%SERVICEDESC[1]_-_Established' }
prefix_filter2 = { 'match' : { 'HOSTNAME' : '^(app-name)-vm-01.dev.m.mycompany.net$', }, 'prefix' : '%HOSTNAME[1]' }

I prefer the approach of having everything in the graphios CFG file vs using VARs in the Nagios CFGs. It's very flexible and would allow parsing of non standard perf data formats.
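As a rough idea of how such a filter could be evaluated, here is a sketch. Nothing in it exists in graphios today: `apply_filter` is a made-up name, the metric is represented as a plain dict keyed by the field names listed above, and I'm assuming a token is written as %<match key>[<group number>]. The arithmetic in 'value' is left out.

```python
import re

# Tokens look like %FIELD[n], where FIELD is a key under 'match'
# and n is a capture group number starting at 1.
TOKEN = re.compile(r"%([A-Z]+)\[(\d+)\]")


def apply_filter(flt, metric):
    """Return the expanded prefix, or None if any 'match' regex fails."""
    matches = {}
    for field, pattern in flt["match"].items():
        m = re.search(pattern, metric.get(field, ""))
        if m is None:
            return None  # every listed field has to match
        matches[field] = m

    def expand(token):
        field, group = token.group(1), int(token.group(2))
        # Raises KeyError if the token names a field with no 'match' entry.
        return matches[field].group(group)

    return TOKEN.sub(expand, flt["prefix"])


if __name__ == "__main__":
    prefix_filter1 = {
        "match": {
            "SERVICEDESC": r"^TCP (\d+) Connections$",
            "PERFDATA": r"^connections=(\d+)$",
        },
        "prefix": "TCP_%SERVICEDESC[1]_-_Established",
    }
    metric = {"SERVICEDESC": "TCP 80 Connections", "PERFDATA": "connections=12"}
    print(apply_filter(prefix_filter1, metric))
```

Returning None for a non-matching metric would let the caller try the next filter in order and fall back to global_prefix when none apply.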

If this is just way too much to implement, however, I completely understand, but thank you for entertaining the idea.

druchoo commented 9 years ago

It turns out carbon-aggregator can actually accomplish what I need with rewrite-rules.conf. The only downside is that you have to run the aggregator and forward to carbon-cache. I guess you can close this issue or mark it as a feature enhancement.

http://graphite.readthedocs.org/en/latest/config-carbon.html#rewrite-rules-conf

The form of each line in this file should be as follows:

regex-pattern = replacement-text

This will capture any received metrics that match ‘regex-pattern’ and rewrite the matched portion of the text with ‘replacement-text’. The ‘regex-pattern’ must be a valid Python regular expression, and the ‘replacement-text’ can be any value. You may also use capture groups:

^collectd\.([a-z0-9]+)\. = \1.system.

Which would result in:

collectd.prod.cpu-0.idle-time => prod.system.cpu-0.idle-time
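To try a rule before putting it in rewrite-rules.conf, the behaviour described above can be approximated with Python's re module. This is a stand-alone imitation, not carbon's own code; `parse_rule` and `rewrite` are names I made up, and splitting on the first " = " is my assumption about the line format.

```python
import re


def parse_rule(line):
    """Split a 'regex-pattern = replacement-text' line into its two parts."""
    pattern, replacement = line.split(" = ", 1)
    return re.compile(pattern), replacement


def rewrite(metric, rules):
    """Apply every rule in order, rewriting only the matched portion."""
    for regex, replacement in rules:
        metric = regex.sub(replacement, metric)
    return metric


if __name__ == "__main__":
    rules = [parse_rule(r"^collectd\.([a-z0-9]+)\. = \1.system.")]
    print(rewrite("collectd.prod.cpu-0.idle-time", rules))
```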

More examples: https://github.com/indygreg/collectd-carbon/blob/master/examples/carbon.rewrite-rules.conf