skywalka / splunk-for-nagios

Analytics for Nagios
GNU General Public License v3.0

Puppet Dashboard #6

Open tfhartmann opened 11 years ago

tfhartmann commented 11 years ago

I thought it might be useful to have Puppet checks on the host perf dashboards, or maybe a set of dashboards dedicated to Puppet.

xkilian commented 11 years ago

To tell the truth, I personally do not believe in storing non-critical performance time-series data in Splunk, purely from a cost perspective. You are better off storing your time series in Graphite, fetching the graph through its HTTP API, and displaying the graph itself in Splunk. ;-)

Through Livestatus you can get the state data for the dashboard, but the perf data... Anyway, that is my 2 cents.

I have no comment on whether the Puppet dashboard itself should be implemented.

For all suggestions, maybe an Ideascale would be pertinent to manage all the suggested improvements that are not accompanied by a Pull request or work in progress. What do you think @skywalka ?

skywalka commented 11 years ago

@tfhartmann which puppet checks do you suggest? I'm happy to do this as puppet is awesome :)

skywalka commented 11 years ago

@xkilian using splunk to display graphs from graphite is an awesome idea! do you have a working example that we could build upon and incorporate into a new dashboard?

skywalka commented 11 years ago

@xkilian ideascale looks cool but I like using the issue tracking here on github and using it as a one stop shop for everything splunk for nagios :)

xkilian commented 11 years ago

No problem, one stop shop is good. :-)

xkilian commented 11 years ago

Using Graphite as a source means using the HTTP API. You can either fetch time-series data as JSON blobs and draw your own graph with the native Splunk graphing engine, or fetch images generated directly by the Graphite web service.

Getting a graph is simply requesting: http://graphite-server/render?target=server.web1.load&height=800&width=600

You have the destination IP/hostname of the graphite server, the name of the data you wish to retrieve and then a list of &options. This is the simplest example available.
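As a minimal sketch of the URL structure described above (server, target name, then a list of &options), here is one way to assemble it in Python. The helper name `render_url` is made up for illustration, and `graphite-server` is the same placeholder host used in the example URL.

```python
from urllib.parse import urlencode


def render_url(base, target, **options):
    """Build a Graphite /render URL for one target plus optional &options."""
    # Keep the target first, then the options in sorted order so the
    # resulting URL is stable and easy to compare.
    params = [("target", target)] + sorted(options.items())
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


# Rebuilds the example URL from the comment above.
print(render_url("http://graphite-server", "server.web1.load", height=800, width=600))
```

Passing `format="json"` as an extra option should request the raw datapoints instead of an image.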

The challenge here is to either load pre-defined dashboards from the Graphite server or build your own template library to call and associate with each data source or list of data sources. Graphite supports wildcards! Ex: server.*.load. Crazy stuff, I tell ya.
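For the JSON route, a wildcard target such as server.*.load comes back as one series per match, each with a `target` name and a list of `[value, timestamp]` datapoints. A rough sketch of picking the latest value per host follows; the payload is invented sample data in that shape, and `latest_values` is a hypothetical helper, not part of any existing app.

```python
import json

# Illustrative payload in the shape Graphite returns for format=json;
# the hosts, values, and timestamps are made up for this example.
SAMPLE = """[
  {"target": "server.web1.load",
   "datapoints": [[0.5, 1380000000], [0.7, 1380000060], [null, 1380000120]]},
  {"target": "server.web2.load",
   "datapoints": [[1.2, 1380000000], [null, 1380000060], [null, 1380000120]]}
]"""


def latest_values(payload):
    """Map each series name to its most recent non-null value (or None)."""
    result = {}
    for series in json.loads(payload):
        # Graphite pads missing samples with null, so skip those.
        values = [v for v, _ts in series["datapoints"] if v is not None]
        result[series["target"]] = values[-1] if values else None
    return result


print(latest_values(SAMPLE))
```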

For each dashboard you want to create, if you know which data to get, simply fire up the composer and copy/paste the URL. Your code then replaces the variable name at runtime in the dashboard.

Even better is having your code build its own URLs dynamically based on Splunk side templates and rules. See: https://github.com/ClockworkNet/graphite-dashgen
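One simple way to do the runtime substitution is a string template: paste the composer URL once, swap the host for a placeholder, and expand it per host. This is only a sketch; the `$host` placeholder, the `LOAD_GRAPH` name, and the `urls_for` helper are assumptions for illustration, not how graphite-dashgen or Splunk does it.

```python
from string import Template

# Hypothetical template: a URL copied from the composer, with the host
# part of the metric name replaced by a $host placeholder.
LOAD_GRAPH = Template(
    "http://graphite-server/render?target=server.$host.load&height=800&width=600"
)


def urls_for(hosts, template=LOAD_GRAPH):
    """Expand the template once per host name."""
    return [template.substitute(host=h) for h in hosts]


for url in urls_for(["web1", "web2"]):
    print(url)
```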

tfhartmann commented 11 years ago

@skywalka My initial thought was pretty simple: a couple of panels, maybe on the performance page. Currently I just do a really simple "Is Puppet Running" check, and I thought it would be cool to see the availability of that. Then I thought that, since the puppet master logs to messages anyway, some simple dashboards that combine that info with any other checks might be interesting.

[rhelbuild nrpe.d]# pwd
/etc/nrpe.d
[rhelbuild nrpe.d]# cat check_puppet.cfg
command[check_puppet]=/usr/lib64/nagios/plugins/check_procs -w 1:0 -c 1:2 -C puppet
[rhelbuild nrpe.d]#

skywalka commented 11 years ago

ok cool, I'll work on a livestatus hook for the puppet agent and also for successful puppet runs
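For anyone following along, a Livestatus lookup for Puppet-related services could look roughly like the sketch below. It is not the app's actual hook. The query is plain LQL sent over the Livestatus Unix socket; the function names are invented here, and the socket path is an assumption that depends on your install.

```python
import json
import socket


def puppet_services_query(pattern="Puppet"):
    """Build an LQL query for services whose description matches pattern."""
    return (
        "GET services\n"
        "Columns: host_name description state\n"
        f"Filter: description ~ {pattern}\n"
        "OutputFormat: json\n"
        "\n"  # a blank line ends the request
    )


def query_livestatus(query, socket_path):
    """Send one query to the Livestatus socket and decode the JSON reply.

    socket_path is site-specific; no default is assumed here.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


print(puppet_services_query())
```

Each row of the reply should be `[host_name, description, state]`, with state 0 to 3 meaning OK, WARNING, CRITICAL, UNKNOWN.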

skywalka commented 11 years ago

Initial dashboard is available in the dela branch for review: https://github.com/skywalka/splunk-for-nagios/blob/dela-3.0.0-rc/local/data/ui/views/NagiosLinuxPuppetPerformanceGraphs.xml

FYI: Here is my puppet config for the nagios checks:

nagios::nrpe::check { "puppet":
    command => "check_procs -c 1:1 -a '/usr/bin/puppet agent' -u root"
}

nagios::nrpe::checkwithsudo { "puppet-run":
    sudopath2 => "/usr/bin/sudo",
    command => "check_file_age -f /var/lib/puppet/state/state.yaml -w 5400 -c 7200"
}

@@nagios_service { "${hostname}_check_puppet":
    ensure => present,
    host_name => "${fqdn}",
    notification_interval => 60,
    flap_detection_enabled => 0,
    service_description => "Puppet Agent",
    check_command => "check_nrpe_1arg!puppet",
    use => "std-service",
    target => "/etc/nagios3/conf.d/dynamic.cfg";
}

@@nagios_service { "${hostname}_check_puppet-run":
    ensure => present,
    host_name => "${fqdn}",
    notification_interval => 60,
    flap_detection_enabled => 0,
    service_description => "Puppet Run",
    check_command => "check_nrpe_1arg!puppet-run",
    use => "std-service",
    target => "/etc/nagios3/conf.d/dynamic.cfg";
}

wcooley commented 11 years ago

Hope you don't mind, but I'm going to plug my Puppet app for Splunk: https://github.com/wcooley/splunk-puppet -- it's nearly ready for pre-release announcement and while there are some rough edges (drilldown, for example, and I haven't tested with a clean installation), much of it is there and functional--my team and I use it every day.

I have two separate lists of agents -- one extracted (and maintained with scheduled expiration) from the master & agent logs themselves and one from a custom search command that queries the master's inventory service directly. In addition to using this for finding hosts that are not successfully applying their catalogs, I also track catalog application time and am able to find anomalies there too.

skywalka commented 11 years ago

Nice work Wil! I'll definitely have a crack at installing it and having a play... thanks for the heads up! I was considering doing a full re-write of the current Splunk for Puppet app so it's good to know that you have been there, done that.