Closed spicysomtam closed 9 years ago
I haven't tried this myself, but I expect you could copy /usr/share/okconfig/examples/host.cfg-example to somewhere and edit this section:
# We are still using Ping as a service in addition to check-host-alive, all hosts
# deserve a ping
define service {
    use                     okc-check_ping
    host_name               HOSTNAME
    service_description     Ping
    #__CRITICAL_PACKETLOSS  40%
    #__CRITICAL_ROUND_TRIP  500.0
    #__WARNING_PACKETLOSS   20%
    #__WARNING_ROUND_TRIP   100.0
}
Either change the ping template, or uncomment the macros (__CRITICAL_PACKETLOSS etc.) and raise the thresholds. Then use okconfig on the command line to install the host, specifying the new host template file.
Wow, that works! Thanks for the reply.
Copied /usr/share/okconfig/examples/host.cfg-example to /etc/nagios/okconfig/examples/ and then edited it in situ. Uncommented the custom variables and set them to the values I want, and it works!
define host {
    use                 PARENTHOST
    host_name           HOSTNAME
    address             IPADDR
    alias               ALIAS
    contact_groups      GROUP
    hostgroups          GROUP
};
# This is a template service for HOSTNAME
# Services that belong to this host should use this as a template
define service {
    name                HOSTNAME
    use                 GROUP-default_service
    host_name           HOSTNAME
    contact_groups      GROUP
    service_groups      GROUP
    register            0
}
# We are still using Ping as a service in addition to check-host-alive, all hosts
# deserve a ping
define service {
    use                     okc-check_ping
    host_name               HOSTNAME
    service_description     Ping
    __CRITICAL_PACKETLOSS   100%
    __CRITICAL_ROUND_TRIP   5000.0
    __WARNING_PACKETLOSS    80%
    __WARNING_ROUND_TRIP    3000.0
}
Cool. I'm curious, what kind of hosts need such high RTA thresholds? Satellite connection?
I took them from the old Nagios config that's been running for 6 years or so :) Basically we are in the UK and have two data centers in the US (Midwest). I guess they need reviewing as they are system wide, but the standard ones in your templates were alarming for the US ones. Got to leave, but will look at the plugin closer tomorrow, try to tune it, and get back to you.
The old Nagios config is horrible (XI based). Every service definition has the check interval, retry interval and retries defined. I am trying to get away from that and set those in a 'use' template, e.g. a single standard template used across all Linux services. Simpler and cleaner. Basically ping/host checks are every 5 mins and service checks once an hour. Puppet should restart failed services in between, meaning fewer alerts.
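A minimal sketch of what such a shared 'use' template could look like, assuming Nagios 3+ directive names and the default time unit of one minute. The template name co-linux-service and the retry values are made up for illustration; only the once-an-hour interval comes from the description above.

```
# Hypothetical template: the name and the retry values are illustrative assumptions.
define service {
    name                  co-linux-service   ; referenced via 'use co-linux-service'
    check_interval        60                 ; service checks once an hour
    retry_interval        5                  ; assumed: recheck every 5 minutes after a failure
    max_check_attempts    3                  ; assumed: alert after 3 failed attempts
    register              0                  ; template only, not a real service
}
```

Individual service definitions would then carry only 'use co-linux-service' plus whatever is specific to them, so changing an interval later means editing one template rather than every service.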
Your help is much appreciated, and hopefully this will help someone else. Kind of filling in the documentation gaps.
Regarding the long RTAs: I think this was put in place to prevent network issues creating lots of notifications. Basically if there are any intermittent comms issues, it will take longer to alert. I will stick with the 3s/5s settings for now.
I decided to redo this. I removed those custom variables and created copies of the okc default host and check ping templates, so I can customise the check intervals via a 'use'. Otherwise, if I want to customise these later, I need to do a bulk edit of the *-host.cfg files. New /etc/nagios/okconfig/examples/host.cfg-example:
# Use co-* templates copied from the defaults so the check intervals/values can be customised.
define host {
    use                 co-default-host
    host_name           HOSTNAME
    address             IPADDR
    alias               ALIAS
    contact_groups      GROUP
    hostgroups          GROUP
};
# This is a template service for HOSTNAME
# Services that belong to this host should use this as a template
define service {
    name                HOSTNAME
    use                 GROUP-default_service
    host_name           HOSTNAME
    contact_groups      GROUP
    service_groups      GROUP
    register            0
}
# We are still using Ping as a service in addition to check-host-alive, all hosts
# deserve a ping.
define service {
    use                     co-default-ping
    host_name               HOSTNAME
    service_description     Ping
}
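The co-default-ping template itself is not shown in this thread. One possible shape, assuming it inherits from okc-check_ping rather than being a full copy, and reusing the thresholds posted earlier together with the 5-minute ping schedule mentioned above:

```
# Sketch only: the real co-default-ping is a copy of the okconfig default and may differ.
define service {
    name                      co-default-ping
    use                       okc-check_ping     ; assumption: inherit instead of copy
    check_interval            5                  ; ping checks every 5 minutes
    __CRITICAL_PACKETLOSS     100%
    __CRITICAL_ROUND_TRIP     5000.0
    __WARNING_PACKETLOSS      80%
    __WARNING_ROUND_TRIP      3000.0
    register                  0                  ; template only
}
```

Keeping a template like this outside /usr/share/okconfig means the packaged defaults stay untouched, and every host generated from the example file picks up later changes automatically.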
Cool. If you want to minimize alerts during network issues, you could use network parents.
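For reference, a network parent is declared with the standard 'parents' directive on the host. A minimal sketch, where us-dc-router is a hypothetical name for whatever device sits between the monitoring server and the remote hosts:

```
define host {
    use           co-default-host
    host_name     HOSTNAME
    address       IPADDR
    parents       us-dc-router    ; hypothetical upstream host, must be defined elsewhere
}
```

When the parent is down, Nagios treats its children as UNREACHABLE rather than DOWN, and notifications for that state can be filtered through the host's notification options.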
I wanted to override the defaults for this primarily as we need longer ping timeouts than in your default settings (held in /usr/share/okconfig).
The default settings specify 'use okc-default-host' in the 'define host' and 'use okc-check_ping' for the host's Ping service check. Following these through, okc-default-host uses 'check_command check-host-alive'. check-host-alive is an object in /etc/nagios/objects/commands.cfg, so it can be customised. Service 'okc-check_ping' uses 'check_command okc-check_ping' and specifies custom variables. The command and service okc-check_ping live under /usr/share/okconfig and should not be customised. However, the custom variables could be specified at the highest level, in the *-host.cfg for the host, but how is that customised?
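For context on how those custom variables reach the plugin: Nagios exposes a service-level custom variable as a $_SERVICE<NAME>$ macro, so __CRITICAL_PACKETLOSS becomes $_SERVICE_CRITICAL_PACKETLOSS$. The sketch below shows a command consuming them. example-check_ping is a hypothetical name and the command line is an assumption, not the actual okc-check_ping definition shipped under /usr/share/okconfig:

```
# Hypothetical command, for illustration only.
define command {
    command_name    example-check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $_SERVICE_WARNING_ROUND_TRIP$,$_SERVICE_WARNING_PACKETLOSS$ -c $_SERVICE_CRITICAL_ROUND_TRIP$,$_SERVICE_CRITICAL_PACKETLOSS$ -p 5
}
```

Because the macros are resolved per service, values set in a host's own service definition take precedence over those inherited from the template, without any change to the packaged files.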
Thus in summary, I could customise the default host check via tweaking check-host-alive in /etc/nagios/objects/commands.cfg. I am ok with that.
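A sketch of what that tweak might look like. The command line is modelled on the stock Nagios sample definition of check-host-alive, so the local commands.cfg may differ; the thresholds are the 3s/5s values discussed in this thread:

```
# -w and -c take <round trip in ms>,<packet loss>% ; -p is the number of packets to send
define command {
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
```

Since every host that uses check-host-alive shares this command, the change applies system wide rather than only to the US hosts.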
I could also update the service okc-check_ping by changing the values of the custom variables, but that would mean editing /usr/share/okconfig/templates/misc/services.cfg, and the Adagios web GUI won't allow this. Thus the answer is to set the custom variables in the *-host.cfg file? Thus how is the host service check customised?
Also I observe:
- The web page does not allow the 'use' in 'define host' to be specified, although it can be specified on the okconfig command line (--use).
- The 'use okc-check_ping' in the service check cannot be overridden (similar to above).
- What is --host_template in the okconfig command, and where would a custom one live if you created one?