jjethwa / icinga2

GNU General Public License v3.0

How To: Monitoring Clusters #51

Closed by gvenka008c 7 years ago

gvenka008c commented 7 years ago

Hi @jjethwa / All,

I am looking for information on monitoring service and host clusters using the check_cluster plugin. Does anyone have an example for Icinga 2 to get started with? Please suggest.

https://docs.icinga.com/latest/en/clusters.html

Thanks, Govind

jjethwa commented 7 years ago

Hi @gvenka008c

This question might get better traction on the Monitoring Portal forums: https://monitoring-portal.org/index.php?board/117-icinga-2/

I think you need to add a new CheckCommand for check_cluster and then a new Service (via apply), but I have not implemented the check myself.

gvenka008c commented 7 years ago

@jjethwa Thanks.

jjethwa commented 7 years ago

@gvenka008c np 😄

gvenka008c commented 7 years ago

@jjethwa I tried this but couldn't get it to work.

commands.conf


object CheckCommand "check_cluster" {
  import "plugin-check-command"

  command = [ "/usr/lib64/nagios/plugins/check_cluster" ]

  arguments = {
    "-l" = "$check_cluster_label$"
    "-w" = "$check_cluster_warning$"
    "-c" = "$check_cluster_critical$"
    "-d" = "$check_cluster_data$"
  }
}

services.conf

object Service "my-cluster-check" {
  host_name = NodeName
  check_command = "check_cluster"
  vars.check_cluster_label = "PING"
  vars.check_cluster_warning = 1
  vars.check_cluster_critical = 2
  vars.check_cluster_data = {{ get_service("host1.net", "PING").state_id + ", " + get_service("host2.net", "PING").state_id }}
}

When I bring down one of the nodes, the check still shows OK status; I never see a Warning. I will keep you posted on my findings :)
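One possible explanation, assuming check_cluster follows the usual monitoring-plugins threshold convention: a bare threshold N raises an alert only when the number of non-OK members is greater than N. Under that reading, `-w 1` with a single failed node out of two still yields OK. The sketch below lowers the thresholds so that one non-OK member warns and two go critical. It also reads the `state` attribute instead of `state_id`, on the assumption that `state_id` is a runtime macro name rather than an object attribute. The host and service names are the ones from the post above; none of this has been verified against a running setup.

```
object Service "my-cluster-check" {
  host_name = NodeName
  check_command = "check_cluster"
  vars.check_cluster_label = "PING"
  // Assumption: a bare threshold N alerts when the non-OK count is greater than N.
  vars.check_cluster_warning = 0   // WARNING once 1 member is non-OK
  vars.check_cluster_critical = 1  // CRITICAL once 2 members are non-OK
  // The {{ }} lambda is evaluated on each check run, not once at config load.
  vars.check_cluster_data = {{ get_service("host1.net", "PING").state + "," + get_service("host2.net", "PING").state }}
}
```

If the status still does not change, comparing the rendered plugin command line in the check result against a manual run may help narrow down whether the thresholds or the data list are at fault.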

jjethwa commented 7 years ago

Hmm, seems like that's the correct config and parameters needed according to the check documentation: https://www.monitoring-plugins.org/doc/man/check_cluster.html

I wonder if there is a cluster health check setting that needs to be tweaked?

gvenka008c commented 7 years ago

@jjethwa Yes, I'm still trying to figure out how to run it from the command line. I'm not sure how to build this argument:

-d, --data=LIST The status codes of the hosts or services in the cluster, separated by commas
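For reference, the `-d` list is just the raw status codes joined by commas, for example `0,2` for one OK and one CRITICAL service. One way to test the plugin wiring separately from the runtime state lookup is a throwaway service with a hard-coded list. This is only a sketch: the service name `cluster-check-static` is hypothetical, and the thresholds assume that a bare threshold N alerts when the non-OK count exceeds N.

```
object Service "cluster-check-static" {
  host_name = NodeName
  check_command = "check_cluster"
  vars.check_cluster_label = "PING"
  vars.check_cluster_warning = 0
  vars.check_cluster_critical = 1
  // Hard-coded codes: first member OK (0), second member CRITICAL (2).
  vars.check_cluster_data = "0,2"
}
// Roughly equivalent manual invocation, using the plugin path from commands.conf
// (-s selects service mode):
// /usr/lib64/nagios/plugins/check_cluster -s -l PING -w 0 -c 1 -d 0,2
```

Changing the hard-coded list (for example to `0,0` or `2,2`) shows how the thresholds react before any live state is involved.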

gvenka008c commented 7 years ago

@jjethwa

I was trying to get the state of the hosts as shown below:

 vars.check_cluster_data = get_host("host1").state + ", " + get_host("host2").state

Both return a value of 1:

Check Cluster Data  1, 1

Any thoughts? Both the nodes are up and running. Is that the right way of getting the host state? Please suggest.
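Two things may be worth checking here, both offered as assumptions rather than a confirmed diagnosis. First, without the `{{ }}` wrapper the expression is evaluated once when the configuration is loaded, before any check results exist, which could explain why both hosts report 1 (DOWN) even though they are up. Second, check_cluster has a separate host mode (`-h`) for host states. The sketch below would replace the earlier CheckCommand definition; `check_cluster_host_mode` and the service name `my-host-cluster-check` are hypothetical names introduced for illustration.

```
object CheckCommand "check_cluster" {
  import "plugin-check-command"

  command = [ "/usr/lib64/nagios/plugins/check_cluster" ]

  arguments = {
    // Pass -h (host mode) only when the hypothetical custom variable is true.
    "-h" = {
      set_if = "$check_cluster_host_mode$"
    }
    "-l" = "$check_cluster_label$"
    "-w" = "$check_cluster_warning$"
    "-c" = "$check_cluster_critical$"
    "-d" = "$check_cluster_data$"
  }
}

object Service "my-host-cluster-check" {
  host_name = NodeName
  check_command = "check_cluster"
  vars.check_cluster_host_mode = true
  vars.check_cluster_label = "Hosts"
  // Assumption: a bare threshold N alerts when the count of down hosts exceeds N.
  vars.check_cluster_warning = 0
  vars.check_cluster_critical = 1
  // Host state: 0 = UP, 1 = DOWN. The lambda defers evaluation to check time.
  vars.check_cluster_data = {{ get_host("host1").state + "," + get_host("host2").state }}
}
```

Wrapping the lookup in a lambda means the states are read each time the check executes, so the data list should follow the hosts' current states rather than whatever was present at startup.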

jjethwa commented 7 years ago

Hi @gvenka008c

Unfortunately I don't know enough about the clustering and cluster monitoring part of Icinga2. This might be helpful as it shows how to alert when a satellite server is down: https://monitoring-portal.org/index.php?thread/34549-solved-satellite-down-no-state-change/

gvenka008c commented 7 years ago

thx @jjethwa

jjethwa commented 7 years ago

np @gvenka008c Wish I could have helped more 😄