joerghoh / cq5-healthcheck

CQ5 Healthcheck code
Apache License 2.0

Important Notes

I abandoned this project in favor of Sling Healthchecks (http://sling.apache.org/documentation/bundles/sling-health-check-tool.html), which offer far more features than my project. The purpose of this project (demonstrating the need for healthchecks in the product) is therefore fulfilled, and I thank Bertrand Delacretaz for taking up this idea.

I will leave this project open on GitHub, but please do not use it anymore.

Kind regards, Jörg

cq5-healthcheck

This small project supports you when you need to monitor your CQ5 system. "Monitoring" means that an automated process checks every few seconds (or minutes) whether CQ5 is still fully functional. By default CQ5 does not have an endpoint that offers this kind of information, so I created this project to provide one. This information can be consumed by an automatic monitoring system (e.g. Nagios), and it is also a big help for anyone who has to run CQ5 instances.
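As a rough illustration of how a monitoring system such as Nagios could consume a three-state result, the sketch below maps a status string to the conventional Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The class and method names are hypothetical and not part of this project; how the status string is obtained from CQ5 is left out on purpose.

```java
import java.util.Locale;

/** Hypothetical helper, not part of cq5-healthcheck. */
final class NagiosExit {
    private NagiosExit() {}

    /**
     * Maps a status text ("OK", "Warning", "Critical") to the exit code
     * a Nagios plugin is expected to return. Anything else, including
     * null, is treated as UNKNOWN (3).
     */
    static int exitCodeFor(String statusText) {
        if (statusText == null) {
            return 3;
        }
        switch (statusText.trim().toUpperCase(Locale.ROOT)) {
            case "OK":
                return 0;
            case "WARNING":
                return 1;
            case "CRITICAL":
                return 2;
            default:
                return 3;
        }
    }
}
```

A wrapper script or plugin would call `System.exit(NagiosExit.exitCodeFor(status))` after reading the status from the monitored instance.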

The project consists of these elements:

You can implement arbitrary HealthStatusProviders to report the status of any module or subsystem, for example:

Currently only a small number of HealthStatusProviders are available. The JMX extensions bundle also contains a number of custom MBeans, which can be used by the MBeanStatusProvider.

This project is released under the Apache License 2.0.

Quick start

Example config

Most of this project's functionality lies in its ability to easily query MBeans and to define thresholds for the three states "OK", "Warning" and "Critical".

These configurations are stored in the repository and can be created at any time. Whenever a configuration changes, the change is picked up and applied immediately.

As a first step, let's configure a threshold for the "publish1" replication agent. Whenever the length of its queue reaches 100 or more, the monitoring should report a warning. Use CRXDE (Lite) for this:

Note:

When you reload your statuspage, you should see an additional entry with the MBean name "com.adobe.granite.replication:type=agent,id="publish1"" and the status "OK", provided the queue size of this replication agent is smaller than 100.
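The threshold idea from this example can be sketched in plain Java. This is only an illustration of the three-state logic, not the project's actual implementation: the `CheckState` enum, the `ThresholdCheck` class and the critical threshold of 500 are assumptions made for the sketch; only the warning threshold of 100 comes from the example above.

```java
/** Hypothetical three-state result mirroring "OK", "Warning" and "Critical". */
enum CheckState { OK, WARNING, CRITICAL }

/** Hypothetical helper, not part of cq5-healthcheck. */
final class ThresholdCheck {
    private ThresholdCheck() {}

    /**
     * Classifies a value under the assumption that higher values are worse.
     * A value at or above a threshold triggers the corresponding state.
     */
    static CheckState evaluate(long value, long warnAt, long criticalAt) {
        if (warnAt > criticalAt) {
            throw new IllegalArgumentException("warnAt must not exceed criticalAt");
        }
        if (value >= criticalAt) {
            return CheckState.CRITICAL;
        }
        if (value >= warnAt) {
            return CheckState.WARNING;
        }
        return CheckState.OK;
    }
}
```

With a warning threshold of 100, `ThresholdCheck.evaluate(99, 100, 500)` yields `OK` and `ThresholdCheck.evaluate(100, 100, 500)` yields `WARNING`, matching the behaviour described for the replication queue.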

How to configure (monitor MBeans)

As you have seen, we created a new node containing the check definition. The node name consists of the ServiceFactory name ("de.joerghoh.cq5.healthcheck.impl.providers.MBeanStatusProvider") and a part that you can choose freely. Choose a descriptive name for it.

The name of the MBean is given in the property "mbean.name", while the monitoring condition is encoded in the property "mbean.property".
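To make the pairing of an MBean name with one of its attributes more concrete, here is a minimal sketch that reads a single attribute through the standard JMX API. It is not the project's code: `MBeanReader` is a hypothetical name, and the example uses the JVM's built-in `java.lang:type=Runtime` MBean so that it runs anywhere, rather than a CQ5 replication agent MBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper, not part of cq5-healthcheck. */
final class MBeanReader {
    private MBeanReader() {}

    /** Reads one attribute of the named MBean from the platform MBean server. */
    static Object readAttribute(String mbeanName, String attribute) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.getAttribute(new ObjectName(mbeanName), attribute);
    }

    public static void main(String[] args) throws Exception {
        // "Uptime" is a standard attribute of the JVM's Runtime MBean.
        Object uptime = readAttribute("java.lang:type=Runtime", "Uptime");
        System.out.println("JVM uptime in ms: " + uptime);
    }
}
```

Inside a running CQ5 instance the same call could presumably be pointed at an MBean such as the replication agent shown earlier; which attributes that MBean exposes is best checked in a JMX console.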

The definition attribute consists of 4 elements:

Currently the following comparator functions are supported:

The statuspage

By default this project ships with a statuspage in /content/statuspage.html; it displays

The overall status is composed of the individual status results like this:

There is an additional condition:

This reflects a situation where bundles are not loaded and therefore MBeans and their checks are not available. In most cases this occurs during startup and goes away once all bundles are activated and their services are running.