An OpenShift Operator, built using the Operator SDK, that installs an Application Monitoring Stack consisting of Grafana, Prometheus & Alertmanager.
The following resources are supported:
ApplicationMonitoring: triggers the installation of the monitoring stack when created. This is achieved by deploying two other operators that install Prometheus and Grafana respectively.
The Application Monitoring CR accepts the following properties in the spec:

- `middleware-monitoring`: a label that has to be present on all imported resources (PrometheusRules, ServiceMonitors, GrafanaDashboards).

BlackboxTarget: configures probe targets for the blackbox exporter. The Blackbox Target CR accepts the following properties in the spec:
The `blackboxTargets` should be provided as an array in the form of:
```yaml
blackboxTargets:
- service: example
  url: https://example.com
  module: http_extern_2xx
```
where `service` will be added as a label to the metric, `url` is the URL of the route to probe, and `module` can be one of the probe modules configured for the blackbox exporter.
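Putting this together, a complete BlackboxTarget resource might look like the following sketch. The `apiVersion` and metadata name are assumptions for illustration, not taken from this repository's manifests:

```yaml
# Hypothetical BlackboxTarget custom resource; the apiVersion is an
# assumption and may differ between operator versions.
apiVersion: applicationmonitoring.integreatly.org/v1alpha1
kind: BlackboxTarget
metadata:
  name: example-blackboxtarget
spec:
  blackboxTargets:
  - service: example
    url: https://example.com
    module: http_extern_2xx
```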
PrometheusRule: represents a set of alert rules for Prometheus/Alertmanager. See the Prometheus Operator docs for more details about this resource. An example PrometheusRule can be seen in the example app template.
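As a minimal sketch, a PrometheusRule could look like the one below. The alert name, expression and threshold are illustrative, not taken from the example app template; the `monitoring-key: middleware` label mirrors the label used later in this README:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  labels:
    monitoring-key: middleware  # label the stack's selector looks for
spec:
  groups:
  - name: example.rules
    rules:
    - alert: ExampleServiceDown
      expr: up{job="example"} == 0  # fires when the scrape target is down
      for: 5m
      labels:
        severity: critical
```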
ServiceMonitor: represents a Service to pull metrics from. See the Prometheus Operator docs for more details about this resource. An example ServiceMonitor can be seen in the example app template.
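A minimal ServiceMonitor sketch, assuming (for illustration only) a Service labelled `app: example` that exposes metrics on a port named `web`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-servicemonitor
  labels:
    monitoring-key: middleware
spec:
  selector:
    matchLabels:
      app: example   # matches the Service to scrape
  endpoints:
  - port: web        # named port on the Service
    path: /metrics
```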
PodMonitor: represents a set of pods to pull metrics from. See the Prometheus Operator docs for more details about this resource.
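A PodMonitor follows the same pattern but selects pods directly rather than going through a Service; this sketch assumes pods labelled `app: example` with a named container port `web` (both names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-podmonitor
  labels:
    monitoring-key: middleware
spec:
  selector:
    matchLabels:
      app: example            # matches the pods to scrape
  podMetricsEndpoints:
  - port: web                 # named container port to scrape
```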
GrafanaDashboard: represents a Grafana dashboard. You typically create this in the namespace of the service the dashboard is associated with. The Grafana Operator reconciles this resource into a dashboard. An example GrafanaDashboard can be seen in the example app template.
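A GrafanaDashboard sketch; the `apiVersion` is an assumption based on the Grafana Operator's CRD, and the embedded JSON is a trivial placeholder dashboard rather than the one from the example app template:

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: example-dashboard
  labels:
    monitoring-key: middleware
spec:
  json: |
    {
      "title": "Example Dashboard",
      "panels": []
    }
```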
You will need cluster admin permissions to create CRDs, ClusterRoles & ClusterRoleBindings. ClusterRoles are needed to allow the operators to watch multiple namespaces.
```sh
make cluster/install
```
You can access the Grafana, Prometheus & Alertmanager web consoles using the Routes in the project.
Run the following commands to verify the installation:
```sh
# Check the project exists
$ oc project
Using project "application-monitoring" on server "https://your-cluster-ip:8443"

# Check the pods exist e.g.
$ oc get pods
NAME                                              READY   STATUS    RESTARTS   AGE
alertmanager-application-monitoring-0             2/2     Running   0          1h
application-monitoring-operator-77cdbcbff-fbrnr   1/1     Running   0          1h
grafana-deployment-6dc8df6bb4-rxdjs               1/1     Running   0          49m
grafana-operator-7c4869cfdc-6sdv9                 1/1     Running   0          1h
prometheus-application-monitoring-0               4/4     Running   1          36m
prometheus-operator-7547bb757b-46lwh              1/1     Running   0          1h
```
These steps create a new project with a simple server exposing a metrics endpoint. The template also includes simple PrometheusRule, ServiceMonitor & GrafanaDashboard custom resources that the application-monitoring stack detects and reconciles.
```sh
oc new-project example-prometheus-nodejs
oc label namespace example-prometheus-nodejs monitoring-key=middleware
oc process -f https://raw.githubusercontent.com/david-martin/example-prometheus-nodejs/master/template.yaml | oc create -f -
```
You should see the following once everything has been reconciled:

- a new target in Prometheus called `example-prometheus-nodejs/example-prometheus-nodejs`
- an alert in Prometheus called `APIHighMedianResponseTime`
The example application provides three endpoints that will produce more metrics:

- `/` will return Hello World after a random response time
- `/checkout` will create random checkout metrics
- `/bad` can be used to create error rate metrics

You can run the Operator locally against a remote namespace. The name of the namespace should be `application-monitoring`. To run the operator, execute:
```sh
$ make setup/gomod
$ make cluster/install/local
$ make code/run
```
To release a new version of the operator:

- Update the `AMO_VERSION` and `PREV_AMO_VERSION` values in the `Makefile`, then run `make gen/csv` to generate a new manifest.
- Ensure the `clusterPermissions` block in the generated `csv` is up to date with the cluster roles in the `cluster-roles` directory.
- Ensure the `csv` file points to the latest version of the operator image. Note that the images are referenced twice in the `csv`.
- Ensure `deploy/operator.yaml` has the correct image version tag. All image tags should be prefixed with a `v`.
- Ensure `application-monitoring-operator.package.yaml` references the correct version.
- Update the version (e.g. `1.0.1`) in the following files: `Makefile`, `operator.yaml`, `version.go`.
- Create a new release from `master`, tagged with the new version, e.g. `1.0.1`. Ensure to state what's new in the release.
- Run `docker login quay.io`.
Build the Docker image for the new version, and push it to quay.io:

```sh
$ make image/build
$ make image/push
```