opendatacube / datacube-ows

Open Data Cube Open Web Services

Prometheus Metrics - Noise From Prometheus Probes #943

Closed: omad closed this issue 1 year ago

omad commented 1 year ago

Background

I'm trying to improve the dashboards used by Digital Earth Australia for monitoring our Datacube OWS deployment, as used for DEA Maps.

We have Datacube OWS deployed into Kubernetes using the ODC Helm Chart, with metrics being recorded by Prometheus, and a dashboard created within Grafana.

The problem I'm having is dealing with noise in the metrics from the automated HTTP requests made by K8s to monitor the health of all the OWS instances. Kubernetes provides three probe types: startup, readiness, and liveness. We have the startup and readiness probes set up to make a WMS GetMap request for a rarely used layer, and the liveness probe set to hit the /ping endpoint.
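Roughly what our current probe configuration looks like (the layer name, port, and timings below are placeholders, not our real values):

```yaml
# Sketch of the current probe setup (placeholder layer name, port and timings)
startupProbe:
  httpGet:
    path: "/wms?service=WMS&version=1.3.0&request=GetMap&layers=placeholder_layer&styles=&crs=EPSG:4326&bbox=-1,-1,1,1&width=256&height=256&format=image/png"
    port: 8000
  failureThreshold: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: "/wms?service=WMS&version=1.3.0&request=GetMap&layers=placeholder_layer&styles=&crs=EPSG:4326&bbox=-1,-1,1,1&width=256&height=256&format=image/png"
    port: 8000
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /ping
    port: 8000
  periodSeconds: 10
```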

Noise Problem

However, this creates a continuous stream of noise from WMS requests that makes the recorded metrics hard to use, particularly when we're automatically scaling up or down to handle load, but often just all the time. E.g., see the following log captures for the two types of requests.

(screenshots: log captures of the two probe-generated request types)

Questions/(Partial?) Solution

I've heard that making requests to /ping for the probes should be sufficient to check that OWS is operating, including checking connectivity to the DB.

I need this to respond with an HTTP status code greater than or equal to 200 and less than 400 for success; any other code indicates failure.

Is this the behaviour of /ping?

omad commented 1 year ago

Okay, I've checked the implementation, and it looks perfect. I can reconfigure our K8s probes.

https://github.com/opendatacube/datacube-ows/blob/32ab71cbbd20e4dcd8fb07c8ad890b7fd75fee4a/datacube_ows/ogc.py#L193-L196
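For the record, roughly what the reconfigured probes will look like, with all three probe types pointed at /ping (port and timings are again placeholders). Kubernetes treats any HTTP status greater than or equal to 200 and less than 400 as success for httpGet probes, which matches the requirement above:

```yaml
# Sketch of the reconfigured probes: everything hits /ping (placeholder port/timings)
startupProbe:
  httpGet:
    path: /ping
    port: 8000
  failureThreshold: 30   # allow up to ~5 minutes for startup
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ping
    port: 8000
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /ping
    port: 8000
  periodSeconds: 30
```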