grafana / rollout-operator

Kubernetes Rollout Operator
Apache License 2.0

Reduce cardinality of metrics emitted for requests to the Kubernetes control plane #123

Closed charleskorn closed 7 months ago

charleskorn commented 7 months ago

The metrics added in #118 turned out to produce many more series than expected (see discussion here - I missed the pod deletion behaviour in my testing).

For example, in a very small Mimir development cluster with 2 compactors, 9 store-gateways and 21 ingesters, the rollout-operator was emitting 44 unique method / path combinations after a rollout:

DELETE /api/v1/namespaces/the-namespace/pods/compactor-0
DELETE /api/v1/namespaces/the-namespace/pods/compactor-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-6
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-6
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-6
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-2
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-2
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-2
GET /api/v1/namespaces/the-namespace/pods
GET /api/v1/namespaces/the-namespace/secrets/rollout-operator-self-signed-certificate
GET /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations
GET /apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations
GET /apis/apps/v1/namespaces/the-namespace/statefulsets
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/compactor/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-a/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-b/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-c/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-a/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-b/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-c/status

Each of these combinations emits a classic histogram with 17 series, for a total of 748 series (44 × 17).

It's not uncommon to run clusters with hundreds of ingesters and store-gateways, so this is not sustainable: the number of series grows with the number of pods and StatefulSets. It's not necessary either. We're most interested in understanding the performance of a particular kind of request, not the performance of requests for a single specific object.

This PR reduces the cardinality of metrics emitted by grouping equivalent requests together. For example, all pod delete requests would be emitted with path="core/v1/pods object".

The Kubernetes API follows a fairly rigid pattern for URLs, documented here, so the changes in this PR use that pattern to parse the URL and format it for the metric label.
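To illustrate the idea, here is a minimal sketch of that kind of grouping. It is not the code from this PR: the function name `groupPath` is hypothetical, and only the `core/v1/pods object` label format is taken from the description above. The `collection` and `subresource` suffixes and the `unknown` fallback are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// groupPath is a hypothetical helper that maps a Kubernetes API request path
// to a low-cardinality label by dropping namespace and object names.
// Limitation of this sketch: subresources of a Namespace object itself
// (for example /api/v1/namespaces/foo/status) are not handled correctly.
func groupPath(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")

	var group, version string
	var rest []string
	switch {
	case len(parts) >= 2 && parts[0] == "api":
		// Core (legacy) group: /api/<version>/...
		group, version, rest = "core", parts[1], parts[2:]
	case len(parts) >= 3 && parts[0] == "apis":
		// Named groups: /apis/<group>/<version>/...
		group, version, rest = parts[1], parts[2], parts[3:]
	default:
		return "unknown" // assumed fallback for non-resource paths
	}

	// Drop the namespace scope: namespaces/<name>/<resource>/...
	if len(rest) >= 3 && rest[0] == "namespaces" {
		rest = rest[2:]
	}
	if len(rest) == 0 {
		return "unknown"
	}

	prefix := group + "/" + version + "/" + rest[0]
	switch len(rest) {
	case 1:
		return prefix + " collection" // assumed suffix
	case 2:
		return prefix + " object" // format given in the PR description
	default:
		return prefix + " " + strings.Join(rest[2:], "/") + " subresource" // assumed suffix
	}
}

func main() {
	for _, p := range []string{
		"/api/v1/namespaces/the-namespace/pods/compactor-0",
		"/api/v1/namespaces/the-namespace/pods",
		"/apis/apps/v1/namespaces/the-namespace/statefulsets/compactor/status",
	} {
		fmt.Printf("%s -> %s\n", p, groupPath(p))
	}
}
```

Under this sketch's grouping, the 44 method / path combinations listed above would collapse to 7 (one for all pod deletes, one for all StatefulSet status updates, and the five GET requests), or 119 series at 17 series per histogram. The exact figures for the real change depend on how it formats the label.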