grafana / rollout-operator

Kubernetes Rollout Operator
Apache License 2.0

Operator calling the wrong pod endpoint when trying to scale down ingesters #125

Open grecuionut opened 7 months ago

grecuionut commented 7 months ago

We are trying to scale down ingesters using the MutatingAdmissionWebhook as described here.

The required labels and annotations were added to the objects as shown below:

# labels
grafana.com/prepare-downscale=true

# annotations
grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port=8080

When trying to scale down statefulset/mimir-ingester-zone-a, the operator fails to resolve the pod when sending the HTTP POST request, because the FQDN is constructed as <pod_name>.<service_name>.<namespace>.svc.cluster.local:

mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local
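To make the naming pattern concrete, here is a minimal Python sketch of how such an FQDN is assembled. It is an illustration only, not the operator's actual code: the function name `pod_fqdn` is hypothetical, and the second call assumes that `mimir-ingester-headless` is the service governing the ingester pods.

```python
def pod_fqdn(pod: str, service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    """Build <pod_name>.<service_name>.<namespace>.svc.<cluster_domain>."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


# Name the operator tried to resolve, as reported in this issue.
attempted = pod_fqdn("mimir-ingester-zone-a-1", "mimir-ingester-zone-a", "mimir")

# Name built from the headless service instead (assumption: this service
# governs the ingester pods, so per-pod DNS records exist under it).
expected = pod_fqdn("mimir-ingester-zone-a-1", "mimir-ingester-headless", "mimir")

print(attempted)
print(expected)
```

Only the `<service_name>` label differs between the two names, which is why the choice of service decides whether the lookup succeeds.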

These are the existing services:

mimir-ingester-headless   ClusterIP   None           <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-a     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-b     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-c     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d

To resolve the pod, the FQDN should be built with the headless service (mimir-ingester-headless) instead. More info
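As a rough way to spot which service can provide per-pod DNS names, the sketch below scans a service listing for entries whose cluster IP is `None`, which is how a headless service shows up. It assumes the usual `kubectl get svc` column order (NAME, TYPE, CLUSTER-IP, ...); the helper name `headless_services` is hypothetical, and the listing is copied from this issue with its placeholders left intact.

```python
SERVICES = """\
mimir-ingester-headless   ClusterIP   None           <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-a     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-b     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d
mimir-ingester-zone-c     ClusterIP   <ip_address>   <none>   8080/TCP,9095/TCP   19d
"""


def headless_services(listing: str) -> list:
    """Return the names of services whose CLUSTER-IP column is 'None'."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumed columns: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
        if len(fields) >= 3 and fields[2] == "None":
            names.append(fields[0])
    return names


print(headless_services(SERVICES))
```

With the listing above, only `mimir-ingester-headless` qualifies; the three per-zone services have a cluster IP and therefore do not publish records for individual pods.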

Operator logs

level=error ts=2024-01-16T13:29:51.838387358Z name=mimir-ingester-zone-a resource=statefulsets namespace=mimir request_gvk="autoscaling/v1, Kind=Scale" old_replicas=2 new_replicas=1 url=mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown index=1 msg="error sending HTTP post request" err="Post \"http://mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown\": dial tcp: lookup mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local on <name_server_ip>:53: no such host"
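For anyone triaging similar errors, the log line above can be split into its key=value fields with a few lines of Python. This is only a sketch: it relies on `shlex.split` to handle the quoted values, which works for this line but is not a full logfmt parser. The variable names are hypothetical. Note that the `url` field carries port 443, while the annotation shown earlier sets 8080.

```python
import shlex

# Log line copied from the issue, split across raw strings for readability.
LOG_LINE = (
    r'level=error ts=2024-01-16T13:29:51.838387358Z name=mimir-ingester-zone-a '
    r'resource=statefulsets namespace=mimir request_gvk="autoscaling/v1, Kind=Scale" '
    r'old_replicas=2 new_replicas=1 '
    r'url=mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown '
    r'index=1 msg="error sending HTTP post request" '
    r'err="Post \"http://mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown\": '
    r'dial tcp: lookup mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local '
    r'on <name_server_ip>:53: no such host"'
)

# shlex.split keeps quoted values together and unescapes \" inside them.
fields = dict(token.partition("=")[::2] for token in shlex.split(LOG_LINE))

# The url field has no scheme, so split host, port, and path by hand.
host, _, rest = fields["url"].partition(":")
port, _, path = rest.partition("/")

print(host)
print(port)
print(fields["err"])
```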