grafana / rollout-operator

Kubernetes Rollout Operator
Apache License 2.0

Invoke DELETE on pod prepare-downscale path if any POSTs failed #146

Closed: seizethedave closed this 6 months ago

seizethedave commented 6 months ago

This addresses a bug in rollout-operator where:

  1. Kubernetes receives a request to downscale a statefulset by X hosts.
  2. The prepare-downscale admission webhook attempts to prepare X pods for shutdown by sending an HTTP POST to each pod's handler, identified by the grafana.com/prepare-downscale-http-path and grafana.com/prepare-downscale-http-port annotations.
  3. At least one of these requests fails. The admission webhook returns an error to Kubernetes, so the downscale is not approved.
  4. 💥 But some hosts may have been prepared for downscale. 💥

This PR adds cleanup logic to issue DELETE requests on all involved pods if any of the POSTs failed. Notes:

This doesn't fix the similar issue where a replica count that changes 10->9->10 leaves the one affected pod prepared for shutdown. (But a fix for that is in the works.)

pracucci commented 6 months ago

Let's wait for @pstibrany's review here. He's the expert in this area.