👋 Thank you for taking this task.
Did a quick first review, and have some feedback:
This comment still applies:
We should add an annotation to the downscaled resource recording when it was last downscaled, so we can check it and disallow downscaling other zones when one of the zones was downscaled recently.
We're doing this to make automating downscales possible, and we need a minimal safety net here that prevents zones from being downscaled too close together.
@colega By downscaled resource you mean the stateful set?
@MichelHollands
@colega By downscaled resource you mean the stateful set?
Yes, I mean the statefulset. We have 3 zones, and we need to wait a safety period of time between downscaling each one of them (the three statefulsets sharing the zone label).
Maybe this isn't a requirement for Loki and we should make it optional for Mimir? Anyway, I'd define the waiting period as a label on the statefulset too (grafana.com/min-time-between-zones-downscale: 12h).
(OMG it's been a long day, and I edited your comment instead of replying :facepalm:)
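For illustration only, here is a rough sketch of how that could look on a zone's StatefulSet. The grafana.com/min-time-between-zones-downscale label is the one proposed above; the last-downscale annotation name and timestamp format are just assumptions for the example, not anything this PR defines:

```yaml
# Hypothetical sketch: metadata a zone's StatefulSet could carry so the webhook
# can enforce a minimum interval between downscales of different zones.
# (Other required StatefulSet fields are omitted for brevity.)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester-zone-a
  labels:
    # Proposed above: the waiting period to respect between zone downscales.
    grafana.com/min-time-between-zones-downscale: 12h
  annotations:
    # Assumed annotation name/format: stamped when this zone was last downscaled,
    # and checked before another zone in the same group is allowed to scale down.
    grafana.com/last-downscale: "2023-04-18T09:30:00Z"
spec:
  replicas: 3
```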
Yeah, in Loki we don't need this waiting time between downscales. The rollout operator will make sure only 1 stateful set at a time is scaled up or down. The Loki ingesters will have had the prep-downscale endpoint called. So this works for us.
I'll add something similar to https://github.com/grafana/mimir/pull/4718/files to the Loki ingesters. Then we won't need the pod annotations anymore (which wouldn't work reliably anyway, as you mentioned).
The rollout operator will make sure only 1 stateful set at a time is scaled up or down.
The rollout operator does not participate in the scale-down operation. (That is why we're doing this as a webhook on the manifest change rather than coordinating it from the rollout-operator reconciliation.)
@colega I think I've addressed all the review comments. Can you have another look? The CI failure seems unrelated: the pod is pending. Can you rerun the CI? I don't have permission.
Looks like this is getting close to being merged. Remember to add an entry to the CHANGELOG (and sorry for the previous dependency update conflict!)
The CHANGELOG has been updated, as well as the README.md. The e2e tests were fixed. However, in CI the last test still fails occasionally. I'm not sure how to fix it at the moment.
The TestExpiringCertificate test was a bit flaky. After increasing one timeout there it seems a lot more stable.
Can someone merge this? I don't have permission. The same might apply for making a new release.
Thank you!
This PR follows https://github.com/grafana/rollout-operator/pull/25 and implements a prep_downscale mutating admission webhook that, after receiving a request that scales down a resource, sends an HTTP POST to pods in a StatefulSet/Deployment/ReplicaSet that are labeled with grafana.com/prep-downscale: true. The endpoint targets for the requests are crafted by extracting and combining the grafana.com/prep-downscale-http-path and grafana.com/prep-downscale-http-port labels along with the diff in the number of replicas.

An example scenario for when this is useful:
- The pods expose a prep_shutdown HTTP endpoint that tells them to expect to be terminated on the next SIGTERM and manage their state appropriately.
- When a scale-down request comes in, the webhook calls the prep_shutdown HTTP endpoints on the relevant pods and allows the admission request, thus resulting in the pods being deleted gracefully.
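To make the description above concrete, here is a sketch of the labels the webhook looks for. Only the label names and the path/port/replica-diff mechanics come from the description; the resource name, the example label values, and the URL shape below are assumptions for illustration:

```yaml
# Sketch of a StatefulSet opted in to the prep_downscale webhook.
# (Other required StatefulSet fields are omitted for brevity.)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester
  labels:
    grafana.com/prep-downscale: "true"                       # opts the resource in to the webhook
    grafana.com/prep-downscale-http-path: prepare-shutdown   # example path to POST to
    grafana.com/prep-downscale-http-port: "3100"             # example port to POST to
spec:
  replicas: 5
```

If an admission request lowered replicas from 5 to 3, the webhook would combine the path and port labels with that diff and send an HTTP POST to each of the two pods being removed (something like http://<pod>:3100/prepare-shutdown per pod, with the exact addressing left to the implementation).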