grafana / rollout-operator

Kubernetes Rollout Operator
Apache License 2.0
132 stars 17 forks source link

Implement zoneTracker for coordinating downscales across statefulsets using a file in object storage for the prepare_downscale admission webhook #96

Closed JordanRushing closed 8 months ago

JordanRushing commented 10 months ago

This PR introduces the idea of a zoneTracker using a JSON-encoded file stored in object storage to save timestamps for coordinating downscales between statefulsets with the prepare-downscale admission webhook.

The file's contents:

{
  "ingester-zone-a": {
    "lastDownscaled": "2023-12-14T23:37:41Z"
  },
  "ingester-zone-b": {
    "lastDownscaled": "2023-12-14T23:37:41Z"
  },
  "ingester-zone-c": {
    "lastDownscaled": "2023-12-14T23:37:41Z"
  }
}

When rolling out the prepare-downscale admission webhook to support ingester autoscaling in Loki, we encountered differences in behavior w.r.t the last-downscale annotation that the rollout operator was applying to ingester statefulsets between the major cloud service provider's managed Kubernetes control planes.

HPA controllers in EKS/AKS fail to scale the relevant ingester statefulset after a valid AdmissionRequest is approved in the prepare-downscale webhook due to resourceVersion mismatch errors related to the last-downscale annotation being applied to the ingester statefulsets. We currently believe this is due to differences in managed control plane configurations between the providers but were unable to pinpoint a root cause even after working with the relevant support team for AWS.


Initially, we attempted to submit the annotation patch as a JSONPatch within the AdmissionResponse itself to solve this problem but even that patch was not correctly applied to the ingester statefulsets so downscales were not correctly coordinated.

As a workaround, we considered tracking this state in a file instead of a Kubernetes annotation. A draft PR was opened to do this and it worked with supported changes in the Loki API, but with trade-offs from saving this state on local disk.

After further consideration, we decided to store this file in object storage instead of PVs to simplify and fortify this state tracking with the strong read-after-write consistency guarantees that are provided by the three largest cloud provider's object storage services.

This has been successfully tested as a drop-in replacement for existing namespaces that were using the annotation to coordinate downscales prior on AWS/GCP/Azure (it essentially applies two conditionals to the existing code). It would be possible to add an in-memory bucket option to be configured as well if desired.


go mod vendor & go mod tidy were run as well.


TODO

cstyan commented 9 months ago

Mentioned this on slack but posting here too to see if anyone else agrees; it might make sense to change the naming a bit? The name zoneTracker is a bit ambiguous in this case imo, and I think it doesn't really match what's actually happening right?

What we're actually tracking is more like the time at which a zone was last autoscaled in order to prevent downscaling too frequently.

My first though is that zoneTracker implies we're tracking the zones, like which zones are active for use or something like that. zoneScalingTracker, scalingEventTracker, downscaleTracker would all be less ambiguous imo, but I don't feel strongly enough to block you.

JordanRushing commented 9 months ago

Personally I'd look for a way to start breaking up prepareDownscale into some more functions. Especially with the amount of nested conditionals, it's getting a bit difficult to read. Maybe everything that happens in the block for if the zoneTracker file is being used can be a separate function?

@cstyan I think that's valuable feedback and I'd be happy to open a separate PR to simplify the webhook code and break things out into better functions.

jhalterman commented 9 months ago

Left a few comments we should consider, but this seems nice overall. Might be nice for @56quarters or @andyasp to weigh in on this as well.

56quarters commented 9 months ago

This PR introduces the idea of a zoneTracker using a JSON-encoded file stored in object storage to save timestamps for coordinating downscales between statefulsets with the prepare-downscale admission webhook.

Has any consideration been given to using a configmap in Kubernetes instead of introducing a dependency on object storage for the rollout-operator?

JordanRushing commented 9 months ago

This PR introduces the idea of a zoneTracker using a JSON-encoded file stored in object storage to save timestamps for coordinating downscales between statefulsets with the prepare-downscale admission webhook.

Has any consideration been given to using a configmap in Kubernetes instead of introducing a dependency on object storage for the rollout-operator?

Great question @56quarters! Yes, originally I was considering using a configmap after some deliberation with the team but ultimately, as far as I understand, etcd runs as a part of the managed control plane in EKS/AKS and the differences in the managed control planes between providers + poor visibility motivated this PR so I wanted to move this metadata outside of that.

I wonder if it might be better to move the annotation code path to using a configmap rather than the zoneTracker?

Do you have any concerns about introducing an object storage dependency to an optional code path here? The original code path is currently unchanged.

jhalterman commented 9 months ago

Talking to Jordan, he plans to followup with an attempt at accomplishing this via ConfigMap, which sounds good.

JordanRushing commented 9 months ago

Talking to Jordan, he plans to followup with an attempt at accomplishing this via ConfigMap, which sounds good.

I think we can close this PR in favor of https://github.com/grafana/rollout-operator/pull/107!