kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Difficulties with keda and helm during keda->standard hpa 'upgrade' #6250

Open SleepyBrett opened 1 month ago

SleepyBrett commented 1 month ago

Report

I have created a helm chart that allows users to define standard hpas or 'opt into' keda. When keda is disabled by values we create a standard hpa, but when keda is enabled, we do not render the hpa and instead render a scaled object that specifies an hpa name. Because of how helm does its install, this is causing us some issues.

A quick overview of how helm installs/upgrades things:

  1. Helm renders all the templates for the current values and produces a number of k8s objects.
  2. Those k8s objects are then created/updated on the cluster (as long as the current ones are owned by helm (have certain labels/annotations)).
  3. Helm then looks for any objects that are 'owned' by the helm release but were not defined in step 1 and it deletes those objects as they are now orphaned.
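
The three steps above can be sketched as a toy model. This is only an illustration of the render/apply/prune sequence described here, not Helm's actual implementation; the `keda_enabled` value and the object names are made up for the example, and the KEDA controller is deliberately left out.

```python
# Simplified model of a Helm upgrade; not Helm's actual implementation.

def render(values):
    """Step 1: return the objects the chart renders for these values."""
    if values["keda_enabled"]:  # hypothetical chart value
        return {"ScaledObject/app"}
    return {"HorizontalPodAutoscaler/app"}


def upgrade(cluster, release_owned, values):
    """Run one upgrade against `cluster` and return the release-owned objects."""
    desired = render(values)            # step 1: render templates
    cluster |= desired                  # step 2: create or update rendered objects
    orphaned = release_owned - desired  # owned by the release, no longer rendered
    cluster -= orphaned                 # step 3: delete the orphaned objects
    return desired


cluster = set()
owned = upgrade(cluster, set(), {"keda_enabled": False})
owned = upgrade(cluster, owned, {"keda_enabled": True})
print(sorted(cluster))
```

In this model the plain HPA is pruned only after the ScaledObject has already been applied, which is the ordering the rest of this report depends on.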

In the current chart the name of the hpa that is created when keda is disabled is the same as the hpa name we place into the scaled object when keda is enabled. We use the transfer-hpa-ownership annotation to smooth this over.

So from a helm point of view:

  1. helm renders a scaled object and not an hpa
  2. helm applies the rendered objects:
     1. the keda validation webhook sees the current HPA (not owned by keda) and, because of the annotation and the matching names, does not care and moves on
     2. the scaledobject is created
  3. helm removes the current hpa
  4. the keda controller reconciles the scaled object and, since the hpa does not exist, creates it

So far so good. We now have an hpa owned by the scaled object.

Now when we then disable keda:

  1. helm renders templates and generates an HPA object but no scaledobject
  2. apply, helm updates the hpa that keda created (we think; this is an odd one because I would expect helm to choke here on non-ownership. Perhaps helm does not remove the current hpa when 'upgrading to keda' because of the keda ownership block? Audit logs could tell us, I suppose)
  3. helm removes the scaled object
  4. keda or the k8s controller manager removes the hpa because it was owned by the scaled object
  5. we are left with a deployment with no hpa
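
The downgrade sequence above can be reproduced with a small model of owner-reference garbage collection. This is a sketch under assumptions: real Kubernetes garbage collection matches owners by UID rather than by name, and the model assumes that helm's update in step 2 leaves the existing owner reference on the hpa untouched (which is the part we have not confirmed).

```python
# Toy model of owner-reference garbage collection; names are illustrative.

def garbage_collect(cluster):
    """Delete objects whose recorded owner no longer exists in the cluster."""
    for name, obj in list(cluster.items()):
        owner = obj.get("owner")
        if owner is not None and owner not in cluster:
            del cluster[name]


# State after keda was enabled: the hpa is owned by the scaled object.
cluster = {
    "ScaledObject/app": {},
    "HorizontalPodAutoscaler/app": {"owner": "ScaledObject/app"},
}

# Step 2: helm updates the hpa in place (assumed to keep the owner reference).
cluster["HorizontalPodAutoscaler/app"]["managed_by"] = "Helm"
# Step 3: helm removes the scaled object because it is no longer rendered.
del cluster["ScaledObject/app"]
# Step 4: the hpa's owner is gone, so the hpa is collected as well.
garbage_collect(cluster)

print(cluster)
```

The final state is an empty cluster: the hpa that helm believes it manages disappears together with the scaled object.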

So then we think: OK, what if the name of the hpa created by a non-keda install and the name of the hpa referenced by a keda install are different? We make the changes, but find that when we go to upgrade from non-keda -> keda the validation webhook rejects us, because at the time we are applying the scaledobject the hpa still exists. We get this error: failed to create resource: admission webhook "vscaledobject.kb.io" denied the request: the workload 'kedatest-microservice' of type 'apps/v1.Deployment' is already managed by the hpa 'kedatest-microservice'
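
To make the two outcomes concrete, here is a rough sketch of the check as we observe it from the outside. It is modelled only on the behaviour described in this report, not on KEDA's source code; the dictionary layout is invented for the example, and the full annotation key is our assumption of how the transfer-hpa-ownership annotation is spelled.

```python
# Assumed full key of the transfer-hpa-ownership annotation.
TRANSFER_ANNOTATION = "scaledobject.keda.sh/transfer-hpa-ownership"


def validate(scaled_object, existing_hpas):
    """Return an error message if another hpa already manages the workload."""
    target = scaled_object["target"]
    wanted_name = scaled_object["hpa_name"]
    transfer = scaled_object["annotations"].get(TRANSFER_ANNOTATION) == "true"
    for hpa in existing_hpas:
        if hpa["target"] != target:
            continue  # this hpa scales some other workload
        if transfer and hpa["name"] == wanted_name:
            continue  # same name plus annotation: ownership is taken over
        return f"the workload '{target}' is already managed by the hpa '{hpa['name']}'"
    return None


existing = [{"name": "kedatest-microservice", "target": "kedatest-microservice"}]
same_name = {
    "target": "kedatest-microservice",
    "hpa_name": "kedatest-microservice",
    "annotations": {TRANSFER_ANNOTATION: "true"},
}
# "kedatest-microservice-keda" is a made-up name for the differently named hpa.
other_name = dict(same_name, hpa_name="kedatest-microservice-keda")

print(validate(same_name, existing))
print(validate(other_name, existing))
```

With matching names the request passes; with different names the old hpa is still present at apply time (helm only prunes it afterwards), so the request is denied.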

Is there any way around this? We have fiddled a bit with the scaled object annotations, but they are, frankly, pretty poorly documented, specifically validations.keda.sh/hpa-ownership.

Expected Behavior

I expect to be able to toggle back and forth between a standard hpa and keda hpa using the standard helm upgrade -i method in a single step process.

Actual Behavior

Deletion of the scaled object deletes the underlying hpa, leaving a service that has downgraded from keda to standard hpa with no hpa at all.

Steps to Reproduce the Problem

I've described the steps above, but I can provide a slimmed down helm chart on request.

Logs from KEDA operator

logs are unimportant

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

unimportant

Anything else?

It seems to me that you could implement a new annotation that would, on scaledobject deletion, instead of marking for delete immediately, first remove the ownership claim from the hpa, thus leaving the hpa intact. Thoughts?
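
A minimal sketch of the idea, using the same kind of toy model. The annotation key below is invented purely for illustration and does not exist in KEDA, and owners are matched by name here rather than by UID.

```python
# Hypothetical annotation, invented for this sketch; it does not exist in KEDA.
ORPHAN_ANNOTATION = "example.invalid/orphan-hpa-on-delete"


def delete_scaled_object(cluster, so_name):
    """Delete a scaled object, optionally releasing the objects it owns first."""
    scaled_object = cluster[so_name]
    if scaled_object.get("annotations", {}).get(ORPHAN_ANNOTATION) == "true":
        for obj in cluster.values():
            if obj.get("owner") == so_name:
                del obj["owner"]  # remove the ownership claim before deleting
    del cluster[so_name]
    # Emulate garbage collection of anything still owned by the deleted object.
    for name in [n for n, o in cluster.items() if o.get("owner") == so_name]:
        del cluster[name]


def make_cluster(annotations):
    return {
        "ScaledObject/app": {"annotations": annotations},
        "HorizontalPodAutoscaler/app": {"owner": "ScaledObject/app"},
    }


with_annotation = make_cluster({ORPHAN_ANNOTATION: "true"})
delete_scaled_object(with_annotation, "ScaledObject/app")
print(sorted(with_annotation))

without_annotation = make_cluster({})
delete_scaled_object(without_annotation, "ScaledObject/app")
print(sorted(without_annotation))
```

With the annotation set, the hpa survives the deletion; without it, the current behaviour is kept and the hpa is removed along with the scaled object.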

We realize that this is a bit of an edge case, but it is one that would bite pretty hard and it concerns us.

JorTurFer commented 1 month ago

Hello, I understand the reproduction steps, but the problem there is the ownership of the HPA. When KEDA deploys the HPA, it registers the ScaledObject as an owner reference on the HPA, and Kubernetes is then responsible for removing the HPA. This mechanism is there to remove orphaned resources, so we can't disable it. Currently, we don't support disabling specific admission rules one by one, so the best workaround I can suggest is removing the admission webhooks (or just scaling them to 0) and using different names for your HPA and KEDA's HPA. In the midterm, I think that supporting more annotations to disable specific admission rules would be a nice feature. If you are willing to add this as the first one, it'd be nice :)