kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0
8.27k stars 1.05k forks source link

ScaledObject still gets applied after failing validation due to using a fallback with a CPU trigger #5954

Open vzdai opened 2 months ago

vzdai commented 2 months ago

Report

When applying a ScaledObject which has both fallback set and is also using a CPU (or Memory?) trigger, the admission webhook emits a validation error but the update still gets applied to the ScaledObject.

Expected Behavior

Since there was a validation error, a ScaledObject that contains both a fallback spec and a CPU trigger should not be created.

Actual Behavior

A ScaledObject containing a fallback spec as well as a CPU trigger was created.

Steps to Reproduce the Problem

  1. Set failurePolicy: Fail for the admission webhook (not sure if this is required)
  2. Create and apply a ScaledObject with a fallback and also a CPU trigger

Example ScaledObject spec:

spec:
  fallback:
    failureThreshold: 5
    replicas: 5
  maxReplicaCount: 5
  minReplicaCount: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vdai
  triggers:
    - metadata:
        container: main
        value: '80'
      metricType: Utilization
      type: cpu

Logs from KEDA operator

Admission webhook:

{"level":"error","ts":"2024-07-08T07:57:41Z","logger":"scaledobject-validation-webhook","msg":"validation error","name":"vdai","error":"type is cpu , but fallback it is not supported by the CPU & memory scalers","stacktrace":"github.com/kedacore/keda/v2/apis/keda/v1alpha1.verifyFallback\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:175\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.validateWorkload\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:142\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObject).ValidateUpdate\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:112\ngithub.com/kedacore/keda/v2/apis/keda/v1alpha1.ScaledObjectCustomValidator.ValidateUpdate\n\t/workspace/apis/keda/v1alpha1/scaledobject_webhook.go:82\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/validator_custom.go:101\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2136\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2136\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/workspace/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2136\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2514\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}

Operator:

{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Reconciling ScaledObject","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"72fb6907-b862-438d-8275-3ebec04e7034"}
{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Detected resource targeted for scaling","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"72fb6907-b862-438d-8275-3ebec04e7034","resource":"apps/v1.Deployment","name":"vdai"}
{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Updated HPA according to ScaledObject","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"72fb6907-b862-438d-8275-3ebec04e7034","HPA.Namespace":"vdai","HPA.Name":"keda-hpa-vdai"}
{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Initializing Scaling logic according to ScaledObject Specification","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"72fb6907-b862-438d-8275-3ebec04e7034"}
{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Reconciling ScaledObject","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"1b9ee1bc-e397-4380-839b-d6f69ae84551"}
{"level":"info","ts":"2024-07-08T07:57:41Z","msg":"Detected resource targeted for scaling","controller":"scaledobject","controllerGroup":"keda.sh","controllerKind":"ScaledObject","ScaledObject":{"name":"vdai","namespace":"vdai"},"namespace":"vdai","name":"vdai","reconcileID":"1b9ee1bc-e397-4380-839b-d6f69ae84551","resource":"apps/v1.Deployment","name":"vdai"}

KEDA Version

2.14.0

Kubernetes Version

1.28

Platform

Amazon Web Services

Scaler Details

CPU

Anything else?

I think the bug was introduced here, where the admission webhook is logging an error but not bubbling it up due to a missing return err, so the rest of the ScaledObject create/update flow still goes through.

It looks like there may also be a similar issue when validating the replica count (code). I'm not sure why the unit tests are passing though.

zroubalik commented 2 months ago

@vzdai thanks for reporting, are you willing to contribute a fix? Thanks!

vzdai commented 1 month ago

@zroubalik Sure, I can contribute a fix.