kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Unpausing scaledobject broken #5526

Closed by jlemaes 6 months ago

jlemaes commented 7 months ago

Report

When setting annotation autoscaling.keda.sh/paused-replicas=0 on a scaledobject, pausing works as expected. The scaledobject status has ScaledObjectPaused true and the deployment is downscaled to 0 replicas.

Removing the annotation does not correctly unpause the scaledobject. The scaledobject status still has:

  - message: Scaling is not performed because triggers are not active
    reason: ScalerNotActive
    status: "False"
    type: Active
...
  - message: ScaledObject is paused
    reason: ScaledObjectPaused
    status: "True"
    type: Paused

I can manually scale the deployment back up, and keda does not overwrite that change.

Expected Behavior

When removing the annotation autoscaling.keda.sh/paused-replicas, I expect keda to set the replicas of the deployment back to minreplicas so that the hpa can resume its function.

Actual Behavior

The scaledobject stays paused=true and active=false in the status, and the deployment does not scale to minreplicas.

Steps to Reproduce the Problem

  1. Pause a scaledobject with annotation autoscaling.keda.sh/paused-replicas=0
  2. Wait until all pods are gone
  3. Unpause the scaledobject by removing the annotation
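
The steps above can be sketched with kubectl as follows. This is a minimal sketch, not taken from the original report: the function names are hypothetical, the scaledobject name and namespace are passed in as arguments, and the `KUBECTL` override is only an illustrative convenience for dry runs. It relies on the standard `kubectl annotate` syntax, where a trailing `-` after the key removes the annotation.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing the pause/unpause sequence.
# KUBECTL can be overridden (for example KUBECTL=echo) to print the
# arguments instead of calling the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Step 1: pause the scaledobject at 0 replicas.
# $1 = scaledobject name, $2 = namespace
pause_scaledobject() {
  "$KUBECTL" annotate scaledobject "$1" -n "$2" \
    autoscaling.keda.sh/paused-replicas=0 --overwrite
}

# Step 3: unpause by removing the annotation
# (the trailing "-" deletes the annotation key).
unpause_scaledobject() {
  "$KUBECTL" annotate scaledobject "$1" -n "$2" \
    autoscaling.keda.sh/paused-replicas-
}

# Read the status of the Paused condition from the scaledobject.
paused_status() {
  "$KUBECTL" get scaledobject "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Paused")].status}'
}
```

For example, after sourcing the functions, `pause_scaledobject my-so default`, waiting for the pods to terminate, then `unpause_scaledobject my-so default` followed by `paused_status my-so default` would show whether the Paused condition is still "True" (`my-so` and `default` are placeholders).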

Logs from KEDA operator

These logs show up after unpausing:

"error":"could not find stackdriver metric with query fetch pubsub_subscription | metric 'pubsub.googleapis.com/subscription/num_undelivered_messages' | filter (resource.project_id == '<project id>' && resource.subscription_id == '<sub id>') | within 1m", "level":"error", "logger":"gcp_pub_sub_scaler", "metricType":"pubsub.googleapis.com/subscription/num_undelivered_messages", "msg":"error getting metric", "name":"updater", "namespace":"default", "stacktrace":"github.com/kedacore/keda/v2/pkg/scalers.(*pubsubScaler).GetMetricsAndActivity
    /workspace/pkg/scalers/gcp_pubsub_scaler.go:193
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
    /workspace/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalerState
    /workspace/pkg/scaling/scale_handler.go:743
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState.func1
    /workspace/pkg/scaling/scale_handler.go:628", "ts":"2024-02-22T19:44:46Z", "type":"ScaledObject"}

KEDA Version

2.13.0

Kubernetes Version

1.27

Platform

Google Cloud

Scaler Details

gcp pubsub scaler+cpu

Anything else?

We have seen this behaviour since keda 2.12. It worked in keda 2.11.

Restarting the keda operator after unpausing sets the paused=false status correctly, but keeps the active=false status.

jlemaes commented 7 months ago

I noticed that the same issue also occurs without pausing, when using only the pubsub scaler (not cpu+pubsub). I see related fixes on master, so I will wait until those are released and see whether pausing then also works. It's strange that those error logs only start after pausing.

JorTurFer commented 7 months ago

Hello! I think you may be hitting the bug fixed by this: https://github.com/kedacore/keda/pull/5452

jlemaes commented 6 months ago

It is indeed fixed in release v2.13.1.

JorTurFer commented 6 months ago

nice!