kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes.
https://keda.sh
Apache License 2.0

In response to an empty Message Queue, ScaledJob keeps creating 3 Containers #5895

Closed: eugen-nw closed this issue 4 days ago.

eugen-nw commented 1 week ago

Report

3 unnecessary Containers are created by the ScaledJob.

Expected Behavior

No Containers should be created when there are no messages in the Queue.

Actual Behavior

I'm seeing 3 Pods in more or less confused states. That is due to a "quota reached" issue that I'm going to resolve with Azure support.

NAME                                                             READY   STATUS           RESTARTS   AGE   IP       NODE              NOMINATED NODE   READINESS GATES
aks-aci-boldiq-workforce-nje-rostering-development-4xmvq-xww2k   0/1     ProviderFailed   0          94m   <none>   virtual-kubelet   <none>           <none>
aks-aci-boldiq-workforce-nje-rostering-development-fzspl-d28w8   0/1     ProviderFailed   0          88m   <none>   virtual-kubelet   <none>           <none>
aks-aci-boldiq-workforce-nje-rostering-development-lqlkn-mt9wv   1/1     Running          0          92m   <none>   virtual-kubelet   <none>           <none>
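To pick the failed pods out of a longer listing like the one above, the table can be filtered on its STATUS column. This is a minimal Python sketch, assuming the plain-text output of kubectl get pods (with the wide columns shown above) is available as a string; the helper name pods_with_status is hypothetical and is not part of kubectl or KEDA.

```python
def pods_with_status(table: str, status: str) -> list[str]:
    """Return the NAME of every pod whose STATUS column equals `status`.

    Hypothetical helper: rows are split on whitespace. That is safe for NAME
    and STATUS because both precede the multi-word headers (NOMINATED NODE,
    READINESS GATES) that would otherwise shift later column indexes.
    """
    rows = [line.split() for line in table.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    return [row[name_col] for row in body if row[status_col] == status]


if __name__ == "__main__":
    listing = """
NAME                                                             READY   STATUS           RESTARTS   AGE
aks-aci-boldiq-workforce-nje-rostering-development-4xmvq-xww2k   0/1     ProviderFailed   0          94m
aks-aci-boldiq-workforce-nje-rostering-development-fzspl-d28w8   0/1     ProviderFailed   0          88m
aks-aci-boldiq-workforce-nje-rostering-development-lqlkn-mt9wv   1/1     Running          0          92m
"""
    # Prints the two pods stuck in ProviderFailed.
    for name in pods_with_status(listing, "ProviderFailed"):
        print(name)
```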

Steps to Reproduce the Problem

  1. The container image is stored in Azure Container Registry.
  2. I'm deploying the ScaledJob using the script below.
    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: aks-aci-boldiq-workforce-nje-rostering-development
      labels:
        app: aks-aci-boldiq-workforce-nje-rostering-development
        deploymentName: aks-aci-boldiq-workforce-nje-rostering-development
    spec:
      jobTargetRef:
        template:
          spec:
            containers:
            - image: njecontainers.azurecr.io/boldiq-workforce-nje-rostering-development:#{Build.BuildNumber}#
              imagePullPolicy: Always
              name: boldiq-workforce-nje-rostering-development
              resources:
                requests:
                  memory: 8G
                  cpu: 4
                limits:
                  memory: 8G
                  cpu: 4
              env:
              - name: KEDA_SERVICEBUS_CONNECTIONSTRING_NJE_ROSTERING_DEVELOPMENT
                value: "Endpoint=sb://nje-development.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=TAQhCms5nbq6E93gzOjx5gk1ChGEzGej4+ASbL9XYso="
            nodeSelector:
              kubernetes.io/os: windows
            tolerations:
            - key: virtual-kubelet.io/provider
              operator: Exists
            - key: azure.com/aci
              effect: NoSchedule
            imagePullSecrets:
              - name: docker-registry-secret
            nodeName: virtual-kubelet
      successfulJobsHistoryLimit: 10
      failedJobsHistoryLimit: 10
      pollingInterval: 1  # 1-second polling for maximum responsiveness
      minReplicaCount: 0  # low traffic and long processing times, so the Containers' 5-minute startup is not an issue
      maxReplicaCount: 10
      triggers:
      - type: azure-servicebus
        # metricType: Value  # The default AverageValue with messageCount: '1' starts a new Container for each Message in the Queue; we want that for responsiveness.
        metadata:
          queueName: schedulerequests
          connectionFromEnv: KEDA_SERVICEBUS_CONNECTIONSTRING_NJE_ROSTERING_DEVELOPMENT
          messageCount: '1'

  3. kubectl get pods shows 3 Containers.
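With messageCount: '1' and the default AverageValue metric type, the number of Jobs KEDA wants is driven by the queue length. The sketch below reproduces, in Python, the arithmetic for the default and accurate scaling strategies as I understand them from KEDA's ScaledJob documentation; it is an illustration, not KEDA's Go source, and the function names max_scale and jobs_to_create are hypothetical. Clamping a negative result to zero is an assumption of the sketch.

```python
def max_scale(queue_length: int, target_average_value: int, max_replica_count: int) -> int:
    """Jobs wanted for the current queue: ceil(queue / target), capped at maxReplicaCount."""
    if target_average_value <= 0:
        raise ValueError("target_average_value (messageCount) must be positive")
    # Integer ceiling division, exact for non-negative queue lengths.
    wanted = -(-queue_length // target_average_value)
    return min(max_replica_count, wanted)


def jobs_to_create(queue_length: int, target_average_value: int, max_replica_count: int,
                   running_jobs: int, pending_jobs: int, strategy: str = "default") -> int:
    """New Jobs to start in one polling cycle (sketch of the documented strategies)."""
    wanted = max_scale(queue_length, target_average_value, max_replica_count)
    if strategy == "default":
        # default: maxScale - runningJobCount
        n = wanted - running_jobs
    elif strategy == "accurate":
        # accurate: respect maxReplicaCount, otherwise discount pending Jobs
        if wanted + running_jobs > max_replica_count:
            n = max_replica_count - running_jobs
        else:
            n = wanted - pending_jobs
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy}")
    # Assumption: a negative result means "create nothing".
    return max(n, 0)
```

With the settings above and an empty queue, max_scale is 0, so neither strategy asks for new Jobs. That suggests the three pods belong to Jobs created earlier (the operator log reports 3 running Jobs) rather than to Jobs spawned for an empty queue.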

Logs from KEDA operator

The output of kubectl logs -n keda keda-operator-dd878ddf6-lgh2n -c keda-operator is very long. It is mostly a succession of lines like the ones below:

2024-06-17T23:38:53Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "aks-aci-boldiq-workforce-nje-rostering-development", "scaledJob.Namespace": "default", "Number of running Jobs": 3}
2024-06-17T23:38:53Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "aks-aci-boldiq-workforce-nje-rostering-development", "scaledJob.Namespace": "default", "Number of pending Jobs ": 2}
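Because the operator log is long, it can help to pull out only the job counters. This is a minimal Python sketch, assuming each scaleexecutor line ends in a JSON object as in the two lines above; the function name job_counters is hypothetical. KEDA logs the pending key as "Number of pending Jobs " with a trailing space, so the sketch strips keys before returning them.

```python
import json


def job_counters(log_text: str) -> list[dict[str, int]]:
    """Extract the 'Number of ... Jobs' fields from KEDA scaleexecutor log lines."""
    counters = []
    for line in log_text.splitlines():
        # Skip unrelated or truncated lines that carry no structured payload.
        if "scaleexecutor" not in line or "{" not in line:
            continue
        # The structured fields start at the first '{' on the line.
        fields = json.loads(line[line.index("{"):])
        # Strip keys: the pending-jobs key is logged with a trailing space.
        found = {k.strip(): v for k, v in fields.items() if k.startswith("Number of")}
        if found:
            counters.append(found)
    return counters
```

The log text could, for example, be saved from the kubectl logs command above into a file and passed in with job_counters(open(path).read()), where path is whatever file you chose.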

KEDA Version

2.14.0

Kubernetes Version

1.28

Platform

Microsoft Azure

Scaler Details

Service Bus

Anything else?

This is how AKS describes the ScaledJob:

kubectl describe scaledjob
Name:         aks-aci-boldiq-workforce-nje-rostering-development
Namespace:    default
Labels:       app=aks-aci-boldiq-workforce-nje-rostering-development
              deploymentName=aks-aci-boldiq-workforce-nje-rostering-development
Annotations:  <none>
API Version:  keda.sh/v1alpha1
Kind:         ScaledJob
Metadata:
  Creation Timestamp:  2024-06-17T18:10:30Z
  Finalizers:
    finalizer.keda.sh
  Generation:        2
  Resource Version:  2724122
  UID:               a8f12757-4e33-4c36-925d-1f7b02ac685a
Spec:
  Failed Jobs History Limit:  10
  Job Target Ref:
    Template:
      Metadata:
        Creation Timestamp:  <nil>
      Spec:
        Containers:
          Env:
            Name:             KEDA_SERVICEBUS_CONNECTIONSTRING_NJE_ROSTERING_DEVELOPMENT
            Value:            Endpoint=sb://nje-development.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=TAQhCms5nbq6E93gzOjx5gk1ChGEzGej4+ASbL9XYso=
          Image:              njecontainers.azurecr.io/boldiq-workforce-nje-rostering-development:20240617.2
          Image Pull Policy:  Always
          Name:               boldiq-workforce-nje-rostering-development
          Resources:
            Limits:
              Cpu:     4
              Memory:  8G
            Requests:
              Cpu:     4
              Memory:  8G
        Image Pull Secrets:
          Name:     docker-registry-secret
        Node Name:  virtual-kubelet
        Node Selector:
          kubernetes.io/os:  windows
        Tolerations:
          Key:        virtual-kubelet.io/provider
          Operator:   Exists
          Effect:     NoSchedule
          Key:        azure.com/aci
  Max Replica Count:  10
  Min Replica Count:  0
  Polling Interval:   1
  Rollout:
  Scaling Strategy:
  Successful Jobs History Limit:  10
  Triggers:
    Metadata:
      Connection From Env:  KEDA_SERVICEBUS_CONNECTIONSTRING_NJE_ROSTERING_DEVELOPMENT
      Message Count:        1
      Queue Name:           schedulerequests
    Type:                   azure-servicebus
Status:
  Conditions:
    Message:  ScaledJob is defined correctly and is ready to scaling
    Reason:   ScaledJobReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Status:   Unknown
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
Events:       <none>
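The Status block above is where KEDA reports whether the trigger is active. Active: False with reason ScalerNotActive indicates that the Service Bus trigger was not active at that moment, which is consistent with an empty queue. The same conditions can be read from kubectl get scaledjob with -o json. Below is a small Python sketch, assuming that JSON follows the usual Kubernetes status.conditions layout (a list of objects with type, status, reason and message); the helper names get_condition and trigger_is_active are hypothetical.

```python
import json
from typing import Optional


def get_condition(scaled_job: dict, cond_type: str) -> Optional[dict]:
    """Return the condition of the given type from a ScaledJob object, or None."""
    for cond in scaled_job.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def trigger_is_active(scaled_job_json: str) -> bool:
    """True only when the 'Active' condition is present with status "True"."""
    cond = get_condition(json.loads(scaled_job_json), "Active")
    # Kubernetes condition statuses are the strings "True", "False" or "Unknown".
    return cond is not None and cond.get("status") == "True"
```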
eugen-nw commented 4 days ago

This works now. On Friday, June 21, 2024: