I have a KEDA + SQS + EKS setup.
When the first message arrives in the SQS queue, KEDA creates the first pod.
When a second message arrives in the queue, KEDA does not create a second pod.
If I then send a third message to the queue, KEDA does create a pod.
activeDeadlineSeconds: 3600 # Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
backoffLimit: 0 # Specifies the number of retries before marking this job failed. Defaults to 6
activeDeadlineSeconds: 16200 #900
template:
  metadata:
    labels:
      app: unified
    annotations:
# Add toleration for GPU SKU, preventing scheduling on nodes with the specified GPU SKU.
- key: "dedicated"
  operator: "Equal"
  value: "gpupool" #gpupool-apppool
  effect: "NoSchedule"
serviceAccountName: s3irsa
terminationGracePeriodSeconds: 600 # time in seconds before terminating the pod gracefully after it receives a completion message
containers:
claimName: uat-efs
pollingInterval: 30 # How often KEDA will check the SQS queue
minReplicaCount: 0 # Minimum number of jobs that KEDA can create
maxReplicaCount: 1 # Maximum number of jobs that KEDA can create
successfulJobsHistoryLimit: 2 # Number of successful jobs to keep
failedJobsHistoryLimit: 2 # Number of failed jobs to keep
scalingStrategy:
  strategy: "accurate" #"default" # Scaling strategy (default, custom, or accurate)
  pendingPodConditions:
    - "Pending"
    - "ContainerCreating"
triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.ap-south-1.amazonaws.com/xxxx/xx-unifiedservice.fifo
      queueLength: "1"
      awsRegion: "ap-south-1"
      scaleOnInFlight: "false"
    authenticationRef:
      name: keda-trigger-auth-aws-credentials # Ensure this references your actual AWS credentials stored in K8s secrets
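For anyone reading the `scalingStrategy` and `maxReplicaCount` settings in this spec, here is a rough Python sketch of how the `accurate` strategy limits the number of new jobs per polling cycle, as I understand it from the KEDA scaling-jobs documentation. The function and argument names are my own, not KEDA's (the real logic lives in the Go operator), the clamp to zero is an assumption, and the visible queue length of 1 is an assumed value that the log does not report.

```python
import math


def wanted_jobs(queue_length: int, target_per_job: int, max_replica_count: int) -> int:
    """Jobs needed for the visible backlog, capped at maxReplicaCount."""
    return min(math.ceil(queue_length / target_per_job), max_replica_count)


def accurate_new_jobs(queue_length, target_per_job, max_replica_count,
                      running_jobs, pending_jobs):
    """New jobs to create in one polling cycle under the 'accurate' strategy (sketch)."""
    max_scale = wanted_jobs(queue_length, target_per_job, max_replica_count)
    if max_scale + running_jobs > max_replica_count:
        # Already at or over the cap: only the remaining headroom is available.
        effective = max_replica_count - running_jobs
    else:
        # Under the cap: subtract jobs that are still counted as pending.
        effective = max_scale - pending_jobs
    # Assumption: a negative result is treated as "create nothing".
    return max(0, effective)


# queueLength "1" and maxReplicaCount 1 are taken from the spec.
# The visible queue length (first argument) is an assumed value.
first_poll = accurate_new_jobs(1, 1, 1, running_jobs=0, pending_jobs=0)
later_poll = accurate_new_jobs(1, 1, 1, running_jobs=1, pending_jobs=1)
print(first_poll, later_poll)
```

Under these assumptions the two calls give 1 and 0, which is the same pattern as the "Effective number of max jobs" values in the operator log (1 at 11:59:23, then 0 on the following polls while one job is counted as both running and pending).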
Expected Behavior
After the second message arrives in the SQS queue, KEDA should create a second pod.
Actual Behavior
KEDA creates a pod for the first message in the SQS queue, but it does not create a second pod when a second message arrives.
Sending a third message does result in a new pod.
Steps to Reproduce the Problem
Send the first message to the SQS queue.
Check whether a pod is created.
Send a second message to the SQS queue.
Check whether a second pod is created (it should be).
Logs from KEDA operator
manjur@MacBook-Pro keda % kubectl logs -f keda-operator-7f5d566f89-2fk22
2024/06/21 11:58:45 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-06-21T11:58:45Z INFO setup Starting manager
2024-06-21T11:58:45Z INFO setup KEDA Version: 2.12.1
2024-06-21T11:58:45Z INFO setup Git Commit: dc76ca70f19c22e8f0c806f84d95256d771f3dc9
2024-06-21T11:58:45Z INFO setup Go Version: go1.20.8
2024-06-21T11:58:45Z INFO setup Go OS/Arch: linux/amd64
2024-06-21T11:58:45Z INFO setup Running on Kubernetes 1.28+ {"version": "v1.28.9-eks-036c24b"}
2024-06-21T11:58:45Z INFO starting server {"kind": "health probe", "addr": "[::]:8081"}
I0621 11:58:45.933781 1 leaderelection.go:250] attempting to acquire leader lease keda-uat/operator.keda.sh...
2024-06-21T11:58:45Z INFO controller-runtime.metrics Starting metrics server
2024-06-21T11:58:45Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}
I0621 11:59:23.015266 1 leaderelection.go:260] successfully acquired lease keda-uat/operator.keda.sh
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-06-21T11:59:23Z INFO Starting Controller {"controller": "cert-rotator"}
2024-06-21T11:59:23Z INFO cert-rotation starting cert rotator controller
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation certs are ready in /certs
2024-06-21T11:59:23Z INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2024-06-21T11:59:23Z INFO Reconciling ScaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z INFO Starting workers {"controller": "cert-rotator", "worker count": 1}
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z INFO cert-rotation no cert refresh needed
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-06-21T11:59:23Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-06-21T11:59:23Z INFO RolloutStrategy: immediate, Deleting jobs owned by the previous version of the scaledJob {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0", "numJobsToDelete": 3}
2024-06-21T11:59:23Z INFO Initializing Scaling logic according to ScaledJob Specification {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "ScaledJob": {"name":"unified-sqs-queue-scaledjob","namespace":"backend"}, "namespace": "backend", "name": "unified-sqs-queue-scaledjob", "reconcileID": "42e024b8-00aa-4f40-8f0a-96959528d2d0"}
2024-06-21T11:59:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 0}
2024-06-21T11:59:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 0}
2024-06-21T11:59:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 1}
2024-06-21T11:59:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:23Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 1}
2024-06-21T11:59:24Z INFO cert-rotation CA certs are injected to webhooks
2024-06-21T11:59:24Z INFO grpc_server Starting Metrics Service gRPC Server {"address": ":9666"}
2024-06-21T11:59:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T11:59:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T11:59:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T11:59:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T11:59:53Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:23Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of running Jobs": 1}
2024-06-21T12:00:53Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of pending Jobs ": 1}
2024-06-21T12:00:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Effective number of max jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
2024-06-21T12:00:53Z INFO scaleexecutor Created jobs {"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", "Number of jobs": 0}
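The scaleexecutor lines above arrive in groups, one group per polling interval. A small sketch for condensing them into one row per tick, assuming the log layout shown here (timestamp, level, logger name, message, JSON payload) and assuming all lines of a tick share the same timestamp. The name `summarize` and the short field names are hypothetical; the sample lines are copied from the 11:59:53 poll.

```python
import json
import re

# Matches lines such as:
# 2024-06-21T11:59:53Z INFO scaleexecutor Scaling Jobs {"...": ...}
LINE_RE = re.compile(r"^(\S+)\s+INFO\s+scaleexecutor\s+(.*?)\s+(\{.*\})\s*$")

# Payload keys as they appear in the log (the pending key has a trailing space).
FIELDS = {
    "Number of running Jobs": "running",
    "Number of pending Jobs ": "pending",
    "Effective number of max jobs": "effective_max",
}


def summarize(lines):
    """Group scaleexecutor log lines by timestamp into one dict per polling tick."""
    ticks = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a scaleexecutor line
        timestamp, message, payload = match.groups()
        data = json.loads(payload)
        row = ticks.setdefault(timestamp, {})
        for key, short_name in FIELDS.items():
            if key in data:
                row[short_name] = data[key]
        if message == "Created jobs":
            row["created"] = data["Number of jobs"]
    return ticks


PREFIX = '"scaledJob.Name": "unified-sqs-queue-scaledjob", "scaledJob.Namespace": "backend", '
TS = "2024-06-21T11:59:53Z INFO scaleexecutor "
sample = [
    TS + 'Scaling Jobs {' + PREFIX + '"Number of running Jobs": 1}',
    TS + 'Scaling Jobs {' + PREFIX + '"Number of pending Jobs ": 1}',
    TS + 'Creating jobs {' + PREFIX + '"Effective number of max jobs": 0}',
    TS + 'Creating jobs {' + PREFIX + '"Number of jobs": 0}',
    TS + 'Created jobs {' + PREFIX + '"Number of jobs": 0}',
]

for timestamp, row in summarize(sample).items():
    print(timestamp, row)
```

Feeding it the full output of `kubectl logs` (one string per line) should give one row per 30-second poll, which makes it easier to see at which tick the running, pending, and effective-max counts change.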
Report
# https://keda.sh/docs/2.13/concepts/scaling-jobs/
apiVersion: v1
kind: Secret
metadata:
  name: keda-sqs-auth
  namespace: backend
type: Opaque
data:
  awsRoleArn: "xxxxx" #echo -n "arn:aws:iam::xxx:role/keda-uat" | base64
  AWS_ACCESS_KEY_ID:xxxxx # Required.
  AWS_SECRET_ACCESS_KEY:xxxxx # Required.
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: backend
spec:
  secretTargetRef:
    - parameter: awsSecretAccessKey # Required.
      name: keda-sqs-auth # Required.
      key: AWS_SECRET_ACCESS_KEY # Required.
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: unified-sqs-queue-scaledjob
  namespace: backend
spec:
  jobTargetRef:
    parallelism: 2 # max number of desired pods
    completions: 1 # desired number of successfully finished pods
    activeDeadlineSeconds: 3600 # Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
    backoffLimit: 0 # Specifies the number of retries before marking this job failed. Defaults to 6
    activeDeadlineSeconds: 16200 #900
    template:
      metadata:
        labels:
          app: unified
        annotations:
        # Add toleration for GPU SKU, preventing scheduling on nodes with the specified GPU SKU.
      spec:
        restartPolicy: Never # Prevent pods from restarting
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
        # Tolerate nodes with GPU SKU.
        requests:
          memory: 20000Mi
        limits:
          cpu: 2500m
          memory: 20000Mi
        claimName: uat-training
        claimName: s3-uatdatabs
        claimName: uat-efs
  pollingInterval: 30 # How often KEDA will check the SQS queue
  minReplicaCount: 0 # Minimum number of jobs that KEDA can create
  maxReplicaCount: 1 # Maximum number of jobs that KEDA can create
  successfulJobsHistoryLimit: 2 # Number of successful jobs to keep
  failedJobsHistoryLimit: 2 # Number of failed jobs to keep
  scalingStrategy:
    strategy: "accurate" #"default" # Scaling strategy (default, custom, or accurate)
    pendingPodConditions:
      - "Pending"
      - "ContainerCreating"
  triggers:
KEDA Version
2.12.1
Kubernetes Version
1.28
Platform
Amazon Web Services
Scaler Details
AWS SQS