Report
We are running Generative AI workloads (GPU resources) using KEDA ScaledJobs. We are not able to achieve streaming of these jobs, i.e. new jobs starting while existing jobs are still running.
Expected Behavior
Scenarios:
KEDA ScaledJob settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
(Assuming the SQS queue is empty before placing messages in it for the scenarios below.)
1 message in the queue → KEDA triggers 1 job/pod and processes it. If a producer places another message in the queue while the 1st job is still running, we would expect KEDA to trigger and process a new job for it even though the 1st job is still in progress.
FYI: We did achieve streaming of batch jobs on AWS SageMaker, where we can create N jobs in parallel even while existing jobs are in progress.
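For reference, a minimal ScaledJob sketch matching the first scenario's settings. Only pollingInterval, maxReplicaCount, and parallelism are taken from the report; the name, image, queue URL, and queueLength value are hypothetical placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gpu-worker-scaledjob            # hypothetical name
spec:
  pollingInterval: 30                   # from the settings above
  maxReplicaCount: 10                   # from the settings above
  jobTargetRef:
    parallelism: 1                      # from the settings above
    completions: 1
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: example.com/gpu-worker:latest   # placeholder image
            resources:
              limits:
                nvidia.com/gpu: 1
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue   # placeholder
        queueLength: "1"                # assumption: one job per queued message
        awsRegion: us-east-1
```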
Actual Behavior
Scenarios:
KEDA ScaledJob settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
(Assuming the SQS queue is empty before placing messages in it for the scenarios below.)
1 message in the queue → KEDA triggers 1 job/pod and processes it. If a producer places another message in the queue while the 1st job is still running, KEDA does not trigger another job until the 1st job completes.
We tried to address the above concern with the settings below.
KEDA ScaledJob settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 5
(Assuming the SQS queue is empty before placing messages in it for the scenarios below.)
1 message in the queue → KEDA triggers 5 jobs/pods but only 1 of them processes the message; the other 4 sit idle. This is expensive, because 4 GPU pods are launched unnecessarily when there is only one message in the queue.
2 messages in the queue → KEDA triggers 5 jobs/pods but only 2 of them process messages; the remaining pods go unused.
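The only change relative to the sketch above is the Job parallelism. Since jobTargetRef embeds a standard Kubernetes Job spec, this means each Job KEDA creates runs up to 5 pods concurrently:

```yaml
  jobTargetRef:
    parallelism: 5   # each Job created by KEDA now runs up to 5 pods concurrently
```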
Steps to Reproduce the Problem
KEDA ScaledJob settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
(Assuming the SQS queue is empty before placing messages in it for the scenarios below.)
1 message in the queue → KEDA triggers 1 job/pod and processes it. Place a 2nd message while the 1st job is still running: no new job is triggered until the 1st job completes.
KEDA ScaledJob settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 5
(Assuming the SQS queue is empty before placing messages in it for the scenarios below.)
1 message in the queue → KEDA triggers 5 jobs/pods but only 1 processes a message.
2 messages in the queue → KEDA triggers 5 jobs/pods but only 2 process messages.
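A sketch of the commands used to drive these scenarios; the manifest file name and queue URL are placeholders:

```bash
# Apply the ScaledJob sketched above (with parallelism 1 or 5).
kubectl apply -f scaledjob.yaml

# Place one message on the queue, then a second one while the first job is running:
aws sqs send-message \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --message-body '{"task": "demo"}'

# Observe how many jobs/pods KEDA creates on each polling interval:
kubectl get jobs,pods -w
```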
Logs from KEDA operator
No response
KEDA Version
2.14.0
Kubernetes Version
1.29
Platform
Amazon Web Services
Scaler Details
AWS SQS Queue
Anything else?
No response