kedacore / keda

KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes
https://keda.sh
Apache License 2.0

Not able to achieve streaming of Keda scaled jobs #5881

Open vinayak-shanawad opened 2 weeks ago

vinayak-shanawad commented 2 weeks ago

Report

We are running generative AI workloads (on GPU resources) using KEDA scaled jobs, but we are not able to achieve streaming-style processing, i.e. starting new jobs for new messages while existing jobs are still running.

Expected Behavior

Scenarios:

Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1

(Assuming the SQS queue is empty before messages are placed in it for the scenarios below.)

1 message in the queue → KEDA triggers 1 job/pod and processes it. Suppose the producer places another message in the queue while the 1st job is still running; KEDA will then not trigger another job until the 1st job completes. We would expect KEDA to process subsequent messages even while existing jobs are in progress.

FYI: We did achieve this streaming behavior for batch jobs on AWS SageMaker, where we can create N jobs in parallel even while existing jobs are in progress.
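For context, the setup described above can be sketched as a ScaledJob manifest like the following. The name, image, and queue URL are placeholders (not taken from this issue); the trigger metadata uses the AWS SQS scaler's standard fields:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gpu-inference-job            # hypothetical name
spec:
  pollingInterval: 30                # seconds between SQS queue-length checks
  maxReplicaCount: 10                # upper bound on concurrent jobs
  jobTargetRef:
    parallelism: 1                   # pods per Job
    completions: 1
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: example.com/gpu-worker:latest   # placeholder image
            resources:
              limits:
                nvidia.com/gpu: 1
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs-queue  # placeholder
        queueLength: "1"             # target one message per job
        awsRegion: us-east-1
```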

Actual Behavior

Scenarios:

  1. Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1. (Assuming the SQS queue is empty before messages are placed in it.) 1 message in the queue → KEDA triggers 1 job/pod and processes it. If the producer places another message in the queue while the 1st job is still running, KEDA does not trigger another job until the 1st job completes.

We tried addressing the above concern with the settings below.

  1. Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 5. (Assuming the SQS queue is empty before messages are placed in it.) 1 message in the queue → KEDA triggers 5 jobs but only 1 of them processes a message; the other 4 pods sit idle. This is expensive because we launch 4 GPU pods/jobs unnecessarily when there is only one message in the queue. 2 messages in the queue → KEDA triggers 5 jobs but only 2 of them process messages.
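One setting worth checking here may be the ScaledJob scalingStrategy. KEDA supports strategy values such as "default", "custom", and "accurate", and the "accurate" strategy is intended for cases where the job count should track the number of pending messages rather than the default formula. A minimal spec fragment (field values are illustrative, not a confirmed fix for this issue):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gpu-inference-job        # hypothetical name
spec:
  pollingInterval: 30
  maxReplicaCount: 10
  scalingStrategy:
    strategy: "accurate"         # size the job count to pending messages
  jobTargetRef:
    parallelism: 1               # keep 1 so each message maps to exactly one job/pod
```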

Steps to Reproduce the Problem

  1. Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1. (Assuming the SQS queue is empty before messages are placed in it.) 1 message in the queue → KEDA triggers 1 job/pod and processes it.

  2. Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 5. (Assuming the SQS queue is empty before messages are placed in it.) 1 message in the queue → KEDA triggers 5 jobs but only 1 processes a message. 2 messages in the queue → KEDA triggers 5 pods but only 2 process messages.

Logs from KEDA operator

No response

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

AWS SQS Queue

Anything else?

No response

vinayak-shanawad commented 1 week ago

Any updates on this issue?

zroubalik commented 3 days ago

Makes sense, are you willing to contribute a fix?

junekhan commented 20 hours ago

This feature can probably resolve the issue.