eugen-nw closed this issue 1 year ago
Hi
I believe the problem could be related to the short pollingInterval and the pod statuses. Since KEDA checks every second, the pods may not yet be in a running state, so KEDA thinks that jobs are missing.
You can try increasing the pollingInterval or listing more pod conditions in pendingPodConditions.
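For reference, both settings live on the ScaledJob spec; a minimal sketch with illustrative name and values (pendingPodConditions sits under scalingStrategy):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: my-scaledjob            # illustrative name
spec:
  pollingInterval: 30           # seconds; raising this from 1s gives pods time to reach Running
  scalingStrategy:
    pendingPodConditions:       # pod conditions KEDA uses when counting pending pods
      - Ready
      - PodScheduled
```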
pollingInterval should have no relationship to the count of Pods that are already running. If I have 2 Pods already running and 5 Messages in the Queue, then I need the scale-out to fire up only 3 new Pods.
Could you enable the debug logs and share them? The operator logs in debug expose the queue length and the current job count
Please let me know how to enable the debug logs; I'll gladly do so.
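For reference, assuming KEDA was installed with the official kedacore/keda Helm chart, the operator log level is controlled by a chart value; a sketch of the relevant values.yaml fragment:

```yaml
# kedacore/keda Helm chart values fragment; "debug" surfaces the queue-length
# and job-count log lines mentioned above.
logging:
  operator:
    level: debug    # default is "info"
```

After upgrading the release, the operator logs can be tailed with `kubectl logs -n keda deploy/keda-operator -f`.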
Thank you, Eugen
I have the bandwidth now to address this issue. What would you like me to do precisely, perhaps the steps below?
The behavior I'd expect is that if I already have a Job running and I send a Message into the Queue, a second Job won't start up; instead, the currently running Job will handle that one Message.
3. Send a Message into the Queue and verify that there are 2 Pods running instead of one.
I think that this shouldn't happen.
The behavior I'd expect is that if I already have a Job running and I send a Message into the Queue, a second Job won't start up; instead, the currently running Job will handle that one Message.
This is exactly the behavior I'd expect. Isn't this happening?
@JorTurFer No, it does not happen. If I have one Pod running - as per the minReplicaCount setting - and I then send a Message, I see a second Pod starting up.
I've tried it as well with minReplicaCount set to 2 and sending 5 Messages. The end result was that I got 7 Pods running, whereas only 5 would have been sufficient to process the 5 Messages.
@zroubalik , @tomkerkhove , Is this behavior intended and I'm missing something, or is this a bug? I have checked the e2e tests and they cover this scenario
I thought about this a bit more and it may rather be a feature than a bug. Let's say that I configure a ScaledJob to have a minReplicaCount of 4. By this I express my desire to always have 4 Jobs on stand-by, ready to receive Messages. 2 Messages pop up, so two of my initial 4 Jobs are busy processing them, and by doing so those two are no longer available. In response to that, the ScaledJob starts up two new Jobs immediately, in order to ensure that 4 Jobs will be available soon.
Does this reasoning sound right to you guys?
Does this reasoning sound right to you guys?
I thought so, that's why I asked other teammates because that's the behavior covered by the e2e tests. Maybe it's just a documentation gap, but I'm not sure
Thank you. Let's see what response we'll receive.
However, since there are tests that cover the behavior, it may be safe to update the documentation. And the behavior is indeed present; I've tested it several times in the past two weeks and it works very well :-))
If you set minReplicaCount for a ScaledJob, then it is basically a minimum number of jobs (a base); anything else should trigger more jobs. See the PR: https://github.com/kedacore/keda/issues/3426
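That rule can be sketched as follows; this is only an illustration of the behavior described in this thread, not KEDA's actual source:

```python
def desired_jobs(queue_length: int, min_replica_count: int, max_replica_count: int) -> int:
    """Illustrative ScaledJob sizing: minReplicaCount jobs form a standing
    baseline, and each queued message triggers an additional job on top of
    that baseline, capped by maxReplicaCount."""
    return min(min_replica_count + queue_length, max_replica_count)

# 5 messages with a baseline of 2 yields 7 jobs, matching what was observed above.
print(desired_jobs(5, 2, 100))  # 7
```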
Thanks very much @zroubalik!
Would it be possible to enhance the documentation of minReplicaCount at https://keda.sh/docs/2.9/concepts/scaling-jobs/ to explain the scale-out behavior dictated by the minReplicaCount parameter? As it stands, the documentation explains only that minReplicaCount Jobs will be created by default.
Would it be possible to enhance the documentation of minReplicaCount at keda.sh/docs/2.9/concepts/scaling-jobs to explain the scale-out behavior dictated by the minReplicaCount parameter?
It'd be amazing because it's true that it could be a bit confusing. Would you open a PR in docs with the change?
I'll give it a try. My first open source contribution...
I'll give it a try. My first open source contribution...
It's never too late to start 😄 Just fork the docs repo, create a new branch, add the information, and submit the PR. You might take some info or diagrams from the PR/issue I linked, if you find them useful. Thanks 🙏
@JorTurFer @zroubalik we were just reading the docs kindly added by @eugen-nw and this really confused me. I can understand that someone may want this behaviour, but it feels like the expected behaviour here:
Let's say that I configure KEDA to have 2 Jobs running permanently. If I send 5 Messages to the Queue, I'd expect KEDA to create only 3 new Pods. Instead it creates 5 new Pods, matching the count of Messages in the Queue.
is going to be a more common use case, or at least desired by some users.
Scaling out too much will cost us a considerable amount of money as we're processing videos on GPU Nodes.
You can limit the max. desired/allowed count of containers in the .yaml script. That will limit your expenses. In your example you will get 5 Jobs created to handle your 5 Messages, plus 2 other Jobs on stand-by to handle whatever may come in; all of this once the 5 new Pods are up and functional.
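The cap referred to here is the maxReplicaCount field on the ScaledJob spec; a minimal sketch with illustrative numbers:

```yaml
spec:
  minReplicaCount: 2     # standing warm pool, always running
  maxReplicaCount: 10    # hard ceiling on concurrent Jobs, bounding cost
```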
My scale-out scenario has to accommodate sudden bursts in demand. The current operation mode enables me to have N containers (more or less) ready to immediately handle a burst.
You can limit the max. desired / allowed count of containers in the .yaml script.
No matter what we set the max to, we're always going to be spinning up containers for no reason. If two items come into our queue, we don't need to spin up two additional Jobs with their own GPU Nodes, and pay the minimum charge for that, when we have two Jobs ready for them. If we set the maximum to the same as the minimum, this wouldn't happen, but then we also would not be autoscaling.
My scale-out scenario has to accommodate sudden bursts in demand. The current operation mode enables me to have N containers (more or less) ready to immediately handle a burst.
I understand that this is a desirable use case for you and some others, but I doubt it's what most people would think the behaviour is when they see this parameter (which is why this issue was created).
Hi @LewisJackson1 So you would like to have minReplicaCount instances always (let's say 2, for example), but when jobs arrive you want one of those 2 to handle the job, without extra instances spinning up, right? In that case, you want pre-warmed instances for the first jobs; but for subsequent jobs, is waiting acceptable? I mean, you already have some ready pods to process jobs when there isn't any pending job. I'm probably missing something important in the middle, because I don't get your use case :(
If waiting is not a problem and you prefer to save as much money as possible, you can set minReplicaCount: 0 (or just not set anything) and you will have 0 pending jobs.
So you would like to have minReplicaCount instances always (let's say 2, for example), but when jobs arrive you want one of those 2 to handle the job, without extra instances spinning up, right?
Hello @JorTurFer, I'm not sure that I understand the question here, apologies!
In that case, you want pre-warmed instances for the first jobs; but for subsequent jobs, is waiting acceptable? I mean, you already have some ready pods to process jobs when there isn't any pending job.
Yeah, if additional jobs came in after the minimum replicas then they would have to wait for scaling and that's acceptable.
I guess the simplest way that I can think of to illustrate this is to compare the behaviour to a ScaledObject. If we configure a ScaledObject to track an SQS queue with 2 minimum replicas and 2 items enter the queue, the ScaledObject does not spin up 2 more Pods - is that correct?
We're looking at migrating a queue processor from ScaledObject to ScaledJob and I'm just finding this inconsistency between the two defined behaviours quite weird. I think that we could work around this with a static Deployment that would always be warm, then set the ScaledJob to track additional queue items?
We're looking at migrating a queue processor from ScaledObject to ScaledJob and I'm just finding this inconsistency between the two defined behaviours quite weird.
Yes, you are right, they aren't consistent; but they aren't comparable either, IMHO. In a ScaledObject, the workload can process multiple items, so just after finishing one message the workload starts on the next without any cooldown. In a ScaledJob, your job usually takes a single message and ends, so after finishing the current message the pod finishes and KEDA spins up another job, which isn't instant. That's why the minimum replicas for ScaledJob is the minimum number of replicas ready to work (idle).
This is an interesting discussion, and maybe the best place is in a GH discussion, where other maintainers and any other community folk can give their 2 cents. Would you open a discussion about this?
In any case, to solve your use case you could create your own REST API (or gRPC server) with the business logic you want, and use the Metrics API Scaler (or External Scaler) to connect KEDA to it. With this approach, you could set minReplicaCount: 0 and have your server provide the desired number of instances at each moment.
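As a sketch of that suggestion, a Metrics API trigger could point KEDA at a service you own; the URL and JSON field below are hypothetical placeholders:

```yaml
spec:
  minReplicaCount: 0
  triggers:
    - type: metrics-api
      metadata:
        url: "http://scaling-decider.default.svc/api/desired-jobs"  # hypothetical endpoint
        valueLocation: "desiredJobs"    # field in the JSON response holding the number
        targetValue: "1"                # one job per unit reported by the service
```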
You may want to give the Job scale-out method some time to settle; spend some time experimenting with both scale-out alternatives. Use Linux containers (vs. Windows) for faster Pod start-up times. Jobs will always run long processing work to completion, should that be a concern. With ScaledObject scale-out you'll pay for unused capacity. The best scenario is to have no Pods running 24x7 and use ScaledJob to fire up Pods whenever necessary, should that setup accommodate your use cases.
I operate in the Azure cloud. Taking scale-out to the next level, I run no Pods in the Azure Kubernetes cluster but delegate them to the Azure Container Instances service by using a Virtual Kubelet. Thus we pay only for each second a Pod runs, and we can scale out indefinitely.
This is an interesting discussion, and maybe the best place is in a GH discussion, where other maintainers and any other community folk can give their 2 cents. Would you open a discussion about this?
I've opened a discussion here: https://github.com/kedacore/keda/discussions/4885
In a ScaledJob, your job usually takes a single message and ends, so after finishing the current message the pod finishes and KEDA spins up another job, which isn't instant. That's why the minimum replicas for ScaledJob is the minimum number of replicas ready to work (idle).
I feel it is quite an opinionated stance for the scaler to assume that the user wants a buffer here because their Jobs are slow to start up/terminate. I don't think there's that much difference between a Job and a persistent Pod; they both have start-up latency, so the over-provisioning behaviour could also be useful there. I can understand that this might be desirable for some people, and it'd be great to have this behaviour available for both ScaledJob and ScaledObject as an opt-in/out.
Report
Say that I configure KEDA with minReplicaCount > 0. If I send Messages to the Queue, KEDA creates as many new Pods as there are Messages in the Queue, with no regard to the count of Jobs already running, i.e. those created by minReplicaCount > 0.
Expected Behavior
Let's say that I configure KEDA to have 2 Jobs running permanently. If I send 5 Messages to the Queue, I'd expect KEDA to create only 3 new Pods. Instead it creates 5 new Pods, matching the count of Messages in the Queue. Below is the scaling behavior that the documentation at https://keda.sh/docs/2.9/concepts/scaling-jobs/ states.
Actual Behavior
Please see above.
Steps to Reproduce the Problem
1. Configure a KEDA Job deployment in a manner similar to the script below.
2. Deploy the script and check the count of Pods created. It should be 2.
3. Send N Messages into the Queue.
4. Check the count of Pods created. It will be N + 2.
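The original manifest is not included in the thread; the sketch below only illustrates the shape described (an Azure Service Bus trigger with minReplicaCount of 2), with hypothetical names throughout:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: queue-processor              # hypothetical name
spec:
  minReplicaCount: 2
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: processor          # hypothetical container and image
            image: example.azurecr.io/processor:latest
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: my-queue          # hypothetical queue
        messageCount: "1"            # one Job per Message
      authenticationRef:
        name: servicebus-auth        # hypothetical TriggerAuthentication
```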
Logs from KEDA operator
Please email edaroczy@boldiq.com for the .ZIP file.
KEDA Version
2.10.1
Kubernetes Version
1.25
Platform
Microsoft Azure
Scaler Details
Azure Service Bus
Anything else?
AKS 1.25.6, KEDA 2.10.2. The Containers run on the virtual-node-aci-linux virtual node.