microsoft / durabletask-mssql

Microsoft SQL storage provider for Durable Functions and the Durable Task Framework
MIT License

GetScaleRecommendation query issue #28

Closed samfisher07 closed 3 years ago

samfisher07 commented 3 years ago

Hi @cgillum

I am currently using the mssql scaler with KEDA and my durable function application, following the processes described here: https://keda.sh/docs/2.0/migration/ and https://microsoft.github.io/durabletask-mssql/#/

Everything works as expected except the dt.GetScaleRecommendation scaler function. When there are no events/requests it returns a value of 0. This causes the number of pods in the cluster to scale down to 0 which in turn makes the cluster unavailable when hitting the external IP as there are no instances of the durable function running to process the request.

To remedy this I added a line before the return statement in the script:

IF (@recommendedWorkersForOrchestrations + @recommendedWorkersForActivities) < 1 SET @recommendedWorkersForOrchestrations = 1

This simply ensures that the scaler function returns a minimum of 1. I'm not sure if this is best practice, but it immediately fixed the issue and the cluster now scales up and down (to a minimum of 1 pod) perfectly.
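In context, the tweak sits just before the script's final return (a sketch only: the variable names are taken from the line quoted above, but the surrounding body of dt.GetScaleRecommendation is assumed, not reproduced from the actual script):

```sql
-- Sketch: floor the recommendation at 1 worker so KEDA never scales to zero.
-- Variable names come from the snippet quoted above; the placement relative
-- to the return statement is an assumption.
IF (@recommendedWorkersForOrchestrations + @recommendedWorkersForActivities) < 1
    SET @recommendedWorkersForOrchestrations = 1;

-- ... existing return statement follows ...
```

Note this only changes the number the scaler reports; the scaling infrastructure (KEDA/the HPA) still decides how to act on it.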

I've opened this ticket in response to your comment in: https://github.com/kedacore/keda-external-scaler-azure-durable-functions/issues/18

Please let me know if I should provide any more information and thanks for the great scaler!

Cheers, Sam

cgillum commented 3 years ago

Thanks @samfisher07 for the feedback! Scaling to zero is by design, since in theory you shouldn't need a pod unless there is work to do. However, I recognize that if you have triggers other than Durable triggers running in the same container, this can be problematic. That said, when I tested this using the Functions Core Tools, it actually created two function apps: one for HTTP triggers with a minimum count of 1, and one for everything else with a minimum count of 0, so I didn't encounter this type of issue.

Can you tell me more about the composition of your app and the availability issues you ran into? For example, do you have both HTTP triggers and Durable triggers running in the same app? Also, is configuring the KEDA scaler with a minimum of 1 replica (instead of 0) an alternative workaround that would work for you?

One option I could consider for dt.GetScaleRecommendation is to add another parameter for a minimum instance count, but I want to get a bit more information to understand how useful this would be (or not).
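For illustration only, such a parameter might surface in the call like this (every parameter name below, including the proposed @MinWorkerCount, is hypothetical and not the actual dt.GetScaleRecommendation signature):

```sql
-- Hypothetical call shape for the proposed option; all parameter names are
-- illustrative, not the actual dt.GetScaleRecommendation signature.
EXEC dt.GetScaleRecommendation
    @MaxOrchestrationsPerWorker = 10,
    @MaxActivitiesPerWorker = 10,
    @MinWorkerCount = 1; -- hypothetical: never recommend fewer workers than this
```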

samfisher07 commented 3 years ago

Sure @cgillum .

My durable function application uses an HTTP trigger via the 'Httpstart' function (in similar fashion to the durable function tutorials) that initiates a durable function orchestration. I'm actually unfamiliar with a durable trigger.

The option of adding a minimum replicas parameter is an excellent idea: it resolves this 0-instances issue and also lets anyone configure the minimum availability of their cluster. For instance, I would actually prefer a minimum of 3 replicas, as availability matters more to me than minimum cost. I'm sure adding this would be useful to many.

cgillum commented 3 years ago

@samfisher07 are you able to work around this by configuring KEDA with a minimum replica count? For example, if you're using the Azure Functions Core Tools, you can use the --min-replicas command-line parameter to specify a non-zero minimum. Something like this:

func kubernetes deploy (...args...) --min-replicas 3 --max-replicas 10

samfisher07 commented 3 years ago

@cgillum I'm a bit of a novice with Kubernetes deployments but, if I'm not mistaken, these min-replicas and max-replicas values are set on the horizontal pod autoscaler?

This is set to a non-zero minimum on my side, which is actually how I discovered the problem: I got a warning in the horizontal pod autoscaler description (kubectl describe hpa) saying that the scaled object requested a replica count below the configured minimum. Thereafter, the scaler set the instance count to 0 and didn't scale up again.

I still think that adding a minimum replicas parameter to the scaler function would be the best solution, although a manual edit of the scaler function worked perfectly well for me.

cgillum commented 3 years ago

The minimum count is actually supposed to be configured via the minReplicaCount property of the ScaledObject spec. According to the KEDA docs, if minReplicaCount is not configured, the default value is 0. I haven't tried to see what happens when the ScaledObject and the HPA disagree on minimum instance counts, but based on your experience it sounds like the ScaledObject configuration wins.
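For reference, a minimal ScaledObject sketch with that property set (the names and the trigger block are illustrative placeholders, not taken from this app):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: durable-function-app        # illustrative name
spec:
  scaleTargetRef:
    name: durable-function-app      # deployment to scale; illustrative
  minReplicaCount: 1                # per the KEDA docs, defaults to 0 when omitted
  maxReplicaCount: 10
  triggers:
    - type: external                # scaler details omitted; configure per your setup
```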

My main hesitation about changing GetScaleRecommendation is that it would likely require a breaking change. I also hesitate because this is just a "recommendation," and I feel it's the responsibility of the scaling infrastructure to decide whether and how to apply it.