knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.46k stars, 1.14k forks

Concurrency Setting: Hard limit and Soft limit #15216

Open vividcloudpark opened 1 month ago

vividcloudpark commented 1 month ago

Ask your question here:

I'm referencing two documents. The first is the official docs: https://knative.dev/docs/serving/autoscaling/concurrency/#soft-versus-hard-concurrency-limits

The second is the comment in the autoscaler ConfigMap: https://github.com/knative/serving/blob/main/config/core/configmaps/autoscaler.yaml

I'm confused about what happens when both a hard limit and a soft limit are set.

The official docs say the smaller of the two is used: "If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value."

But the comment in the ConfigMap says: "# When revision explicitly specifies container concurrency, that value will be used as a scaling target for autoscaler"

Which one is correct?

skonto commented 1 month ago

Hi @vividcloudpark, the ConfigMap part refers to the case where the user overrides the target concurrency via an annotation. If there is no annotation, then container-concurrency-target-default is used.

Here is the related code that shows the logic for this and for the hard limit (containerConcurrency): https://github.com/knative/serving/blob/main/pkg/reconciler/autoscaling/resources/target.go#L40-L64.

The hard limit (containerConcurrency) still applies even if the user overrides the target with an annotation: total = math.Min(annotationTarget, float64(pa.Spec.ContainerConcurrency))

Hope that helps.