knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Revise how activators are assigned to services #14634

Closed: skonto closed this issue 1 month ago

skonto commented 9 months ago

Describe the feature

The KPA reconciles the SKS and computes/sets the number of activators needed to cover the capacity in proxy mode. The SKS reconciler then points the public service endpoints at a subset of the activators according to capacity needs. The latter was introduced here. However, the subset size is computed as:

    capacityToCover := float64(readyPods) * decider.Spec.TotalValue
    ...
    return int32(math.Max(minActivators, math.Ceil(capacityToCover/decider.Spec.ActivatorCapacity)))
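
For illustration, here is a minimal standalone sketch of that formula with assumed numbers (the constant, helper name, and values are assumptions for this sketch, not the real autoscaler code):

    package main

    import (
    	"fmt"
    	"math"
    )

    // minActivatorCount mirrors the minActivators floor in the snippet above;
    // the value 2 is an assumption for this sketch.
    const minActivatorCount = 2

    // activatorSubsetSize reproduces the formula shown above:
    // max(minActivators, ceil(readyPods * TotalValue / ActivatorCapacity)).
    func activatorSubsetSize(readyPods int, totalValue, activatorCapacity float64) int32 {
    	capacityToCover := float64(readyPods) * totalValue
    	return int32(math.Max(minActivatorCount, math.Ceil(capacityToCover/activatorCapacity)))
    }

    func main() {
    	// Assumed numbers: 10 ready pods, each contributing 100 units of capacity,
    	// with the global ActivatorCapacity default of 100.
    	fmt.Println(activatorSubsetSize(10, 100, 100)) // prints 10
    	// A single ready pod still gets the minimum of 2 activators.
    	fmt.Println(activatorSubsetSize(1, 100, 100)) // prints 2
    }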

ActivatorCapacity is globally set to 100. This value seems arbitrary and does not reflect the actual capacity of an activator instance, which depends on its resources. That config option could make sense if we had QoS support within the activator and per service, assuming we knew how many requests an activator instance can handle (requests are also not all equal and may differ per application). That QoS concept does not exist. Given that we already have an activator HPA based on CPU and will add a memory-based HPA (#13843), we could have truly automated scaling out of activator instances (another idea would be to autoscale it on requests; not sure whether that makes sense, but then the activator's capacity would become interesting).
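
As a rough sketch of what that could look like, the snippet below builds an HPA for the activator deployment that scales on both CPU and memory utilization (the memory metric is what #13843 proposes). The thresholds and replica bounds are assumptions for this sketch, not Knative defaults:

    package main

    import (
    	"fmt"

    	autoscalingv2 "k8s.io/api/autoscaling/v2"
    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/utils/ptr"
    )

    // activatorHPA sketches an HPA for the activator deployment that scales on
    // CPU and memory utilization. Thresholds and replica bounds are assumptions.
    func activatorHPA() *autoscalingv2.HorizontalPodAutoscaler {
    	resourceMetric := func(name corev1.ResourceName, utilization int32) autoscalingv2.MetricSpec {
    		return autoscalingv2.MetricSpec{
    			Type: autoscalingv2.ResourceMetricSourceType,
    			Resource: &autoscalingv2.ResourceMetricSource{
    				Name: name,
    				Target: autoscalingv2.MetricTarget{
    					Type:               autoscalingv2.UtilizationMetricType,
    					AverageUtilization: ptr.To(utilization),
    				},
    			},
    		}
    	}
    	return &autoscalingv2.HorizontalPodAutoscaler{
    		ObjectMeta: metav1.ObjectMeta{Name: "activator", Namespace: "knative-serving"},
    		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
    			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
    				APIVersion: "apps/v1", Kind: "Deployment", Name: "activator",
    			},
    			MinReplicas: ptr.To(int32(2)), // assumed floor
    			MaxReplicas: 20,               // assumed ceiling
    			Metrics: []autoscalingv2.MetricSpec{
    				resourceMetric(corev1.ResourceCPU, 100),   // existing CPU-based HPA
    				resourceMetric(corev1.ResourceMemory, 80), // proposed in #13843
    			},
    		},
    	}
    }

    func main() {
    	fmt.Println(activatorHPA().Name)
    }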

Given the latter and Envoy's capabilities (see below), I think we should simplify the activator assignment and revise subsetting (do we need it at all, or should we at least make it opt-in for users who don't care about it?). Moreover, interesting missing features like locality awareness (https://github.com/knative/serving/issues/7046) could, I suspect, be implemented with K8s primitives (taints) by placing pods accordingly.

For general topology-awareness cases like #13581 and #14633, I think we could mitigate the concern of imbalanced requests by using Envoy's zone-aware routing at the ingress side (target only zone-specific activators) together with topology-aware routing on the ksvc private service side (see an example of the concept with Istiod here). The private service is used by the activator's pod-tracking mechanism, but there is a period where direct pod addressing is used, which may or may not need some topology-hint awareness. I think that if we keep everything balanced end to end, statistics from pods should not be affected.
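
To make the private-service half of that concrete, here is a minimal sketch of opting a ksvc private Service into Kubernetes topology-aware routing. It assumes the upstream service.kubernetes.io/topology-mode annotation (Kubernetes 1.27+) and a hypothetical private service name; this is an illustration of the concept, not something Knative does today:

    package main

    import (
    	"fmt"

    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // withTopologyAwareRouting opts a Service into Kubernetes topology-aware
    // routing via the upstream annotation, so EndpointSlice hints keep traffic
    // within the caller's zone where capacity allows.
    func withTopologyAwareRouting(svc *corev1.Service) *corev1.Service {
    	if svc.Annotations == nil {
    		svc.Annotations = map[string]string{}
    	}
    	svc.Annotations["service.kubernetes.io/topology-mode"] = "Auto"
    	return svc
    }

    func main() {
    	// Hypothetical ksvc private service name, for illustration only.
    	svc := withTopologyAwareRouting(&corev1.Service{
    		ObjectMeta: metav1.ObjectMeta{Name: "hello-00001-private", Namespace: "default"},
    	})
    	fmt.Println(svc.Annotations)
    }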

cc @dprotaso @ReToCode @kahirokunn @Bryce-huang @psschwei @nak3

/area networking /area API

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.