knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

The concurrency metric scraped from the Prometheus exporter may not be right #3582

Closed lvjing2 closed 5 years ago

lvjing2 commented 5 years ago

In what area(s)?

/kind autoscaler

What version of Knative?

HEAD

Expected Behavior

The scraped concurrency metric should always be accurate enough to drive correct scale up/down decisions.

Actual Behavior

https://github.com/knative/serving/blob/1a61e74305a9538b2ecc6663dc004b4e65c949c1/pkg/autoscaler/stats_scraper.go#L140-L153 I noticed the comment `Assume traffic is route to pods evenly.` in this code, but I think the implementation may introduce new potential bugs, especially when scaling up. When scaling up, the already existing pods are usually handling the bulk of the requests, and it takes some time for the newly created pods to start taking requests and rebalance the load with the existing pods. So the scraped concurrency can come out lower than reality, and the autoscaler may then start to scale down. @yanweiguo WDYT
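To illustrate the concern with a made-up example (this is not the actual stats_scraper code; the pod counts and concurrency values are hypothetical): extrapolating from a small sample under the even-traffic assumption can badly underestimate the real load right after a scale-up.

```go
package main

import "fmt"

// extrapolate mimics the scraper's assumption: average the sampled pods'
// concurrency and multiply by the pod count, as if traffic were spread evenly.
func extrapolate(samples []float64, podCount int) float64 {
	var sum float64
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples)) * float64(podCount)
}

func main() {
	// 4 existing pods at concurrency 50, 4 freshly created pods still near 0.
	pods := []float64{50, 50, 50, 50, 1, 1, 1, 1}
	var trueTotal float64
	for _, c := range pods {
		trueTotal += c
	}

	// If the 3 random samples happen to hit mostly new pods, the estimate
	// is far below the real load and may trigger a premature scale-down.
	unluckySample := []float64{1, 1, 50}
	fmt.Printf("true total: %.0f, extrapolated: %.0f\n",
		trueTotal, extrapolate(unluckySample, len(pods)))
	// prints: true total: 204, extrapolated: 139
}
```

The true average concurrency here is 25.5 per pod, but the unlucky sample suggests about 17.3, so the extrapolated total is off by roughly a third.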

yanweiguo commented 5 years ago

@lvjing2 you're right. Ideally we should get stats from every pod to get an accurate value. There are two reasons we can't do this:

  1. When the Istio sidecar is injected into the pods, we cannot scrape the pods directly.
  2. Scraping every pod is expensive when the pod count is large, say, 1000 pods.

To improve the situation, we can take more samples each time. Currently it is hard-coded to 3 samples. We could dynamically calculate the sample size based on the active pod count. WDYT?

markusthoemmes commented 5 years ago

In the same vein, we could make the scraper do its probes in a single scrape cycle. For example, calling Scrape once could result in N probes, which we'd average before extrapolating via the current pod count. That'd make for a smoother graph than what we have today.

Today, we essentially take that average in the autoscaler. There is a chance, however, that the autoscaler observes an unrepresentative peak if it runs in the middle of a scrape cycle.
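A rough sketch of that idea (the `probe` function and the parameter names below are hypothetical, not the real scraper API): each Scrape call takes N probes, averages them, and only then extrapolates by the current pod count, so the autoscaler never sees a single-probe spike.

```go
package scraper

// scrapeOnce sketches one Scrape call: take n probes of random pods,
// average them, then extrapolate to the whole revision via the current
// pod count. probe stands in for "fetch one pod's reported concurrency".
func scrapeOnce(probe func() float64, n, podCount int) float64 {
	var sum float64
	for i := 0; i < n; i++ {
		sum += probe()
	}
	return sum / float64(n) * float64(podCount)
}
```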

lvjing2 commented 5 years ago

@yanweiguo I found that you already submitted #3831, and I think that's good for now. But I think we still need to consider two things:

  1. This may not allow users to customize the load-balancing policy of their services in the future; actually, we have no such use case yet, I think.
  2. The number of scrape probes will increase as the pod count grows, which will become a performance ceiling for the autoscaler; at that point HPA for the autoscaler (#2930) is needed.

yanweiguo commented 5 years ago

> The number of scrape probes will increase as the pod count grows, which will become a performance ceiling for the autoscaler; at that point HPA for the autoscaler (#2930) is needed.

Based on the formula listed here, given a population variance of 100, the max sample size is 16 per second. We may need to tune the value of the population variance. I'm fine with 16 QPS.
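For reference, here is a minimal sketch of the standard finite-population sample-size formula that produces that ~16 cap, assuming a 95% confidence level (z = 1.96), a margin of error of 5, and a population variance of 100; the exact constants used in #3831 may differ.

```go
package scraper

import "math"

// sampleSize returns how many pods to scrape for a revision with the given
// pod count. Constants are illustrative: 95% confidence (z = 1.96),
// margin of error 5, assumed population variance 100.
func sampleSize(podCount int) int {
	const (
		z        = 1.96
		moe      = 5.0
		variance = 100.0
	)
	if podCount <= 0 {
		return 0
	}
	// Infinite-population sample size: z^2 * variance / moe^2 ≈ 15.4.
	n0 := z * z * variance / (moe * moe)
	// Finite-population correction keeps the size <= podCount and
	// caps it at about 16 as podCount grows large.
	n := float64(podCount) * n0 / (float64(podCount) + n0 - 1)
	return int(math.Ceil(n))
}
```

With these constants a 3-pod revision is scraped fully, a 100-pod revision needs about 14 samples, and even 1000 pods never need more than 16 samples per scrape.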

We want to do #2930 anyway.

lvjing2 commented 5 years ago

Does 100 mean 100 pods per revision, or 100 active revisions? If there are thousands of active revisions with tens of pods each, I think 16 QPS in total is not enough. Yeah, anyway, #2930 is good, and I think it needs to take this into consideration to set an upper limit on the number of revisions for each shard.

yanweiguo commented 5 years ago

> Does 100 mean 100 pods per revision, or 100 active revisions? If there are thousands of active revisions with tens of pods each, I think 16 QPS in total is not enough. Yeah, anyway, #2930 is good, and I think it needs to take this into consideration to set an upper limit on the number of revisions for each shard.

100 means the concurrency variance of all pods per revision.