lvjing2 closed this issue 5 years ago.
@lvjing2 you're right. Ideally we should get stats from every pod to get an accurate value. There are two reasons we can't do this:
To improve the situation, we can take more samples each time. Currently the sample size is hard-coded to 3. We could calculate the sample size dynamically based on the active pod count. WDYT?
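A dynamic sample size could be sketched like this (a hypothetical heuristic, not the actual implementation; the 15% sampling fraction and the minimum of 3 are assumptions, with 3 matching the current hard-coded value):

```go
package main

import (
	"fmt"
	"math"
)

// dynamicSampleSize is a hypothetical heuristic: sample roughly 15% of
// the active pods, never fewer than the currently hard-coded 3 samples,
// and never more than the pod count itself.
func dynamicSampleSize(podCount int) int {
	const minSamples = 3
	n := int(math.Ceil(0.15 * float64(podCount)))
	if n < minSamples {
		n = minSamples
	}
	if n > podCount {
		n = podCount
	}
	return n
}

func main() {
	fmt.Println(dynamicSampleSize(2))   // small revision: sample every pod
	fmt.Println(dynamicSampleSize(10))  // falls back to the minimum of 3
	fmt.Println(dynamicSampleSize(100)) // grows with the pod count
}
```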
In the same vein, we could make the scraper do all its probes within a single scrape cycle. For example, calling Scrape once could result in N probes, and we'd take their average before extrapolating via the current pod count. That would make for a smoother graph than what we have today.
Today, we essentially take that average in the autoscaler. There is a chance, however, that the autoscaler observes an unrepresentative peak if it runs in the middle of a scrape cycle.
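A single-cycle Scrape with N probes could be sketched like this (`scrapeOnce` and `probe` are hypothetical names, with `probe` standing in for the real per-pod stat request):

```go
package main

import "fmt"

// scrapeOnce averages N probes inside one scrape cycle before
// extrapolating to the whole revision, so the autoscaler never observes
// a half-finished cycle. probe is a placeholder for the real per-pod
// stat request against a randomly chosen pod.
func scrapeOnce(probe func() float64, probes, podCount int) float64 {
	sum := 0.0
	for i := 0; i < probes; i++ {
		sum += probe()
	}
	mean := sum / float64(probes)
	return mean * float64(podCount) // extrapolate via current pod count
}

func main() {
	// three probes observing concurrency 4, 6 and 5 on a 10-pod revision
	vals := []float64{4, 6, 5}
	i := 0
	probe := func() float64 { v := vals[i]; i++; return v }
	fmt.Println(scrapeOnce(probe, 3, 10)) // (4+6+5)/3 * 10 = 50
}
```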
@yanweiguo I found that you already submitted #3831, and I think that's good for now. But I think we still need to consider two things:
- The number of scrape probes will increase as the pod count grows, which will become a performance ceiling for the autoscaler; at that point the HPA for the autoscaler (#2930) is needed.
Based on the formula listed here, given a population variance of 100, the max sample size is 16 per second. We may need to tune the population variance value. I'm fine with 16 QPS.
We want to do #2930 anyway.
Does 100 mean 100 pods per revision, or 100 active revisions? If there are thousands of active revisions with tens of pods each, I think 16 QPS is not nearly enough. Anyway, #2930 is good, and I think it needs to take this into account when setting the upper limit on the number of revisions per shard.
100 means the concurrency variance of all pods per revision.
In what area(s)?
/kind autoscaler
What version of Knative?
HEAD
Expected Behavior
The scraped concurrency metric should always be accurate enough to drive correct scale-up/down decisions.
Actual Behavior
https://github.com/knative/serving/blob/1a61e74305a9538b2ecc6663dc004b4e65c949c1/pkg/autoscaler/stats_scraper.go#L140-L153 I noticed that
Assume traffic is route to pods evenly.
in the comment, but I think this implementation may introduce new bugs, especially when scaling up. Specifically, when scaling up, the pre-existing pods are usually handling a mass of requests, and there is an interval before the newly created pods take over requests and rebalance the load with the existing pods. So the scraped concurrency can come out lower than reality, and the autoscaler may then start to scale down. @yanweiguo WDYT?
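As a concrete illustration of the concern (all numbers invented): suppose a revision scales from 2 pods at concurrency 50 each to 10 pods, and the 8 new pods haven't taken over any traffic yet. Depending on which pods the scraper happens to sample, the extrapolated concurrency swings wildly:

```go
package main

import "fmt"

// extrapolate mirrors the scraper's even-routing assumption: average the
// sampled pods' concurrency and multiply by the current pod count.
func extrapolate(samples []float64, podCount int) float64 {
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples)) * float64(podCount)
}

func main() {
	// true total concurrency is 100 (two old pods at 50, eight new at 0)
	fmt.Println(extrapolate([]float64{50, 50, 0}, 10)) // sample skewed to old pods: overestimate
	fmt.Println(extrapolate([]float64{0, 0, 0}, 10))   // sample hits only new pods: reads zero
}
```

Either skew is possible until routing rebalances, which is why a sample that happens to land on the fresh pods can make the autoscaler scale down right after scaling up.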