Closed k4leung4 closed 5 years ago
@k4leung4 Oh yeah, good catch. I completely forgot about that.
There's #2379 to do that specifically. My "plan" was to implement #2977 first, that makes this task way easier to achieve in a "good" way I think.
With #2977 implemented we'd look at the total system concurrency at a given time (vs. over the whole window). The activator metrics would not need special treatment in that model because they simply add to the overall system concurrency (at that point in time).
WDYT? Should we attack #2977 ASAP or should we come up with a bandaid?
Thanks for pointing out #2977, as this is the same problem outlined in that issue. The proposal there makes sense to me.
Do you have time to work on #2977? I can take a stab at it if you are busy.
FYI @yanweiguo, who is working on the metric scraping work.
@k4leung4 I indeed had enough cycles today to throw together #3289.
It does, thanks!
/close
@markusthoemmes: Closing this issue.
In what area(s)?
/area autoscale
What version of Knative?
HEAD
Expected Behavior
When there is significant load on the activator, when it transitions from using activator metrics to pod metrics, the number of desired pod is relatively smooth.
Actual Behavior
I have seen it suggest 1000 desire pods when using activator metrics and drop down to 1 pod when switching over to pod metrics.
Steps to Reproduce the Problem
The moment the autoscaler decides to use pod metrics, it does so when the pod metrics have concurrency greater than epsilon, which leads it to suggest a desired pod count of 1, as there is a lack of metrics from pods at the time. The metrics from the activator is ignored. This leads to having to ramp up time on the metrics to have an accurate desired pod calculation https://github.com/knative/serving/blob/master/pkg/autoscaler/autoscaler.go#L151
From this issue that @mattmoor filed a while back, https://github.com/knative/serving/issues/1623, it suggested we should use, possibly with a discount on the activator metrics when calculating the desired pod count rather than completely dropping the .
@markusthoemmes Do you have any thoughts on how best to address this? Not sure if there is a greater concept on how the KPA should react in these scenarios.
I will experiment with using the activator metrics along with the pod metrics and examine the behavior and see if it is reasonable. I am also looking to add unit test to verify that when pod metrics comes in, the recommendation does not change too much.