This feature request was also expressed by other users (a team at Zalando).
@mikkeloscar in general this sounds good, but I can also imagine that it might be dangerous if the fade-in takes too long while a rolling update with only a few endpoints is in progress.
For implementation see also https://github.com/zalando/skipper/issues/1249#issuecomment-558633258
This design describes an approach for gradually increasing the traffic sent to new backend endpoints.
Problem:
When using load-balanced backends and the backend service scales out horizontally, new backend endpoints are added. Once Skipper detects a new endpoint, it immediately starts sending it its proportional share of the traffic. In certain cases, the backend endpoints can tolerate the proportional traffic only after a "warm-up" period. Example: a service that needs to fill local caches or JIT-compile hot code paths before it can serve its full share of requests.
Proposed solution:
Skipper would gradually, "fade-in" style, increase the traffic to new endpoints, starting from zero, with a configurable duration, after which the traffic to the endpoint would reach the current proportional rate.
The LB backend would be extended with the following configuration fields: `fade_in_duration` and `fade_in_degree` (both appear in the formula below).
Skipper would internally record a detection timestamp for each backend endpoint. Note on the detection time: it marks when the individual Skipper instance first detects the endpoint, which can be later than the endpoint's actual creation time (see the questions below).
Using these fields and the detection timestamp, the mechanism would work as follows. The fade-in function can be calculated as:
```
desired_rate = (proportional_rate / fade_in_duration) * (now - endpoint_detected) ^ fade_in_degree
```
To get the chance of using the endpoint, which can then be compared against a random number, we can transform this as follows:
```
chance = desired_rate / proportional_rate = (1 / fade_in_duration) * (now - endpoint_detected) ^ fade_in_degree
```
Note that it is not necessary to use the request rate metric as an input; the current time is enough.
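For illustration, a minimal Go sketch of this calculation. The names `fadeInChance`, `fadeInDuration`, and `fadeInDegree` are assumptions made for this sketch, not the actual Skipper API; for degrees other than 1, the exponent is applied to the normalized age so the chance stays within [0, 1].

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// fadeInChance returns the chance of sending a request to an endpoint,
// based only on the current time, the detection timestamp, and the two
// proposed configuration fields. For fadeInDegree == 1 this equals the
// formula above; for other degrees the exponent is applied to the
// normalized age so the chance stays within [0, 1].
func fadeInChance(now, detected time.Time, fadeInDuration time.Duration, fadeInDegree float64) float64 {
	age := now.Sub(detected)
	if age <= 0 {
		return 0
	}
	if age >= fadeInDuration {
		return 1 // fully faded in: the endpoint gets its proportional rate
	}
	return math.Pow(float64(age)/float64(fadeInDuration), fadeInDegree)
}

func main() {
	detected := time.Now().Add(-30 * time.Second)
	// Halfway through a one-minute fade-in with degree 1: chance ~0.50.
	fmt.Printf("chance: %.2f\n", fadeInChance(time.Now(), detected, time.Minute, 1))
}
```

During load balancing, the endpoint would then be skipped (and another one picked) whenever a uniform random number in [0, 1) is greater than or equal to this chance.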
Questions:
Why not take the detection time directly from `endpointCreated` (if present), contrary to:
https://github.com/zalando/skipper/blob/d21aec1d8032185fbc632ba4e8369f0668ebd6d0/filters/fadein/fadein.go#L236-L240
If `endpointCreated` is designed to be metadata provided by the infrastructure, why can't we trust the infrastructure to know when an endpoint (e.g. a pod) went online?
Also consider the case of a new Skipper instance added by scaling: it should route traffic to already established endpoints without any fade-in. The current implementation prevents that and considers them all new, regardless of the `endpointCreated` data.
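A sketch of that suggestion, assuming a hypothetical `detectionTime` helper (this is not the current implementation linked above):

```go
package fadein

import "time"

// detectionTime trusts the infrastructure-provided creation time when
// present, and only falls back to the moment this Skipper instance
// first saw the endpoint when the metadata is missing. A freshly
// scaled-out Skipper instance would then not fade in endpoints that
// are already established.
func detectionTime(endpointCreated, firstSeen time.Time) time.Time {
	if !endpointCreated.IsZero() {
		return endpointCreated
	}
	return firstSeen
}
```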
How should we define what a new endpoint is? An IP that we didn't know of before?
I was thinking about `EndpointMetrics`: a service that would measure endpoint in-flight requests and latencies across all routes, and could be a useful data source for load-balancing algorithms. The main issue I got stuck on is endpoint identity. It is clear that an IP address is not enough, as addresses can be reused. I think that in the k8s context Skipper can and should leverage the infrastructure to provide endpoint identity, something like tuples of `(ip, endpointCreated)` or even `(ip, pod)` (i.e. the endpoint's `targetRef`).
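For illustration, such an identity could be a composite key. The types below are hypothetical and assume the infrastructure exposes `endpointCreated` or the pod behind the endpoint:

```go
package metrics

import "time"

// endpointID identifies an endpoint in a way that survives IP reuse:
// the same address with a different creation time (or a different pod
// behind it) counts as a new endpoint.
type endpointID struct {
	ip      string
	created time.Time // endpointCreated, provided by the infrastructure
	// alternatively: pod string, i.e. the endpoint's targetRef
}

// EndpointMetrics sketches the discussed service: per-endpoint
// in-flight request counts (and, eventually, latencies) keyed by
// identity rather than by bare IP.
type EndpointMetrics struct {
	inflight map[endpointID]int64
}
```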
Are we missing something needed to close this issue? To me it feels like fadeIn() implements the requested feature, but maybe we can open a follow-up issue that makes clear what we want to achieve as the next step.
What do you think @otrosien @aryszka @AlexanderYastrebov?
Happy with fadeIn (and if I'm not mistaken, it now also works in combination with powerOfRandomNChoices), and happy to close this. I don't see any documentation note on the potential issue of a rolling restart of a deployment interacting with the fadeIn filter, causing a skew in traffic allocation (hint: set spec.minReadySeconds to ensure we don't roll too quickly). Is this documented and I just didn't find it?
@otrosien you probably have the most knowledge about this issue, so maybe you could propose something for the docs as a PR? The source is https://github.com/zalando/skipper/blob/master/docs/reference/filters.md#fadein
Opened a PR
Is your feature request related to a problem? Please describe.
A service that scales up under load might have a significant amount of RPS per backend IP. If a new pod reports ready, it will immediately get its full share of the RPS. This can overload services that don't pre-warm fully, and it requires each service to implement its own pre-warming correctly. A slow ramp-up of traffic would make it easier for services to scale.
Describe the solution you would like
I would imagine the round-robin table adding a weight per backend IP; by default all weights are equal. When a new backend IP joins, it gets a lower weight (probability), and the weight increases over time until it is equal to the rest (see the sketch below).
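A rough Go sketch of that idea, under stated assumptions: the `weight` and `pick` helpers are hypothetical, the ramp-up is linear, and weighted random selection stands in for the round-robin table for brevity. A real implementation would also have to handle the case where all endpoints are new, plus concurrency.

```go
package lb

import (
	"math/rand"
	"time"
)

type endpoint struct {
	address string
	joined  time.Time // when this backend IP joined the table
}

// weight ramps an endpoint's share linearly from 0 to 1 over rampUp;
// established endpoints keep the full, equal weight of 1.
func weight(now, joined time.Time, rampUp time.Duration) float64 {
	age := now.Sub(joined)
	switch {
	case age <= 0:
		return 0
	case age >= rampUp:
		return 1
	default:
		return float64(age) / float64(rampUp)
	}
}

// pick selects an endpoint with probability proportional to its
// current weight.
func pick(endpoints []endpoint, now time.Time, rampUp time.Duration, r *rand.Rand) endpoint {
	var total float64
	for _, e := range endpoints {
		total += weight(now, e.joined, rampUp)
	}
	x := r.Float64() * total
	for _, e := range endpoints {
		x -= weight(now, e.joined, rampUp)
		if x < 0 {
			return e
		}
	}
	return endpoints[len(endpoints)-1] // fallback, e.g. when all weights are 0
}
```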
Describe alternatives you've considered (optional)
An alternative would be creating a new stack and gradually shifting traffic to it. The downside is of course the manual effort, and, with every scaling step, a full traffic shift between two stacks.
Additional context (optional)
AWS ELBs call this "slow_start.duration_seconds".
Would you like to work on it?
No