zalando / skipper

An HTTP router and reverse proxy for service composition, including use cases like Kubernetes Ingress
https://opensource.zalando.com/skipper/

Slowly ramp up traffic to new backend-ips #1207

Closed (otrosien closed this issue 2 years ago)

otrosien commented 5 years ago

Is your feature request related to a problem? Please describe.

A service that scales up because of load might have a significant amount of RPS per backend-ip. If a new pod reports ready, it immediately gets its full share of the RPS. This can overload services that don't pre-warm fully, and it requires each service to implement its pre-warming properly. A slow ramp-up of traffic would make it easier for services to scale.

Describe the solution you would like

I would imagine the round-robin table adding a weight per backend-ip; by default all weights are equal. When a new backend-ip joins, it gets a lower weight (probability), and the weight increases over time until it is equal to the rest.
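To illustrate the idea, here is a minimal Go sketch of a time-weighted selection over the LB table. The endpoint type, the linear weight function and the ramp-up parameter are assumptions for illustration, not Skipper's actual load balancer code:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// endpoint is a hypothetical LB table entry that remembers when it joined.
type endpoint struct {
	addr  string
	added time.Time
}

// weight returns the endpoint's share relative to a fully ramped-up
// endpoint (1.0), growing linearly from 0 to 1 over rampUp.
func (e endpoint) weight(now time.Time, rampUp time.Duration) float64 {
	age := now.Sub(e.added)
	if age >= rampUp {
		return 1
	}
	return float64(age) / float64(rampUp)
}

// pick chooses an endpoint with probability proportional to its weight.
func pick(eps []endpoint, now time.Time, rampUp time.Duration) endpoint {
	total := 0.0
	for _, e := range eps {
		total += e.weight(now, rampUp)
	}
	r := rand.Float64() * total
	for _, e := range eps {
		w := e.weight(now, rampUp)
		if r < w {
			return e
		}
		r -= w
	}
	return eps[len(eps)-1]
}

func main() {
	now := time.Now()
	eps := []endpoint{
		{addr: "10.0.0.1:8080", added: now.Add(-time.Hour)},        // warmed up, full weight
		{addr: "10.0.0.2:8080", added: now.Add(-15 * time.Second)}, // just joined, low weight
	}
	fmt.Println(pick(eps, now, time.Minute).addr)
}
```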

Describe alternatives you've considered (optional)

An alternative would be creating a new stack and gradually shifting traffic to it. But the downsides are of course the manual effort, and that every scaling step requires a full traffic shift between two stacks.

Additional context (optional)

AWS ELBs call this "slow_start.duration_seconds".

Would you like to work on it?

No

mikkeloscar commented 5 years ago

This feature was also requested by other users (a team at Zalando).

szuecs commented 5 years ago

@mikkeloscar in general this sounds good, but I can also imagine that it might be dangerous if the ramp-up takes too long while a rolling update with only a few endpoints is ongoing.

szuecs commented 4 years ago

For implementation see also https://github.com/zalando/skipper/issues/1249#issuecomment-558633258

aryszka commented 4 years ago

This design describes the approach for supporting a gradual increase of the traffic sent to new backend endpoints.

Problem:

When using load balanced backends and the backend service is scaling out horizontally, new backend endpoints are added, and once Skipper detects a new endpoint, it immediately starts sending it its proportional share of the traffic. In certain cases, the backend endpoints can tolerate the proportional traffic only after a "warm-up" period. Example:

  1. a backend is running with 2 endpoints and handling 1000rps of total traffic, each endpoint handling 500rps
  2. the load increases by 200rps, and the backend service needs to scale out
  3. Skipper detects the new endpoint and immediately starts sending the proportional traffic to each of the now 3 endpoints: 400rps each
  4. this can cause problems for the new instance if it requires a warm-up period, and failing to serve 33% of the traffic can also cause a significant amount of retried requests if the clients are programmed that way, making the problem even more severe

Proposed solution:

Skipper would gradually, "fade-in" style, increase the traffic to new endpoints, starting from zero, with a configurable duration, after which the traffic to the endpoint would reach the current proportional rate.

The LB backend would be extended with the following configuration fields:

  * fade-in duration: the time a newly detected endpoint needs to reach its full proportional traffic share
  * fade-in degree: the exponent of the fade-in curve (1 meaning a linear ramp-up)
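As a rough sketch, the two fields could look like this on the backend configuration; the Go type and field names are illustrative only, not Skipper's actual API:

```go
package main

import (
	"fmt"
	"time"
)

// lbBackendConfig sketches the two proposed fields; the names are
// assumptions for illustration, not Skipper's actual configuration API.
type lbBackendConfig struct {
	fadeInDuration time.Duration // time for a new endpoint to reach its full share
	fadeInDegree   float64       // exponent of the fade-in curve, 1 = linear
}

func main() {
	cfg := lbBackendConfig{fadeInDuration: 3 * time.Minute, fadeInDegree: 1}
	fmt.Printf("%+v\n", cfg)
}
```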

Skipper would internally put a detection timestamp on each endpoint. Notes on the detection time:

Using these fields, the mechanism would work as follows:

  1. A postprocessor, when it detects a new endpoint, sets a detection timestamp for that endpoint (see the sketch after this list)
  2. The proxy, after an endpoint has been selected by the LB algorithm, and if the endpoint's age is less than the fade-in duration, calculates the chance of using the selected endpoint for the current request based on the fade-in duration, the fade-in degree and the endpoint's age. If the endpoint should not be used for the current request, the proxy sends the request to another endpoint selected by the LB algorithm
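A minimal sketch of step 1, assuming a plain map keyed by endpoint address; Skipper's real postprocessor works on its own routing data structures:

```go
package main

import (
	"fmt"
	"time"
)

// markDetected records the first time each endpoint address is seen and
// drops addresses that are gone, so a returning endpoint fades in again.
func markDetected(detected map[string]time.Time, current []string, now time.Time) {
	seen := make(map[string]bool, len(current))
	for _, addr := range current {
		seen[addr] = true
		if _, ok := detected[addr]; !ok {
			detected[addr] = now
		}
	}
	for addr := range detected {
		if !seen[addr] {
			delete(detected, addr)
		}
	}
}

func main() {
	detected := map[string]time.Time{}
	markDetected(detected, []string{"10.0.0.1:8080"}, time.Now())
	markDetected(detected, []string{"10.0.0.1:8080", "10.0.0.2:8080"}, time.Now())
	fmt.Println(detected)
}
```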

The function of the fade-in can be calculated as follows:

desired_rate = (proportional_rate / fade_in_duration) * (now - endpoint_detected) ^ fade_in_degree

To get the chance of using the endpoint, which can then be compared against a random number, we can transform this as follows:

chance = desired_rate / proportional_rate = (1 / fade_in_duration) * (now - endpoint_detected) ^ fade_in_degree

Note that it is not necessary to use the request rate metric as an input; the current time is enough.
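A small Go sketch of the chance calculation and the per-request decision. Note that the formula above is only normalized for a fade-in degree of 1; the sketch assumes the intended normalization (age / fade_in_duration) ^ fade_in_degree, which keeps the chance within [0, 1] and matches the linear case:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// fadeInChance returns the probability of using an endpoint detected at
// the given time, using the normalized form (age/duration)^degree.
func fadeInChance(now, detected time.Time, duration time.Duration, degree float64) float64 {
	age := now.Sub(detected)
	if age >= duration {
		return 1
	}
	return math.Pow(float64(age)/float64(duration), degree)
}

// useEndpoint decides for a single request whether the endpoint selected
// by the LB algorithm should receive it, or another one should be picked.
func useEndpoint(now, detected time.Time, duration time.Duration, degree float64) bool {
	return rand.Float64() < fadeInChance(now, detected, duration, degree)
}

func main() {
	now := time.Now()
	detected := now.Add(-15 * time.Second)
	// With a 60s fade-in and degree 1, a 15s old endpoint gets 25% of its
	// proportional share of the requests.
	fmt.Println(fadeInChance(now, detected, time.Minute, 1)) // 0.25
	fmt.Println(useEndpoint(now, detected, time.Minute, 1))
}
```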

Questions:

  * How should we define what a new endpoint is? An IP that we didn't know of before?

AlexanderYastrebov commented 4 years ago

Why not take the detection time directly from endpointCreated (if present), contrary to the current implementation: https://github.com/zalando/skipper/blob/d21aec1d8032185fbc632ba4e8369f0668ebd6d0/filters/fadein/fadein.go#L236-L240

If endpointCreated is designed to be metadata provided by the infrastructure, why can't we trust the infrastructure to know when an endpoint (e.g. a pod) went online? Also consider the case of a new Skipper instance starting up after scaling: it should route traffic to already established endpoints without any fade-in. The current implementation prevents that and considers them all new, regardless of the endpointCreated data.
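One possible way to express this preference, as a sketch only (not the linked implementation): use endpointCreated when the infrastructure provides it, and fall back to the first-seen time otherwise:

```go
package main

import (
	"fmt"
	"time"
)

// detectionTime prefers the infrastructure-provided creation time of the
// endpoint (e.g. endpointCreated from Kubernetes metadata); if that is
// missing (zero), it falls back to the time this Skipper instance first
// saw the endpoint. A freshly started Skipper would then not fade in
// endpoints that have been serving traffic for a long time already.
func detectionTime(endpointCreated, firstSeen time.Time) time.Time {
	if !endpointCreated.IsZero() {
		return endpointCreated
	}
	return firstSeen
}

func main() {
	created := time.Now().Add(-2 * time.Hour) // endpoint has been up for 2h
	firstSeen := time.Now()                   // this Skipper instance just started
	fmt.Println(detectionTime(created, firstSeen))
}
```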

AlexanderYastrebov commented 4 years ago

how should we define what a new endpoint is? An IP that we didn't know of before?

I was thinking about EndpointMetrics - a service that would measure endpoint in-flight requests and latencies across all routes and could be a useful data source for load balancing algorithms. The main issue I got stuck on is endpoint identity. It is clear that the IP address is not enough, as addresses can be reused. I think in the k8s context Skipper can and should leverage the infrastructure to provide endpoint identity, something like tuples of (ip, endpointCreated) or even (ip, pod (i.e. endpoint targetRef)).
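A minimal sketch of such an identity tuple in Go; the type and field names are assumptions for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// endpointKey is an illustrative identity for an endpoint that survives
// IP reuse: the address alone is ambiguous, so it is combined with the
// infrastructure-provided creation time (a pod/targetRef name would work
// similarly).
type endpointKey struct {
	address string    // host:port of the endpoint
	created time.Time // endpointCreated from the infrastructure metadata
}

func main() {
	// Two endpoints reusing the same address are still distinguishable.
	old := endpointKey{address: "10.0.0.5:8080", created: time.Date(2020, 7, 1, 10, 0, 0, 0, time.UTC)}
	reused := endpointKey{address: "10.0.0.5:8080", created: time.Date(2020, 7, 2, 9, 0, 0, 0, time.UTC)}
	fmt.Println(old == reused) // false
}
```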

szuecs commented 3 years ago

Are we missing anything before closing this issue? To me it feels like fadeIn() implements the requested feature, but maybe we can have a follow-up issue that makes clear what we want to achieve as a next step.

What do you think @otrosien @aryszka @AlexanderYastrebov?

otrosien commented 2 years ago

Happy with fadeIn (and, if I'm not mistaken, it now also works in combination with powerOfRandomNChoices), and happy to close this. I don't see any documentation note on the potential issue of a rolling restart of a deployment vs. the fadeIn filter causing a skew in traffic allocation (hint: set spec.minReadySeconds to ensure we don't roll too quickly). Is this documented somewhere and I just didn't find it?

szuecs commented 2 years ago

@otrosien I am not sure, but you probably have the most knowledge about this issue, so maybe you could propose something for the docs as a PR? The source is https://github.com/zalando/skipper/blob/master/docs/reference/filters.md#fadein

otrosien commented 2 years ago

Opened a PR