OlgaMaciaszek opened 5 years ago
The research underlying this approach.
The use cases examined within the whitepaper (linked in the comment above) assume that the clients (in our case, possibly client-side load balancer instances) communicate directly with the servers to choose the best server to pass the request to.
The servers interact with the clients informing them about their current load. The communication between the clients and the servers is structured into rounds. The research includes the following (alternative) approaches:
As the paper demonstrates, these approaches yield good load distribution (in terms of the resulting maximum load relative to the numbers of requests and servers). However, they are communication-heavy, so the resulting communication costs have to be low enough for this approach to be profitable.
With that in mind, the research centres on the minimum number of servers each client should communicate with (and, thus, the minimum communication cost) that would still give good results in terms of load distribution between the servers. The author proves that while there's a very substantial benefit from contacting two random servers and choosing the better of them over just randomly picking one server to send requests to, contacting any additional number of servers leads to much less significant improvements.
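To make the core step concrete, here is a minimal, framework-free Java sketch of the power-of-two-choices selection described above: sample two distinct servers uniformly at random and route to the less loaded one. The Server record and its load field are illustrative stand-ins, not Spring Cloud types.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the power-of-two-choices step: pick two distinct
// servers at random and route to the one currently reporting lower load.
public class PowerOfTwoChoices {

    public record Server(String id, int load) {}

    private final Random random;

    public PowerOfTwoChoices(Random random) {
        this.random = random;
    }

    public Server choose(List<Server> servers) {
        if (servers.size() == 1) {
            return servers.get(0);
        }
        int first = random.nextInt(servers.size());
        int second = random.nextInt(servers.size() - 1);
        if (second >= first) {
            second++; // shift to guarantee the two sampled indices are distinct
        }
        Server a = servers.get(first);
        Server b = servers.get(second);
        return a.load() <= b.load() ? a : b;
    }
}
```

Note that only two load values are ever compared per request, regardless of how many servers exist; that is where the communication saving comes from.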
In the context of implementing this within the Spring Cloud LoadBalancer, there are some issues to consider. The implementation of the ReactiveLoadBalancer itself should not be problematic, given that we could use the data on the current servers' load stored in the ServiceInstance metadata; however, the ServiceInstanceListSupplier implementation, and how this information should be collected, should probably be discussed.
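As a rough illustration of the metadata-based approach, the balancer could parse a load value out of each instance's metadata map. The "load" key name and the fallback behaviour below are assumptions for the sketch, not an established Spring Cloud convention:

```java
import java.util.Map;

// Hypothetical helper: read a numeric load value from per-instance metadata,
// as a ServiceInstance-metadata-based ReactiveLoadBalancer might do.
public class MetadataLoad {

    public static final String LOAD_KEY = "load"; // assumed key name

    // Unknown or malformed load metadata falls back to Integer.MAX_VALUE, so
    // instances without load data always lose against instances that report it.
    public static int loadOf(Map<String, String> metadata) {
        String raw = metadata.get(LOAD_KEY);
        if (raw == null) {
            return Integer.MAX_VALUE;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }
}
```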
Some things that come to mind:
While the algorithm centres around the scenario where clients interact directly with end-servers during load-balancing, the most used ServiceInstanceListSupplier that we have is the DiscoveryClientServiceInstanceListSupplier, which interacts with service discovery; we could approach this in various ways:
• The ServiceInstanceListSupplier used by default is the caching one, so given that load tends to change quite dynamically, we would have stale data; using a reactive push-based ServiceInstanceListSupplier implementation instead might be a possible solution, but it could still generate unnecessarily heavy traffic if we update it on each load change on each of the servers; also, if we use this approach, we might as well just pick the best server rather than one of the two best servers, as we already have the whole list there.
• Server instances would have to be instrumented to know their load and to expose an endpoint where it could be queried (possibly using Micrometer?).
We would have a different, interesting scenario with Spring Cloud RSocket - there the Gateway/LB instances will actually have much more knowledge of the network topology, and we might be able to find the best instances based on that.
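The push-based supplier idea could be sketched like this, under the assumption that load updates arrive as events and the balancer only ever reads a local snapshot. All names below are illustrative; this is not an actual ServiceInstanceListSupplier implementation:

```java
import java.util.List;

// Sketch of a push-updated instance list: load-change events replace a locally
// held snapshot, and the load balancer reads the latest snapshot without a
// per-request remote call.
public class PushBasedInstanceList {

    public record InstanceLoad(String instanceId, int load) {}

    private volatile List<InstanceLoad> snapshot = List.of();

    // Called whenever a load-change event is pushed (e.g. by a stream listener).
    public void onLoadUpdate(List<InstanceLoad> latest) {
        this.snapshot = List.copyOf(latest);
    }

    // Called by the load balancer on each request; no remote round-trip here.
    public List<InstanceLoad> get() {
        return snapshot;
    }
}
```

The trade-off mentioned above is visible here: the snapshot is cheap to read, but keeping it fresh means a push on every load change on every server.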
I would like to start a discussion on the best implementation of PoTC. I have put some considerations in the comments above.
Please join the discussion and add any ideas and comments.
@spencergibb @ryanjbaxter @marcingrzejszczak @dsyer @Haybu @smaldini @nebhale @TYsewyn @elandau
2. Server instances would have to be instrumented to know their load and have an endpoint where that could be queried (possibly using micrometer?)
In non-RSocket cases, and given that services are instrumented, two things come to mind to help a client with decision making:
• Could a client consult a monitoring toolkit such as Prometheus, or its collected metrics stored somewhere via a Prometheus writer (in a time-series DB, etc.)? The downside is that this happens at request time and could cause a slight delay.
• A Prometheus writer could write the instrumented metrics to a stream broker (such as Kafka); leveraging Spring Cloud Stream, a client could act as a listener/sink and proactively maintain the service instances' metrics (Kafka KStream/KTable) to be consulted at the time of target service invocation.
I can see both options involve extra components :)
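The second, stream-based option might look roughly like this on the client side: a listener keeps a local KTable-like map of the latest per-instance load, consulted at invocation time. The event shape and the broker wiring are assumptions; only the local bookkeeping is sketched:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a client-side sink for per-instance load events (e.g. consumed from
// a Kafka topic fed by a Prometheus remote writer). The latest value per
// instance is kept in memory and read at request time.
public class LoadEventSink {

    public record LoadEvent(String instanceId, double load) {}

    private final Map<String, Double> latestLoad = new ConcurrentHashMap<>();

    // Would be wired as the stream listener (e.g. a Spring Cloud Stream consumer).
    public void accept(LoadEvent event) {
        latestLoad.put(event.instanceId(), event.load());
    }

    // Consulted by the load balancer at request time; MAX_VALUE if unknown,
    // so instances we have no data for lose against instances that report load.
    public double loadOf(String instanceId) {
        return latestLoad.getOrDefault(instanceId, Double.MAX_VALUE);
    }
}
```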
As the paper demonstrates, these approaches yield good load distribution (in terms of the resulting maximum load relative to the numbers of requests and servers). However, they are communication-heavy, so the resulting communication costs have to be low enough for this approach to be profitable.
It could be possible to piggyback on top of the underlying service discovery system, so that when a client requests registration information, it also includes data about the services' load.
• Could a client consult a monitoring toolkit such as Prometheus, or its collected metrics stored somewhere via a Prometheus writer (in a time-series DB, etc.)? The downside is that this happens at request time and could cause a slight delay.
• A Prometheus writer could write the instrumented metrics to a stream broker (such as Kafka); leveraging Spring Cloud Stream, a client could act as a listener/sink and proactively maintain the service instances' metrics (Kafka KStream/KTable) to be consulted at the time of target service invocation.
I like the second idea more because it would probably be more performant; although I think it's worth considering whether we could get the information from the instances just with Micrometer, without Prometheus, so that we require as few obligatory external components as possible; wdyt?
Micrometer can only tell you about metrics for the local process, right? I also think Prometheus is a good idea, but overkill for the simplest use cases. I definitely wouldn't want it to be a mandatory dependency. We'll probably end up rewriting Eureka if we take this idea much further.
That's true, but for this scenario, it could work:
We could select the 2 instances from SD and then communicate with them to get their load state, choosing the better one for the final request. This approach requires an additional round of communication (as compared to the research) and might be too costly over HTTP, but it would probably be most in keeping with the PoTC approach.
That being said, I'm also not convinced that this would be more efficient than using SD.
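For reference, that extra-round variant could be sketched as below, with probe standing in for a hypothetical HTTP call to each instance's load endpoint (the endpoint itself is an assumption; only the selection logic is shown):

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToIntFunction;

// Sketch of the probing variant: pick two instances at random from the
// discovery list, probe each for its current load (the extra round of
// communication), and route to the less loaded one.
public class ProbingTwoChoices {

    private final Random random;
    private final ToIntFunction<String> probe; // instanceId -> current load

    public ProbingTwoChoices(Random random, ToIntFunction<String> probe) {
        this.random = random;
        this.probe = probe;
    }

    public String choose(List<String> instanceIds) {
        if (instanceIds.size() == 1) {
            return instanceIds.get(0);
        }
        int i = random.nextInt(instanceIds.size());
        int j = random.nextInt(instanceIds.size() - 1);
        if (j >= i) {
            j++; // shift to guarantee distinct indices
        }
        String a = instanceIds.get(i);
        String b = instanceIds.get(j);
        // Only two probes per request, regardless of how many instances exist.
        return probe.applyAsInt(a) <= probe.applyAsInt(b) ? a : b;
    }
}
```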
I've picked up tracking this from the ticket @elandau registered for it. Just a little input from our gRPC P2C implementation that might apply here:
Our metrics internally are Spectator. I think Prometheus sounds good, but the provider would ideally be pluggable, and ditto for the metrics/status evaluation algorithm, depending on the metrics one has access to. Also, leveraging pluggable SD would certainly be helpful.
I'm wondering if it makes sense that when a service sends its heartbeat data to the service discovery server, it also includes its current load data. This data could then be propagated to service discovery clients when they send their heartbeat delta and get updates about newly registered services. Yes, the cached data might be out of date, but we are already fine with the list of services being out of date as well.
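A minimal sketch of that heartbeat-piggyback idea, assuming a hypothetical payload that carries a load field alongside the renewal (a real registry such as Eureka would need protocol support for the extra field):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a registry that keeps the load reported with each heartbeat
// renewal alongside the liveness data.
public class HeartbeatRegistry {

    public record Heartbeat(String instanceId, long timestampMillis, int load) {}

    private final Map<String, Heartbeat> lastHeartbeat = new ConcurrentHashMap<>();

    // Called on every heartbeat renewal from an instance.
    public void renew(Heartbeat heartbeat) {
        lastHeartbeat.put(heartbeat.instanceId(), heartbeat);
    }

    // Load as of the last heartbeat; -1 if the instance has never checked in.
    public int loadOf(String instanceId) {
        Heartbeat hb = lastHeartbeat.get(instanceId);
        return hb == null ? -1 : hb.load();
    }
}
```

As noted in the thread, the freshness of loadOf is bounded by the heartbeat interval, so under bursty load the value can be stale well before the next renewal.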
A metrics-aware discovery service is optimal if we are fine with outdated cached instances; as @dsyer mentioned, it looks like a new, enhanced Eureka.
I think I would be happy with local metrics to start with (we have statistics about requests we sent to each service instance from the current JVM at least). It might help if we design the API in such a way that it doesn't preclude a remote implementation. Shouldn't be too hard though.
We do not communicate with the two choices before choosing - there is some overhead involved, pathological cases, and diminishing returns over just using metrics.
It also makes sense to me to avoid that communication, since we are communicating with the SR anyway and can get the information from there. However, in this context, why would we do the 2-random-choices approach at all? The main benefit discussed in the paper is gaining a good load distribution while reducing communication to only 2 instances (the scenario the algorithm concentrates on is one where we need to contact each instance to get its load status). If we can communicate with only one external app, i.e. the service registry, and get all that data in one call, I'm not sure why we should not just pick the best instance instead of randomly picking 2 and choosing one of them.
I'm wondering if it makes sense that when a service sends its heartbeat data to the service discovery server it also includes its current load data.
That's a really good idea, but unfortunately it won't work for all SRs, e.g. Cloud Foundry or Kubernetes.
I'm also not sure what implications that would have for the heartbeats. IIRC the default setting for most SD clients is to send a heartbeat every 30 seconds. If the system is under high load, that information would be out of date and invalid within a sub-second timeframe.
Olga - if the SR is always up-to-date and the propagation delay is acceptable, then you could be right. Under real-world conditions with SR lag, outages, bursty traffic, etc., it was important for us that each instance could adapt for periods in isolation, so in that sense we may have a variation of the paper as strictly interpreted. An SR can suffer from propagation and replication issues, depending on its design, refresh characteristics, caching, etc.
My comments pertain to a gRPC implementation (not straight HTTP/servlets), so we have access to some additional information provided by the gRPC API that may or may not apply here. Our filtering flow looks approximately like this:
In the above, our least-loaded logic is based on the gRPC subchannel active count (basically the backlog of queued requests). So this (step 5 above) is a local loading decision that can adapt and does not suffer from issues with distributed SR caching/lag.
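A minimal sketch of that purely local signal: track in-flight requests per channel (akin to the gRPC subchannel active count) and compare the counters when choosing between the two candidates. No remote calls are involved, so there are no SR caching/lag issues. The Channel type is a stand-in, not the gRPC API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a local active-request counter per channel, used as the load
// signal when choosing between two randomly sampled candidates.
public class ActiveCountTracker {

    public static final class Channel {
        final String id;
        final AtomicInteger active = new AtomicInteger();
        public Channel(String id) { this.id = id; }
    }

    // Called when a request is dispatched on / completed by a channel.
    public void onStart(Channel c) { c.active.incrementAndGet(); }
    public void onComplete(Channel c) { c.active.decrementAndGet(); }

    // The least-loaded decision between the two candidates, based purely on
    // local in-flight counts.
    public Channel leastLoaded(Channel a, Channel b) {
        return a.active.get() <= b.active.get() ? a : b;
    }
}
```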
For the quarantine decision in step 4:
The above is just food for thought. Our gRPC choice-of-2 may have started as a strict reading of the research but has evolved over time to account for a number of failure situations we have encountered under high traffic volume. But it's one of many possible, valid implementations that could apply the choice-of-2 research more strictly as written, or extend beyond the controlled research environment.
Provide a Power of Two Choices (https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/) implementation for Spring Cloud LoadBalancer.
Original issue: https://github.com/spring-cloud/spring-cloud-commons/issues/595