spring-cloud / spring-cloud-commons

Common classes used in different Spring Cloud implementations
Apache License 2.0

LoadBalancer: support instance selection based on specific actuator metrics #756

Open kmandalas opened 4 years ago

kmandalas commented 4 years ago

Provide more fine-grained load balancing rules based on actuator metrics such as system.cpu.usage, jvm.memory.used, etc. This could be configurable through a specific metric and a corresponding threshold value. If an instance exceeds the threshold, the balancer will prefer instances below it. Otherwise the default behavior will be applied.
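To make the proposed rule concrete, here is a minimal sketch in plain Java, with no Spring types involved. The class name `ThresholdFilter`, the method `preferBelowThreshold`, and the idea of passing metric values in as a map are all assumptions made for illustration; how the metric values would actually reach the load balancer is left open.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spring Cloud LoadBalancer.
class ThresholdFilter {

    /**
     * Returns the instances whose metric value is below the threshold.
     * If no instance qualifies, the full list is returned unchanged so the
     * caller can fall back to its default selection behavior.
     * Instances with no reported metric are treated as unloaded (assumption).
     */
    static List<String> preferBelowThreshold(List<String> instanceIds,
                                             Map<String, Double> metricByInstance,
                                             double threshold) {
        List<String> preferred = instanceIds.stream()
                .filter(id -> metricByInstance.getOrDefault(id, 0.0) < threshold)
                .collect(Collectors.toList());
        return preferred.isEmpty() ? instanceIds : preferred;
    }
}
```

The filtered list would then be handed to whatever selection strategy is already in place (for example round robin), so the threshold only narrows the candidates and does not replace the existing algorithm.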

Alternatively, I assume this can be achieved by implementing a custom HealthIndicator, but a more out-of-the-box, configuration-only capability could be very useful.

spencergibb commented 4 years ago

#601 is along those lines. It's fairly hard to do, since you would want those metrics from other instances: how do you report that to the load balancer?

kmandalas commented 4 years ago

@spencergibb so the only solution would be to implement a custom HealthIndicator and rely on the existing HealthCheckServiceInstanceListSupplier?

spencergibb commented 4 years ago

that's one option, yes.

kmandalas commented 4 years ago

@spencergibb I think a custom HealthIndicator should be avoided, though, since it would show a status of [DOWN] whenever an instance is under load exceeding some threshold, and this would not be accurate. I guess implementing a custom ReactorServiceInstanceLoadBalancer could be the only alternative at the moment.

kmandalas commented 4 years ago

@spencergibb & @OlgaMaciaszek I am working on a custom ReactorServiceInstanceLoadBalancer at the moment because I want to achieve a "least_conn" behavior (similar to NGINX or HAProxy). This is useful in many cases, one of which is when you need to load balance WebSocket connections from Spring Cloud Gateway to multiple instances of WebSocket servers.

More specifically, I have started creating a LeastConnLoadBalancer (alternatively a LeastConnServiceInstanceListSupplier) that will ping service instances at a configurable actuator endpoint to get the number of users/connections and will choose the instance with the lowest number. Caching with a TTL is also being considered, to avoid hitting the endpoint on every request.
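The selection and caching logic described above could look roughly like the following sketch. It is plain Java and deliberately leaves out the Spring wiring: `LeastConnSelector`, its constructor parameters, and the `connectionCountFetcher` function (a stand-in for the HTTP call to the actuator endpoint) are hypothetical names, not existing Spring Cloud APIs. The clock is injected only so that the TTL behavior can be exercised without waiting.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.ToIntFunction;

// Hypothetical sketch of "least connections" selection with a TTL cache.
class LeastConnSelector {

    // Connection count for one instance plus the time it was fetched.
    private static final class Cached {
        final int connections;
        final long fetchedAtMillis;

        Cached(int connections, long fetchedAtMillis) {
            this.connections = connections;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final ToIntFunction<String> connectionCountFetcher; // stand-in for the actuator call
    private final long ttlMillis;
    private final LongSupplier clockMillis;
    private final Map<String, Cached> cache = new ConcurrentHashMap<>();

    LeastConnSelector(ToIntFunction<String> connectionCountFetcher,
                      Duration ttl,
                      LongSupplier clockMillis) {
        this.connectionCountFetcher = connectionCountFetcher;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    // Returns the cached count, refreshing it once the entry is older than the TTL.
    private int connectionsFor(String instanceId) {
        long now = clockMillis.getAsLong();
        Cached cached = cache.get(instanceId);
        if (cached == null || now - cached.fetchedAtMillis >= ttlMillis) {
            cached = new Cached(connectionCountFetcher.applyAsInt(instanceId), now);
            cache.put(instanceId, cached);
        }
        return cached.connections;
    }

    // Picks the instance with the fewest known connections; empty if no instances are given.
    Optional<String> choose(List<String> instanceIds) {
        return instanceIds.stream().min(Comparator.comparingInt(this::connectionsFor));
    }
}
```

In a real implementation the fetcher would be a non-blocking call, and error handling for unreachable instances would be needed; both are omitted here. A usage such as `new LeastConnSelector(fetcher, Duration.ofSeconds(10), System::currentTimeMillis)` would query each instance at most once per ten seconds.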

I will be watching the parent issue #601, but if you think a PR would make sense when I complete it, please let me know.

OlgaMaciaszek commented 4 years ago

@kmandalas Definitely, if you come up with something, do submit a PR. You might also want to keep an eye on what is happening with the following issues: https://github.com/spring-cloud/spring-cloud-commons/issues/675 (currently a PR in review - introduces possibilities to propagate load-balanced call data and to run a callback method after a load-balanced call has been completed; probably best to base your changes on that) and https://github.com/spring-cloud/spring-cloud-commons/issues/674 (planning to work on adding in micrometer here).