uber-archive / hyperbahn

Service discovery and routing for large scale microservice operations
MIT License

round robin peer selection #309

Open Raynos opened 7 years ago

Raynos commented 7 years ago

From a flame graph I've observed that some workers / services are really struggling with peer selection.

(image: flame graph)

We could implement a random peer selection strategy and add a flipr that lets us change the peer selection strategy per serviceName.

We already have boolean logic to enable / disable the peer heap per serviceName.

A round robin peer selection will reduce CPU utilization and slightly degrade load balancing by increasing variance.

If round robin turns out to be too involved, we can also just implement random peer selection.
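
For illustration, here is a minimal sketch of the two cheap strategies plus a per-serviceName override. Everything in it (`RoundRobinSelector`, `RandomSelector`, `strategyFor`, `createSelector`, the strategy names) is hypothetical and not Hyperbahn's actual API; the `overrides` object just stands in for whatever the remote config returns.

```javascript
'use strict';

// Hypothetical round robin selector: O(1) per call, no scoring, no heap.
function RoundRobinSelector(peers) {
    this.peers = peers;
    this.index = 0;
}

RoundRobinSelector.prototype.choosePeer = function choosePeer() {
    if (this.peers.length === 0) {
        return null;
    }
    // The modulo guards against the peer list having shrunk since last call.
    var i = this.index % this.peers.length;
    this.index = (i + 1) % this.peers.length;
    return this.peers[i];
};

// Hypothetical random selector: even less state than round robin.
// `random` is injectable so the behaviour can be tested deterministically.
function RandomSelector(peers, random) {
    this.peers = peers;
    this.random = random || Math.random;
}

RandomSelector.prototype.choosePeer = function choosePeer() {
    if (this.peers.length === 0) {
        return null;
    }
    return this.peers[Math.floor(this.random() * this.peers.length)];
};

// Look up the strategy for a serviceName; default to the existing peer heap.
function strategyFor(serviceName, overrides) {
    if (Object.prototype.hasOwnProperty.call(overrides, serviceName)) {
        return overrides[serviceName];
    }
    return 'heap';
}

// Returns a cheap selector, or null to mean "keep using the peer heap".
function createSelector(strategy, peers) {
    if (strategy === 'round-robin') {
        return new RoundRobinSelector(peers);
    }
    if (strategy === 'random') {
        return new RandomSelector(peers);
    }
    return null;
}
```

Both selectors ignore peer load entirely, which is where the extra variance in load balancing comes from.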

Raynos commented 7 years ago

Figuring out which serviceNames to enable with this "degraded load balancing" strategy is going to be involved.

One strategy is to:

In theory, there should only be ~10 service names that have both "high QPS" and a "high number of peers", which is the combination that causes choosePeer() to dominate the flame graph.
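
A rough sketch of that filter, assuming we can get per-service QPS and peer counts from somewhere. The `findCandidates` helper, the field names and the thresholds are all made up for illustration:

```javascript
'use strict';

// Hypothetical helper: return the serviceNames that are both high QPS and
// have a large number of peers. Thresholds are caller-supplied guesses.
function findCandidates(stats, minQps, minPeers) {
    return stats
        .filter(function isCandidate(s) {
            return s.qps >= minQps && s.peerCount >= minPeers;
        })
        .map(function toName(s) {
            return s.serviceName;
        });
}

// Made-up sample input, only to show the expected shape.
var sampleStats = [
    {serviceName: 'svc-a', qps: 5000, peerCount: 800},
    {serviceName: 'svc-b', qps: 5000, peerCount: 12},
    {serviceName: 'svc-c', qps: 40, peerCount: 900}
];

var candidates = findCandidates(sampleStats, 1000, 100);
```

With the sample input only `svc-a` passes both thresholds; a service that is high on just one axis stays on the peer heap.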

rf commented 7 years ago

we could also add a timing stat that's cluster-wide and only tagged by service name
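
something like this, as a sketch. `timeChoosePeer`, the stat name and the `emitTiming(name, ms, tags)` callback are placeholders rather than the real stats client interface; only `process.hrtime()` is a real Node API here:

```javascript
'use strict';

// Wrap a choosePeer function so every call emits one timing stat that is
// tagged by service name only (no host or worker tags), so it can be
// aggregated cluster-wide.
function timeChoosePeer(serviceName, choosePeer, emitTiming) {
    return function timedChoosePeer() {
        var start = process.hrtime();
        var peer = choosePeer();
        var diff = process.hrtime(start); // [seconds, nanoseconds]
        var elapsedMs = diff[0] * 1e3 + diff[1] / 1e6;
        emitTiming('choose-peer.latency', elapsedMs, {serviceName: serviceName});
        return peer;
    };
}
```

sorting that stat by total time per service name would show which services spend the most time in choosePeer().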