grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC
https://grpc.io
Apache License 2.0

balancer APIs (even v2) are not composable #2909

Open jhump opened 5 years ago

jhump commented 5 years ago

The balancer APIs are not at all amenable to composition. For example, if I want to add affinity support to an existing balancer (where a context key indicates an affinity hint and otherwise it routes per the underlying/decorated balancer), I have to wrap almost every type (balancer.Builder, balancer.Balancer, balancer.ClientConn, balancer.Picker, and balancer.SubConn) as well as keep track of a lot of redundant state (basically mirror much of the internal state of the underlying balancer).

A non-trivial change that would go a long way towards fixing this would be to separate the concerns of sub-connection management (instructing the client conn what connections to create and delete based on resolver updates) vs. picking (given a set of ready connections, choose one). The latter half, picking a connection, would be composable, allowing me to create a custom picker that blends or decorates existing strategies. That also allows for the connection management piece to be more easily composed/decorated (for example, adding sophistication to the round-robin strategy other than one sub-conn per resolved address).
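To make that separation concrete, here is a rough, purely hypothetical sketch (none of these interfaces exist in grpc-go; only balancer.ClientConn, balancer.SubConn, balancer.Picker, and resolver.Address are real types) of how the two responsibilities might be split:

```go
package sketch

import (
	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/resolver"
)

// ConnManager would own sub-connection lifecycle: given resolver updates,
// it tells the ClientConn which sub-conns to create or remove. It knows
// nothing about how a connection is picked for an RPC.
type ConnManager interface {
	HandleResolvedAddrs(cc balancer.ClientConn, addrs []resolver.Address)
}

// PickerFactory would own only the picking half: given the currently READY
// sub-conns, produce a Picker. Because it never touches connection
// management, picker strategies could be wrapped, blended, or swapped
// independently of how sub-conns are managed.
type PickerFactory interface {
	NewPicker(ready map[resolver.Address]balancer.SubConn) balancer.Picker
}
```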

A small change that could help for now (though it would not be backward compatible, so it would require "V3" interfaces) would be to have Balancer.UpdateSubConnState return a Picker, or maybe (Picker, connectivity.State), instead of requiring it to call ClientConn.UpdateBalancerState. While this doesn't allow blending strategies (since two strategies would "compete" over creation/deletion of sub-connections), it does allow decoration: a decorator could wrap the picker. (Though it must still manage its own set of ready sub-connections.)
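A sketch of that smaller change, again hypothetical (no such "V3" interface exists in grpc-go):

```go
package sketch

import (
	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/connectivity"
)

// BalancerV3 is a hypothetical variant of balancer.Balancer: a sub-conn
// state update returns the (possibly new) picker and the balancer's
// aggregate connectivity state, rather than pushing them through
// ClientConn.UpdateBalancerState. A decorator could then wrap the returned
// picker before forwarding it.
type BalancerV3 interface {
	UpdateSubConnState(sc balancer.SubConn, state balancer.SubConnState) (balancer.Picker, connectivity.State)
}
```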

Slight aside: it also seems that these APIs aren't amenable to smart retry patterns. Maybe a picker's signature should also receive additional info, like addresses already tried by prior attempts. (This seems particularly important for implementing hedging if/when that ever happens.)

dfawley commented 5 years ago

@jhump thanks for the suggestions. Responses inline:

A non-trivial change that would go a long way towards fixing this would be to separate the concerns of sub-connection management (instructing the client conn what connections to create and delete based on resolver updates) vs. picking (given a set of ready connections, choose one).

Can you provide a concrete proposal for this? (Not formal/finalized or even complete -- I'm just looking for something that shows what you are thinking.) I'm not sure I know what you're talking about, so having specific things to point to would help.

A small change that could help for now (though it would not be compatible so would require "V3" interfaces) would be to have Balancer.UpdateSubConnState return a Picker, or maybe (Picker, connectivity.State) instead of requiring it to call ClientConn.UpdateBalancerState.

My concern with this idea is that updates to the state of a subconn don't necessarily indicate the picker needs to be updated. E.g. if a subconn goes from CONNECTING to TRANSIENT_FAILURE, most balancers will consider that a no-op. However, the picker being updated is a meaningful event to grpc, so now we need a way to communicate back to the ClientConn that no picker change should occur.

Slight aside: it also seems that these APIs aren't amenable to smart retry patterns. Maybe a picker's signature should also receive additional info, like addresses already tried by prior attempts. (This seems particularly important for implementing hedging if/when that ever happens.)

I think the API we have is fine for this; we just aren't yet populating the data it would need. We have the balancer.PickOptions parameter to hold this data once we decide how to include it.
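For illustration only, a hypothetical extension along those lines; the field name and type are invented, not grpc-go API:

```go
package sketch

import "google.golang.org/grpc/resolver"

// PickOptionsWithRetryInfo is hypothetical: it only illustrates where
// per-attempt retry data could live alongside the existing pick options.
type PickOptionsWithRetryInfo struct {
	// FullMethodName mirrors the field already carried by the real options.
	FullMethodName string
	// PrevAttemptAddrs would list addresses already tried by earlier
	// attempts of this RPC; the name and shape are invented for illustration.
	PrevAttemptAddrs []resolver.Address
}
```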

stale[bot] commented 5 years ago

This issue is labeled as requiring an update from the reporter, and no update has been received after 7 days. If no update is provided in the next 7 days, this issue will be automatically closed.

jhump commented 5 years ago

Sorry I never replied. I do appreciate your response, but I ended up hacking around this for my own needs, so its priority is lower. (I just held my nose and re-implemented the round-robin pick strategy as a hard-coded fallback/default strategy for something that supported affinity.)

I think the ideal API would allow different strategies to be picked for connection management (given results from a resolver, which sub-conns need to be created/destroyed and how many) vs. picking (given a set of active sub-conns, select one). Also, the ideal API would be amenable to the decorator pattern -- such that I could easily wrap an existing balancer implementation and augment some aspect of it, such as augmenting the picking logic to conditionally support affinity (e.g. use a context value to pick a sub-conn, but be able to fall back to the underlying picker logic).
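As a concrete illustration of the decorator idea, a minimal sketch using grpc-go's current picker API (balancer.PickInfo / balancer.PickResult, which postdates this thread); the context key and the ready map are assumptions made up for the example:

```go
package affinity

import (
	"google.golang.org/grpc/balancer"
)

// ctxKey is a made-up context key under which callers would place an
// affinity hint for an RPC.
type ctxKey struct{}

// affinityPicker decorates another picker: if the RPC's context carries an
// affinity hint that maps to a ready sub-conn, use that sub-conn; otherwise
// delegate to the wrapped strategy (round-robin, etc.).
type affinityPicker struct {
	fallback balancer.Picker
	ready    map[string]balancer.SubConn // keyed by affinity token; illustrative only
}

func (p *affinityPicker) Pick(info balancer.PickInfo) (balancer.PickResult, error) {
	if hint, ok := info.Ctx.Value(ctxKey{}).(string); ok {
		if sc, ok := p.ready[hint]; ok {
			return balancer.PickResult{SubConn: sc}, nil
		}
	}
	// No hint, or no matching sub-conn: fall back to the underlying picker.
	return p.fallback.Pick(info)
}
```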

Affinity is just one such possible use case. Another is being able to implement connection management logic that is not strictly tied to a picker algorithm (so being able to easily "mix in" a round-robin, select-first, random, etc. pick strategy into something that provides a layer-4-HWLB-specific connection strategy).

Sorry if that is too vague.

jhump commented 5 years ago

@dfawley, I've been writing some new balancers to solve some issues we've seen with the existing round-robin strategy. We've also been looking into custom connection management strategies since we have some services with 100s of backends, and having every single client establish 100s of sub-conns is a little silly (and will only get worse as the service scales up over time and needs more replicas).

So this issue has reared its head for me again. This time, I took a little more time to examine the existing balancer/base and found some fairly minor changes that can be made there so that it provides the separation of concerns and composability I was looking for.
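For reference, this is roughly the shape balancer/base exposes today (names follow the current API, which differs in detail from the 2019 one discussed here): the base balancer handles sub-conn management from resolver updates, and a PickerBuilder supplies only the picking strategy. A minimal sketch with a deliberately trivial picker:

```go
package firstready

import (
	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/balancer/base"
)

// myPickerBuilder implements only the picking half; the base balancer takes
// care of creating and removing sub-conns as the resolver reports changes.
type myPickerBuilder struct{}

// Build is called whenever the set of READY sub-conns changes.
func (myPickerBuilder) Build(info base.PickerBuildInfo) balancer.Picker {
	scs := make([]balancer.SubConn, 0, len(info.ReadySCs))
	for sc := range info.ReadySCs {
		scs = append(scs, sc)
	}
	return &firstReadyPicker{scs: scs}
}

// firstReadyPicker always returns the first ready sub-conn; it exists only
// to keep the sketch short.
type firstReadyPicker struct{ scs []balancer.SubConn }

func (p *firstReadyPicker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
	if len(p.scs) == 0 {
		return balancer.PickResult{}, balancer.ErrNoSubConnAvailable
	}
	return balancer.PickResult{SubConn: p.scs[0]}, nil
}

func init() {
	// Reuse base's sub-conn management; supply only the picker strategy.
	balancer.Register(base.NewBalancerBuilder("first_ready_sketch", myPickerBuilder{}, base.Config{}))
}
```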

Let me know what you think: #3108.

zasweq commented 1 year ago

Discussed and went back and forth in #3108. Opened a new issue with the decision, tracked at https://github.com/grpc/grpc-go/issues/3425.