grpc / grpc-dotnet

gRPC for .NET
Apache License 2.0

Resolver/Balancer API #521

Closed monomosc closed 3 years ago

monomosc commented 5 years ago

gRPC implementations generally favor, or at least support, client-side load balancing. In Go you write:

conn, err := grpc.Dial("my-custom-resolver://some-string-that-has-meaning/some-other-string-with-meaning")

This calls a previously registered resolver (my-custom-resolver) with the URL. The resolver is then responsible for returning zero or more IP addresses to connect to. It might query SRV records from a DNS server, or ask something like etcd, ZooKeeper, Consul, etc.
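To make the scheme-based lookup concrete, here is a minimal sketch using only the Go standard library. It is not grpc-go's actual resolver API: the `Resolver` type, `Register`, `Resolve`, and the hard-coded address table in `staticResolver` are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// Resolver turns a parsed target URL into zero or more "host:port" addresses.
// Hypothetical type for illustration; this is not grpc-go's resolver interface.
type Resolver func(target *url.URL) ([]string, error)

// registry maps a URL scheme to the resolver registered for it.
var registry = map[string]Resolver{}

// Register associates a scheme such as "my-custom-resolver" with a resolver.
func Register(scheme string, r Resolver) {
	registry[scheme] = r
}

// Resolve parses the target, looks up the resolver by scheme, and calls it.
func Resolve(target string) ([]string, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, fmt.Errorf("parse target %q: %w", target, err)
	}
	r, ok := registry[u.Scheme]
	if !ok {
		return nil, fmt.Errorf("no resolver registered for scheme %q", u.Scheme)
	}
	return r(u)
}

// staticResolver stands in for a DNS SRV, etcd, or Consul lookup.
// The table below is made-up data; unknown hosts yield zero addresses.
func staticResolver(u *url.URL) ([]string, error) {
	table := map[string][]string{
		"orders": {"10.0.0.1:50051", "10.0.0.2:50051"},
	}
	return table[u.Host], nil
}
```

After `Register("my-custom-resolver", staticResolver)`, a call to `Resolve("my-custom-resolver://orders/some-path")` returns the two addresses from the table, and an unregistered scheme yields an error.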

There is also the old API where you pass grpc.WithBalancer(myBalancer) to grpc.Dial. I suspect this fits C# better.

How will this be achieved in grpc-dotnet?

jtattermusch commented 5 years ago

Currently grpc-dotnet doesn't support client-side load balancing (or lookaside load balancing). This is a feature that might be implemented in the future, but currently there's no ETA. Our ability to implement it depends a lot on what functionality the underlying layer provides (HttpClient in the case of grpc-dotnet); we need to look into that in detail.

monomosc commented 5 years ago

Do you think it would be feasible to write a CallInvoker (or maybe a ChannelBase) which implements round-robin Dispatch? I would experiment with it for my use-case and return with a Demo-PR if you think that could possibly be considered.

JamesNK commented 5 years ago

How does client-side/lookaside load balancing typically work in other implementations?

When does the channel/call invoker query a resolver (like DNS resolver) to get a list of available endpoints to call? Does it happen when the balancer is created, periodically in a background timer, or just prior to calls being made?

My hunch is load balancing will be external to HttpClient. If all it involves is selecting a target address, that logic can be executed and then the selected address is passed to HttpClient.

jtattermusch commented 5 years ago

@apolcyn has more details on how exactly resolution works.

My understanding (might be a slightly inaccurate / oversimplified version of reality) is that the "simple" (e.g. round_robin) loadbalancing works like this:

  1. name resolution only happens once a call is attempted (or a connection is forced via connectivity API)
  2. multiple backends might be resolved from a single DNS address
  3. round_robin LB must be explicitly configured (otherwise pick_first strategy is used)
  4. if there are multiple backends, gRPC round robins over N connections ("subchannels") at the call level (two successive calls go to different subchannels).
  5. re-resolution only happens if there's a connection failure (there's some disagreement about this rule, but that's currently the behavior gRPC C core has).
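Points 3 and 4 above can be sketched as two interchangeable pickers. This is a simplified illustration, not code from any gRPC implementation: subchannels are reduced to plain address strings, the `Picker`, `pickFirst`, `roundRobin`, and `NewPicker` names are hypothetical, and the address list is assumed to be non-empty.

```go
package main

import "sync"

// Picker chooses the subchannel (reduced here to an address) for the next call.
type Picker interface {
	Pick() string
}

// pickFirst always returns the first resolved address.
type pickFirst struct {
	addrs []string
}

func (p *pickFirst) Pick() string {
	return p.addrs[0]
}

// roundRobin cycles through the addresses so that two successive calls
// go to different subchannels when more than one backend was resolved.
type roundRobin struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

func (p *roundRobin) Pick() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	addr := p.addrs[p.next]
	p.next = (p.next + 1) % len(p.addrs)
	return addr
}

// NewPicker returns a round-robin picker only when it is explicitly
// requested; any other policy name falls back to pick_first.
// Assumption: addrs is non-empty.
func NewPicker(policy string, addrs []string) Picker {
	if policy == "round_robin" {
		return &roundRobin{addrs: addrs}
	}
	return &pickFirst{addrs: addrs}
}
```

With two addresses, a `round_robin` picker alternates between them on every `Pick()`, while the default picker keeps returning the first one.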

The scenario with "lookaside" loadbalancing is much more complicated than that, and currently we're in the process of implementing support for xDS protocol loadbalancing (the same loadbalancing that Envoy proxies use). The idea is similar to grpclb loadbalancing, but the grpclb protocol doesn't have much support in the open source world (unlike the xDS protocols).

Some more details from the docs:
https://github.com/grpc/grpc/blob/master/doc/load-balancing.md
https://github.com/grpc/grpc/blob/master/doc/naming.md
https://github.com/grpc/proposal/blob/master/A5-grpclb-in-dns.md
https://github.com/grpc/proposal/blob/master/A24-lb-policy-config.md

jtattermusch commented 5 years ago

Some useful resources on gRPC loadbalancing basics and examples for the scenarios described above are here: https://github.com/jtattermusch/grpc-loadbalancing-kubernetes-examples (note that none of this is currently implemented in grpc-dotnet).

apolcyn commented 5 years ago

@jtattermusch I would generally agree with your description too.

However, note that client-side load balancing policy implementations pick their addresses with a bit more smarts than simply iterating over the resolved address list. In particular, client-side load balancing "policy" implementations generally rely on being able to monitor the "connectivity state" of "subchannels" (each "subchannel" sounding like it would probably be one HttpClient targeting a single IP address). This is the case even for simpler policies like pick_first and round_robin. In short, policies use this to know things like when an address is ready to be selected for an RPC, when to skip over an address, and when to re-resolve.
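As a rough illustration of why policies need connectivity state, the sketch below shows a round-robin pick that skips any subchannel that is not READY. The five state names follow the connectivity semantics document linked in this thread; the `Subchannel` and `Balancer` types and the `ErrNoReadySubchannel` error are hypothetical, and the code is not safe for concurrent use.

```go
package main

import "errors"

// State mirrors the connectivity states named in gRPC's
// connectivity-semantics document.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// Subchannel is a hypothetical stand-in for one connection to one address.
type Subchannel struct {
	Addr  string
	State State
}

// ErrNoReadySubchannel signals that no address can take an RPC right now;
// a real policy might queue the call or trigger re-resolution here.
var ErrNoReadySubchannel = errors.New("no subchannel is READY")

// Balancer holds the subchannels and the round-robin cursor.
type Balancer struct {
	Subchannels []*Subchannel
	next        int
}

// Pick returns the next READY subchannel in round-robin order,
// skipping subchannels in any other state.
func (b *Balancer) Pick() (*Subchannel, error) {
	n := len(b.Subchannels)
	for i := 0; i < n; i++ {
		sc := b.Subchannels[(b.next+i)%n]
		if sc.State == Ready {
			b.next = (b.next + i + 1) % n
			return sc, nil
		}
	}
	return nil, ErrNoReadySubchannel
}
```

Without a way to update `State` from the transport, every subchannel would look the same to the policy, which is exactly the gap the question about HttpClient is asking about.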

Is there a way that a policy can monitor HttpClient "connectivity state"s (as modeled in https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md)? Or in some other way know the "health" of an HttpClient connection?

This might be worth a doc to review with folks from other grpc languages once hashed out more.

monomosc commented 5 years ago

I hacked something together, and as @apolcyn has said, the only building block missing to reach parity with the golang Balancer and Resolver API surface (which exists almost purely in abstractions) is some way to get the connection health status of an HttpClient. Of course you could also defer this to the GrpcChannel/balancing CallInvoker, which would then have to track RpcExceptions.
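The fallback mentioned here, inferring health from failed calls instead of from the connection, could look roughly like the sketch below. It is an assumption-laden illustration and not the code from the gist: `HealthTracker`, the consecutive-failure threshold, and the `ErrUnavailable` stand-in for an RpcException are all hypothetical.

```go
package main

import "errors"

// ErrUnavailable is a stand-in for a failed call (an RpcException in C#).
var ErrUnavailable = errors.New("call failed: unavailable")

// HealthTracker infers per-address health from call outcomes, because the
// transport does not expose connection state. Not safe for concurrent use.
type HealthTracker struct {
	threshold int            // consecutive failures before an address is unhealthy
	failures  map[string]int // consecutive failure count per address
}

// NewHealthTracker creates a tracker; the threshold value is an arbitrary
// tuning knob chosen by the caller.
func NewHealthTracker(threshold int) *HealthTracker {
	return &HealthTracker{threshold: threshold, failures: map[string]int{}}
}

// Report records the outcome of one call to addr.
// A success resets the failure streak; a failure extends it.
func (h *HealthTracker) Report(addr string, err error) {
	if err == nil {
		delete(h.failures, addr)
		return
	}
	h.failures[addr]++
}

// Healthy reports whether addr is below the failure threshold.
func (h *HealthTracker) Healthy(addr string) bool {
	return h.failures[addr] < h.threshold
}
```

The trade-off is that an address is only marked unhealthy after real calls have already failed against it, which is why direct connection health from HttpClient would be preferable.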

JamesNK commented 5 years ago

Nice. Are you able to share the source code? It doesn't need to be perfect. I'm curious to see what you came up with.

Yes, querying HttpClient is the blocker here to creating something robust. We might be able to get a feature to enable this in .NET Core 5.0. First step is to figure out what a balancer needs and then talk to the HttpClient team about adding it.

@apolcyn Are there any specs around what a gRPC load balancer should do or does behavior differ depending upon the implementation for a language? Do you have a load balancer implementation you would recommend to aspire to?

monomosc commented 5 years ago

I have approximated my code in a gist. Check out the file called RelevantCode.cs, ignoring Boilderplate.cs. This does not exactly capture what I'm using internally (which I can maybe share after asking too many people), but it matches it in spirit. It's actually abstracted better.

Issues I could identify

apolcyn commented 5 years ago

@JamesNK sorry there isn't currently a formal detailed spec for LB policy behavior, however the existing language stacks have tended to aim for consistency amongst each other.

leyou240 commented 5 years ago

In my opinion, we can use the service registration pattern to support load balancing (pick first, round robin, weighted round robin, etc.).
https://dzone.com/articles/service-discovery-in-a-microservices-architecture
https://github.com/grpc/grpc-java/issues/428

wicharypawel commented 4 years ago

Hi, I think I have something interesting to share. For the last few days I've been working on a better understanding of load balancing in gRPC. I have expanded the capabilities of grpc-dotnet with full support for load balancing.

I think it's worth looking at. I prepared two repositories:

The first repository contains a fork of the grpc-dotnet repository with extensions to Grpc.Net.Client (most of my work was squashed to erase some wrong ideas of mine 😄 )

link: https://github.com/wicharypawel/grpc-dotnet/commits/load-balancing-for-dotnet

Extensions include: (documentation links below)

Work in progress

The second repository is an example of using this implementation, plus tons of other examples and snippets

link: https://github.com/wicharypawel/net-core-grpc-load-balance

Samples include:

How to start: I think the best way is to pull the second repository and initialize the submodule (which includes the first repository). Continue with the Readme; in case of problems, let me know here.

Kudos to @jtattermusch for his presentation about grpclb available here: https://github.com/jtattermusch/grpc-loadbalancing-kubernetes-examples

Related documentation:

Hope you find this helpful. I would be glad to help if you find it interesting enough for a pull request.

jtattermusch commented 4 years ago

@wicharypawel thanks for posting the contribution, it looks interesting. I'll try to take a closer look soon.

wicharypawel commented 4 years ago

Based on a chat with @jtattermusch, the previously described work will be posted as an official proposal in the https://github.com/grpc/proposal repository.

Recommended scenario: split work into smaller parts and create separate proposals, in areas such as:

The long-term goal is to introduce the required structure for the basic load balancing policies already available in other languages, and to aim to support xDS in the future.

Load balancing is developed as an open-source fork as described in my comment above. It still has some missing features, which holds me back from opening a pull request. Everyone is welcome to give it a try, but remember this is experimental code.

UPDATE 11.05.2020: I have found areas that must be improved before even starting work on proposals. Those areas are a reactive approach to changing infrastructure and creating an API similar to the battle-tested Java API.

UPDATE 18.05.2020: If anyone wants to get into it, please tell me so that we don't duplicate the work.

UPDATE 21.05.2020: The reactive version of pick_first and round_robin feels great. I have ported all behaviours related to connectivity and the API from Java, which gives a gRPC client that reacts to changing infrastructure. I have run tests using K8s and it is great to see subchannels removed and created as deployments are scaled. Tomorrow I will start to write the gRFC proposal.

UPDATE 28.05.2020: Proposal has been finished, it is waiting for the review of @jtattermusch

wicharypawel commented 4 years ago

Hello @JamesNK, I would like to ask for your advice, as you are the main author of the code base. Yesterday I added one more feature to the codebase, which made some tests fail nondeterministically when run together.

Could you tell me if there is any shared state between the tests that I should be aware of? My case affects multiple tests, but as an example let's focus on AsyncClientStreamingCallTests.cs

Some info that may be helpful:

Example stack trace

  Message: 
    System.Exception : Exception of type InvalidOperationException expected; got exception of type RpcException.
      ----> Grpc.Core.RpcException : Status(StatusCode=Unimplemented, Detail="Bad gRPC response. HTTP status code: 404")
  Stack Trace: 
    ExceptionAssert.ThrowsAsync[TException](Func`1 action, String[] possibleMessages) line 52
    TaskExtensions.TimeoutAfter[T](Task`1 task, TimeSpan timeout, String filePath, Int32 lineNumber) line 59
    AsyncClientStreamingCallTests.ClientStreamWriter_WriteWithInvalidHttpStatus_ErrorThrown() line 222
    GenericAdapter`1.BlockUntilCompleted()
    NoMessagePumpStrategy.WaitForCompletion(AwaitAdapter awaitable)
    AsyncToSyncAdapter.Await(Func`1 invoke)
    TestMethodCommand.RunTestMethod(TestExecutionContext context)
    TestMethodCommand.Execute(TestExecutionContext context)
    SimpleWorkItem.PerformWork()
    --RpcException
    HttpContentClientStreamWriter`2.WriteAsyncCore(TRequest message) line 175
    ExceptionAssert.ThrowsAsync[TException](Func`1 action, String[] possibleMessages) line 32

wicharypawel commented 4 years ago

Hello, I would like to share the results of my work. Here is a draft version of the proposal.

grpc-dotnet load balancing support proposal (and related proposals).pdf (proposal starts at page 3)

After an initial review I would like to make a pull request into the official gRFC proposals repo.

The fork with the code is shared here: https://github.com/wicharypawel/grpc-dotnet/tree/load-balancing-v0.7.0 (commit hash: efd5eda4633acf97380146677ab8b985fa9abf9e)
Example usage of this new feature is shared here: https://github.com/wicharypawel/net-core-grpc-load-balance/tree/load-balancing-proposal-examples (commit hash: f0628b80c469fb21502d527b8ccc85b005fe9ed2)

JamesNK commented 4 years ago

Wow. I'll take a look over the next couple of days. It will take a while for me to get up to speed with load balancing and evaluate what other implementations do.

wicharypawel commented 4 years ago

Hey @JamesNK, take your time. Even if I submitted the proposal for formal review, I guess it would take the gRPC team a while to get to it.

If you would like to have a call on Hangouts to get a high-level intro to how it works, it would be my pleasure (contact me by email). I have spent the last 5 months mastering gRPC by reading the official docs and proposals, reading the gRPC Java implementation, and building gRPC examples. I feel confident speaking about both the high-level design and in-depth implementation details.

wicharypawel commented 4 years ago

For anybody interested in this topic: it is blocked until the gRPC team accepts the gRFC proposal mentioned above, as it contains an internal API design ported from the gRPC for Java implementation.

Important: a working implementation already exists as a fork; for more information see the previous comments.

JamesNK commented 4 years ago

I'm looking into load balancing more seriously now, and reviewing the content of your PR.

The right pieces are in the PR, but I want to refactor a lot of the implementation. Some concepts, like SyncContext and Rx, are Java idioms required in Java libraries because of the lack of async/await. I think we can simplify while keeping most or all of the current features.

wicharypawel commented 4 years ago

Hi James, agreed, all language idioms specific to Java should be replaced by C# equivalents. Let me list the reasons why I initially took this approach:

In other words, feel free to propose changes to the proposal.

JamesNK commented 4 years ago

I looked a little more at your PR. As far as I can tell the connectivity status for each subchannel will always be READY because HttpClient lacks low-level APIs for looking at the underlying HTTP/2 connection.

If that is the case, does the pick_first balancer have much use? Won't the channel always attempt to use the first subchannel?

wicharypawel commented 4 years ago

Subchannel state does change. It may be surprising, but I have managed to build a mechanism that works around the missing low-level API support. The way it works is described in the proposal.

See here: https://github.com/wicharypawel/proposal/blob/grpc-dotnet-load-balancing/L71-dotnet-load-balancing.md#known-issues

@JamesNK can you tell me if you have tried to run my code at https://github.com/wicharypawel/grpc-dotnet/tree/load-balancing-v0.7.1? I am open to having a call to go over the vital parts of this implementation.

wicharypawel commented 4 years ago

Hi @JamesNK 👋 I'm curious if you tried out this implementation?

JamesNK commented 4 years ago

Yeah I have. Right now I'm focusing on IPC (i.e. gRPC over namedpipes/UDS) for .NET 5, but I'll look at load balancing more once I have that working.

mjsabby commented 4 years ago

After reading about xDS I was searching if grpc-dotnet had this implementation. So I'm glad it's at least got some legs.

However, is there a point to implement anything beyond xDS? Given that's the direction of gRPC?

Maybe interim support? I'm just curious.

houseofcat commented 4 years ago

@JamesNK Know you are busy, but any updates on this?

wicharypawel commented 4 years ago

@mjsabby xDS is the desired solution, but as far as I know it is still being formed for Java and Go. Other options like round robin are there for regular use (and they are mature, from my perspective); however, gRPC for .NET has not been focused on this. I have implemented the structure and basic policies, but I can't open a PR until the proposal is accepted.

@houseofcat If you are looking for a working unofficial implementation, check my repository:

houseofcat commented 4 years ago

@wicharypawel thanks I will see if this is viable for my needs!

sunliusi commented 3 years ago

any updates on this?

jtattermusch commented 3 years ago

@JamesNK is working on a design in https://github.com/grpc/proposal/pull/240

pcwiese commented 3 years ago

Has there been any movement on a design/implementation to support the xDS protocol for parity with Go, Java, C-core wrapped langs? My team currently uses the grpclb protocol with the C# wrapper over Grpc.Core and we understand the need to move away from it well before May 2022. I'd like to get involved in this effort if it will speed up the process.

JamesNK commented 3 years ago

Load balancing is coming but only DNS load balancing will be in the first version. I don't know when xDS will happen.

However, the public APIs required to implement xDS should be available, and it could be done in a separate package.

pcwiese commented 3 years ago

> Load balancing is coming but only DNS load balancing will be in the first version. I don't know when xDS will happen.
>
> However, the public APIs required to implement xDS should be available, and it could be done in a separate package.

So in your mind, xDS support doesn't belong in this repo, but in a separate one. Is there already some kind of contrib or extensions repo for add-ons to this one?

JamesNK commented 3 years ago

It should belong in this repo but an external contributor can start it independently.

JamesNK commented 3 years ago

Load balancing is available now in preview releases.

https://docs.microsoft.com/en-us/aspnet/core/grpc/loadbalancing?view=aspnetcore-5.0