grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

Official support for Retry policy #8899

Closed politrons closed 2 years ago

politrons commented 2 years ago

Hi all,

I'm struggling to find official documentation for the retry policy on gRPC calls. In particular, I'd like to know whether it is still experimental or whether it is official and supported for use.

My current version is 1.30, but I would upgrade to the latest version if that is required to get this feature.

I found this repo, but it seems to be just a proposal: https://github.com/grpc/proposal/blob/master/A6-client-retries.md

Can you share the documentation and point out from which version we can use it, if it is supported?

Regards

sanjaypujare commented 2 years ago

https://github.com/grpc/grpc-java/issues/3982 talks about the retry APIs being experimental, and the issue is still open. @dapengzhang0 anything else you want to add?

ejona86 commented 2 years ago

Retries were stabilized in v1.40.0, where they were enabled by default, which includes service config support. Bringing up #3982 is interesting; it seems those APIs need to become stabilized.

dapengzhang0 commented 2 years ago

ManagedChannelBuilder.enableRetry() and disableRetry() are stabilized in v1.40. But ManagedChannelBuilder.defaultServiceConfig() is still experimental, so technically the only stabilized use case is the proxyless-service-mesh use case. Maybe we can further stabilize ManagedChannelBuilder.defaultServiceConfig() as well.

politrons commented 2 years ago

@dapengzhang0 What does "proxyless-service-mesh usecase" mean? Can I then use a backoff configuration like the one provided through ManagedChannelBuilder.defaultServiceConfig?

Regards.

dapengzhang0 commented 2 years ago

> What does "proxyless-service-mesh usecase" mean?

Basically, that is a client sending requests to a server in a service-mesh deployment with a control plane, such as Google Traffic Director, using the xDS API. The client obtains a service config (which can include a retry config) from the control plane rather than setting defaultServiceConfig locally.

dapengzhang0 commented 2 years ago

> I found this repo, but it seems to be just a proposal: https://github.com/grpc/proposal/blob/master/A6-client-retries.md

@politrons I've updated the status of gRPC-A6 to "Implemented". See https://github.com/grpc/proposal/pull/287

politrons commented 2 years ago

So that means the retry policy must be configured in the Istio VirtualService? I've never seen that policy work over gRPC, only over HTTP, at least with the Istio version we use at my company.

So is there no other way to configure the retries programmatically?

Otherwise I think I will have to keep my own implementation of the retries, shutting down connections in case of timeouts.

dapengzhang0 commented 2 years ago

@politrons You can also configure retry with the ManagedChannelBuilder.defaultServiceConfig() API. There is an example using defaultServiceConfig() for hedging, and retry should be similar. Note, however, that the defaultServiceConfig() API is currently experimental, and we are discussing stabilizing it as well.
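As a rough illustration of the comment above, here is a minimal sketch of a retry service config in the `Map` form that `defaultServiceConfig()` accepts. The field names follow gRPC-A6. The service name `helloworld.Greeter`, the numeric values, and the `target` variable in the comment are placeholders chosen for this example and do not come from the thread. Numbers are written as `Double` on the assumption that the map should look like parsed JSON.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch: builds a retry service config as nested Maps and Lists. */
final class RetryServiceConfig {

  // Placeholder service name; replace with the fully qualified name of your service.
  static final String SERVICE = "helloworld.Greeter";

  static Map<String, Object> build() {
    // retryPolicy fields as described in gRPC-A6; all values here are illustrative.
    Map<String, Object> retryPolicy = new HashMap<>();
    retryPolicy.put("maxAttempts", 4.0);
    retryPolicy.put("initialBackoff", "0.5s");
    retryPolicy.put("maxBackoff", "30s");
    retryPolicy.put("backoffMultiplier", 2.0);
    retryPolicy.put("retryableStatusCodes", Collections.singletonList("UNAVAILABLE"));

    // Apply the policy to every method of the placeholder service.
    Map<String, Object> name = new HashMap<>();
    name.put("service", SERVICE);

    Map<String, Object> methodConfig = new HashMap<>();
    methodConfig.put("name", Collections.singletonList(name));
    methodConfig.put("retryPolicy", retryPolicy);

    Map<String, Object> serviceConfig = new HashMap<>();
    serviceConfig.put("methodConfig", Collections.singletonList(methodConfig));
    return serviceConfig;
  }

  public static void main(String[] args) {
    System.out.println(build());
    // With grpc-java on the classpath, the map would be passed along these lines
    // ("target" is a hypothetical variable holding the server address):
    //   ManagedChannelBuilder.forTarget(target)
    //       .defaultServiceConfig(RetryServiceConfig.build())
    //       .enableRetry()
    //       .build();
  }
}
```

Only status codes listed in `retryableStatusCodes` trigger a retry, so the list should match the failures the server actually returns.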

ejona86 commented 2 years ago

> Otherwise I think I will have to keep my own implementation of the retries shutting down connections in case of timeouts.

Shutting down connections when triggered by timeouts is unrelated to any retry feature. That should never be necessary with gRPC. If you have trouble along these lines, you probably want to set a keepalive time.

politrons commented 2 years ago

Well, in our use case, after one second without a response we want to close the connection and open a new one against another pod in the cluster. I'm afraid the current implementation of https://github.com/grpc/proposal/blob/master/A6-client-retries.md is not working, and no retries happen. I just followed the same code on the server side with defaultServiceConfig and the JSON file, but it is not doing any retries.

ejona86 commented 2 years ago

It seems like what you are wanting does not fit into the design of retries. It is similar to hedging, although 1) you want to stop previous RPCs and 2) you want to kill the connection.

I'd suggest you consider whether hedging would suit your needs. But otherwise it seems like very application-specific logic that is poorly suited to a generic solution in gRPC. From what I can tell, there doesn't seem to be evidence that retries aren't working as designed.
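To make the hedging suggestion concrete, here is a minimal sketch of a hedging service config in the `Map` form accepted by `defaultServiceConfig()`. The `hedgingPolicy` field names follow gRPC-A6. The helper class, its parameters, and the example values (a one-second `hedgingDelay`, mirroring the one-second threshold mentioned in the thread) are assumptions made for illustration. As noted above, hedging does not cancel earlier attempts or close connections; it only sends additional attempts while the earlier ones are still outstanding.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch: builds a hedging service config as nested Maps and Lists. */
final class HedgingServiceConfig {

  /**
   * @param service      fully qualified service name (caller-supplied placeholder)
   * @param hedgingDelay delay before each extra attempt, as a duration string such as "1s"
   * @param maxAttempts  total number of attempts, including the first one
   */
  static Map<String, Object> build(String service, String hedgingDelay, int maxAttempts) {
    Map<String, Object> hedgingPolicy = new HashMap<>();
    // Stored as a Double so the map looks like the output of a JSON parser.
    hedgingPolicy.put("maxAttempts", (double) maxAttempts);
    hedgingPolicy.put("hedgingDelay", hedgingDelay);
    // Statuses that do not stop the remaining hedged attempts (illustrative choice).
    hedgingPolicy.put("nonFatalStatusCodes", Collections.singletonList("UNAVAILABLE"));

    Map<String, Object> name = new HashMap<>();
    name.put("service", service);

    // A method config carries either a retryPolicy or a hedgingPolicy, not both.
    Map<String, Object> methodConfig = new HashMap<>();
    methodConfig.put("name", Collections.singletonList(name));
    methodConfig.put("hedgingPolicy", hedgingPolicy);

    Map<String, Object> serviceConfig = new HashMap<>();
    serviceConfig.put("methodConfig", Collections.singletonList(methodConfig));
    return serviceConfig;
  }

  public static void main(String[] args) {
    // "helloworld.Greeter" is a placeholder service name.
    System.out.println(build("helloworld.Greeter", "1s", 3));
  }
}
```

Because hedged attempts can reach the server more than once, this approach is generally only appropriate for methods that are safe to execute multiple times.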

If you change your requirements, you may be able to get an LB policy to do what you want. Outlier detection might also be useful (but it isn't yet implemented in Java).

Closing since it seems there's not much more we can help with here. If not, comment, and we can reopen.