grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

How to retry with new auth token using builtin retry? #7345

Closed: asarkar closed this issue 4 years ago

asarkar commented 4 years ago

My requirement is similar to https://github.com/grpc/grpc-java/issues/6638 and https://github.com/grpc/grpc-java/issues/5856, but on JVM, not Android. I want to retry a bidi streaming call with a new auth token if the server responds with a particular status. In both of the aforementioned tickets, an interceptor option was discussed, but this comment also mentions using the builtin retry with ClientStreamTracer that was never elaborated on. I'd like to get some details on using the builtin retry with ClientStreamTracer option.

ejona86 commented 4 years ago

CallCredentials should generally determine when the credential is invalid. If they do so, then any retry would naturally use a fresh token, since CallCredentials is invoked independently for each retry attempt.

"Use and interceptor" and "use a ClientStreamTracer" are simply ways to see the response status. For the tracer, you'd observe streamClosed(Status). You "attach" a tracer to an RPC via callOptions.withStreamTracerFactory(). We do not provide a convenience method on AbstractStub, so you'd probably use an interceptor to set it.

But generally we'd discourage that approach if at all possible, and instead suggest having the CallCredentials avoid using stale auth tokens to begin with.

asarkar commented 4 years ago

instead suggest having the CallCredentials avoid using stale auth tokens

For OAuth, maybe yes, but in general, there's no way to know that unless we hit the server. I think it's not an uncommon use case where a user wants to perform some "side effect" before retrying. Writing an interceptor is easy, but I haven't seen any publicly available code using ClientStreamTracer for retry; if you could provide some reference code samples, I could start with that.

ejona86 commented 4 years ago

but in general, there's no way to know that unless we hit the server

Why is that? What causes the token to expire?

I think it's not an uncommon use case where a user wants to perform some "side effect" before retrying.

I don't think this is really all that related to retrying. Even if you end up doing a completely-separate RPC it seems you still would need a valid auth token.

asarkar commented 4 years ago

Why is that?

The token is managed by the server and isn't a rich object like an OAuth token or a cookie that the client can inspect; it's just a string. Reasons for it to expire include the server deleting it due to inactivity, the server needing to free up space, or because it's Tuesday and raining. The client is expected to handle token expiration normally, and an API is provided for renewing the token. However, doing so for every call is unnecessary, which is why I'm looking to do it only when it's needed.

ejona86 commented 4 years ago

I like raining Tuesdays, except if it's due to Hurricane Marcus.

It sounds like the client has some secret it exchanges for an auth token, but the auth token does not have a pre-determined lifetime. So it's similar to OAuth in that it has a refresh token of some sort, but it isn't OAuth.

@Override public <ReqT,RespT> ClientCall<ReqT,RespT> interceptCall(
    MethodDescriptor<ReqT,RespT> method, CallOptions callOptions, Channel next) {
  callOptions = callOptions
      .withCallCredentials(credentials)
      .withStreamTracerFactory(new ClientStreamTracer.Factory() {
        @Override public ClientStreamTracer newClientStreamTracer(
            ClientStreamTracer.StreamInfo info, Metadata headers) {
          // Stream tracers and the factory _must not_ block or run for lengthy periods of time
          final Object authToken = headers.get(AUTH_TOKEN_KEY);
          return new ClientStreamTracer() {
            @Override public void streamClosed(Status status) {
              // This _must not_ fetch a new token or wait for a token being fetched; it must invalidate
              // immediately. Be careful with locking in the credentials implementation.
              credentials.invalidate(authToken);
            }
          };
        }
      });
  return next.newCall(method, callOptions);
}

Tracers are run synchronously on the network thread. Thus it is very important that they execute quickly, without doing I/O or blocking on locks for long periods of time. Be very, very careful with any locking and make sure a lock is never held while performing I/O or similar (even if the lock isn't held during I/O inside invalidate(), you can't have invalidate() wait a long time for the lock to be released).
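
To make the "careful locking" part concrete, here is a minimal sketch (the TokenHolder name and shape are made up, not an existing API) where invalidate() is lock-free and constant-time, so a tracer can call it safely, while the actual renewal happens somewhere that is allowed to block:

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical token cache: invalidate() never does I/O and never waits on a lock. */
final class TokenHolder {
  private final AtomicReference<String> current = new AtomicReference<>();

  /** Safe to call from streamClosed(): a single atomic compare-and-set. */
  void invalidate(String staleToken) {
    current.compareAndSet(staleToken, null);
  }

  /** Called off the network thread (e.g., from CallCredentials on the executor it is given). */
  String getOrRenew(Supplier<String> renew) {
    String token = current.get();
    if (token == null) {
      token = renew.get(); // may do network I/O, but never runs on the network thread
      current.compareAndSet(null, token);
    }
    return token;
  }
}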

asarkar commented 4 years ago

@ejona86 This is helpful, thank you.

Stream tracers and the factory must not block or run for lengthy periods of time

What is "lengthy", quantitatively, and why is the duration a concern? Depending on where the client is located geographically, sometimes the refresh token may take few seconds (2 to 3); is that lengthy enough to be concerned? If it is, then the renewal has to be handled separately from invalidation.

ejona86 commented 4 years ago

What is "lengthy"

Generally most CPU-based operations are fine. Any I/O is a no-no (we could maybe "look the other way" for local disk I/O, but network I/O would cause problems). Blocking on locks that are held by something doing a CPU-based operation is fine, but not on one held by something doing I/O. In practice, most operations we do in a tracer are measured in nanoseconds. A "slow" operation may take a microsecond.

The Tracer is actually able to modify the metadata. You could have the tracer set the auth token directly, instead of the round-about way I did it. However, that would then be blocking in a tracer (which is a no-no). So I suggested using CallCredentials instead. CallCredentials is able to easily do expensive work on the provided Executor and notify gRPC when it completes. CallCredentials are called for each retry attempt, so this would work well for your needs.
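
As a rough sketch of that CallCredentials approach (TokenService and the header name are placeholders, not an existing API; the token cache could be backed by something like the TokenHolder sketched earlier): applyRequestMetadata runs for every attempt, including retries, and the potentially slow fetch happens on the executor gRPC supplies rather than on the network thread.

import io.grpc.CallCredentials;
import io.grpc.Metadata;
import io.grpc.Status;
import java.util.concurrent.Executor;

class RenewingCallCredentials extends CallCredentials {
  // Hypothetical header name; use whatever key your server expects.
  private static final Metadata.Key<String> AUTH_TOKEN_KEY =
      Metadata.Key.of("x-auth-token", Metadata.ASCII_STRING_MARSHALLER);

  private final TokenService tokenService; // hypothetical: caches the token, renews it after invalidate()

  RenewingCallCredentials(TokenService tokenService) {
    this.tokenService = tokenService;
  }

  @Override
  public void applyRequestMetadata(
      RequestInfo requestInfo, Executor appExecutor, MetadataApplier applier) {
    // Do the potentially slow work on the supplied executor, then hand the headers back to gRPC.
    appExecutor.execute(() -> {
      try {
        Metadata headers = new Metadata();
        headers.put(AUTH_TOKEN_KEY, tokenService.currentOrRenew()); // may block; safe on this executor
        applier.apply(headers);
      } catch (RuntimeException e) {
        applier.fail(Status.UNAUTHENTICATED.withCause(e));
      }
    });
  }

  @Override
  public void thisUsesUnstableApi() {}
}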

asarkar commented 4 years ago

Thank you for the excellent explanation. I'm curious, and you didn't say, why is it a problem for the tracer to block? Is it a design/pedagogical concern, or is there a more practical reason?

BTW, I'm working on this impl now, will report back if I come across a problem.

ejona86 commented 4 years ago

Each network thread can perform (nonblocking) I/O for multiple connections. If you block, you prevent all connections associated with that thread from being processed. Note that this could cause keepalives to fail and thus connections to be destroyed, and it could also cause handshake timeouts. The threads are also used for timers, so blocking can prevent timers from firing, e.g., for deadlines. If you use gRPC to get your auth token, it could deadlock. It might even be possible to cause the StreamObserver API to block the calling thread.

The number of things that can go wrong is high, and it can be really hard to track down the source of the issue. It is totally possible to write code that runs on these threads; just stay mindful. It's more "don't take short-cuts."

asarkar commented 4 years ago

I assume by “connection”, you mean the underlying transport connection.

Unfortunately, it seems like I can’t use the tracer for my case. The server returns an OK status with an error message (it’s bad, but nothing I can do about that), so status alone isn’t enough to keep or invalidate the token. I’m using Spring, so I’m going to use your idea of invalidating the token by having a singleton CallCredentials that is accessed from response onMessage.

sanjaypujare commented 4 years ago

... The server returns an OK status with an error message (it’s bad, but nothing I can do about that), so status alone isn’t enough to keep or invalidate the token.

But Status also includes description. Wouldn't that be the error message you are looking for?

asarkar commented 4 years ago

@sanjaypujare, No, the error is actually a valid message as far as gRPC is concerned. Just like Microservices, there are people jumping on the gRPC bandwagon because they heard it's "better" or "faster", but have no clue or intent to do it right.

sanjaypujare commented 4 years ago

@sanjaypujare, No, the error is actually a valid message as far as gRPC is concerned. Just like Microservices, there are people jumping on the gRPC bandwagon because they heard it's "better" or "faster", but have no clue or intent to do it right.

Oh okay. So the so-called error message is part of the response message (RespT), which is why you mentioned accessing it from the response onMessage. Got it.

ejona86 commented 4 years ago

If your server responds with OK, then an expired token will never trigger built-in retries. (Because the RPC was successful)

asarkar commented 4 years ago

will never trigger built-in retries

Yes, I realized that after I implemented the interceptor. I throw an exception when I see the "error message", but it's too late. Currently researching other ways to deal with this 😡

sanjaypujare commented 4 years ago

Yes, I realized that after I implemented the interceptor. I throw an exception when I see the "error message", but it's too late. Currently researching other ways to deal with this 😡

Instead of throwing an exception how about sending a close with the appropriate status to the responseListener? Something like the following in your interceptor:

              @Override
              public void onMessage(RespT message) {
                if ("error message".equals(message)) {
                  // Surface the application-level error as an RPC failure.
                  responseListener.onClose(Status.UNAUTHENTICATED, new Metadata());
                  return;
                }
                responseListener.onMessage(message);
              }
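
For context, a rough sketch (names hypothetical) of where that override could live: the interceptor wraps the ClientCall, and responseListener is the listener passed to start().

import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ForwardingClientCall;
import io.grpc.ForwardingClientCallListener;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;
import io.grpc.Status;

final class ErrorToStatusInterceptor implements ClientInterceptor {
  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    return new ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>(
        next.newCall(method, callOptions)) {
      @Override
      public void start(Listener<RespT> responseListener, Metadata headers) {
        super.start(
            new ForwardingClientCallListener.SimpleForwardingClientCallListener<RespT>(
                responseListener) {
              @Override
              public void onMessage(RespT message) {
                if (isErrorMessage(message)) { // hypothetical application-specific check
                  responseListener.onClose(Status.UNAUTHENTICATED, new Metadata());
                } else {
                  super.onMessage(message);
                }
              }
            },
            headers);
      }
    };
  }

  private static boolean isErrorMessage(Object message) {
    return false; // placeholder: inspect the response for the server's "error message"
  }
}
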
ejona86 commented 4 years ago

Instead of throwing an exception how about sending a close with the appropriate status to the responseListener?

Yes, we don't really want people throwing exceptions from interceptors. But in either case the call will be complete and you can't trigger a built-in retry.

dapengzhang0 commented 4 years ago

It seems the issue is resolved. We can reopen it if there are further issues.