grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

How does client get notified about GOAWAY message? #9903

Closed slogic closed 1 year ago

slogic commented 1 year ago

I have a fully independent bidirectional stream. The server side enables maxConnectionAge to force the client to rebalance connections. As far as I understand, the underlying layer (I'm using 1.47.1) does not handle transparent reconnection on an incoming GOAWAY frame (unlike the Go implementation), and ClientCallStreamObserver becomes unusable. So I'm looking for some GOAWAY event handler, to avoid the client-side onError() call with the "UNAVAILABLE: Connection closed after GOAWAY" message when the grace period expires. Same question concerning the server side: there is a queue of response messages, and when maxConnectionAge is triggered I do not want to send messages to this client anymore.

YifeiZhuang commented 1 year ago

There is no such GOAWAY reconnection handler in Java. I'll mark this as a feature request.

On the server side, you might use ServerCallStreamObserver.setOnCloseHandler().

slogic commented 1 year ago

Thanks for the response. I'll check it for the server side, but I had to assign setOnCancelHandler() to avoid the error "CANCELLED: call already cancelled. Use ServerCallStreamObserver.setOnCancelHandler() to disable this exception" on client disconnect.
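A minimal sketch of the server-side pattern discussed here, assuming a queue of pending responses: a cancel callback flips a flag, and the sender checks it before each onNext(). `ResponseSink` and `QueueDrainer` are hypothetical stand-ins so the example runs without gRPC on the classpath; in real code the Runnable from `cancelHandler()` would be passed to `ServerCallStreamObserver.setOnCancelHandler()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class QueueDrainer {
    /** Hypothetical stand-in for the onNext() part of ServerCallStreamObserver. */
    interface ResponseSink<T> {
        void onNext(T value);
    }

    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** The Runnable you would register via setOnCancelHandler(). */
    Runnable cancelHandler() {
        return () -> closed.set(true);
    }

    void enqueue(String msg) {
        pending.add(msg);
    }

    /** Sends queued messages until the queue is empty or the call is cancelled; returns how many were sent. */
    int drain(ResponseSink<String> sink) {
        int sent = 0;
        String msg;
        while (!closed.get() && (msg = pending.poll()) != null) {
            sink.onNext(msg);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        QueueDrainer drainer = new QueueDrainer();
        List<String> delivered = new ArrayList<>();
        drainer.enqueue("a");
        drainer.enqueue("b");
        System.out.println(drainer.drain(delivered::add)); // 2
        drainer.enqueue("c");
        drainer.cancelHandler().run(); // simulate the client going away
        System.out.println(drainer.drain(delivered::add)); // 0, "c" stays queued
    }
}
```

The flag check is best-effort: a message polled just before the cancel lands can still be handed to onNext(), which is harmless once a cancel handler is registered.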

ejona86 commented 1 year ago

There's no notification that GOAWAY has started. The expectation is either 1) RPCs end within a "reasonable" amount of time and the server is willing to wait that period of time or 2) the RPC could be very long-lived and you accept it will get killed.

Neither the client nor server RPC handlers are notified a GOAWAY is in-progress, but they are notified when the connection dies. A notification sounds fine, until there's a proxy in the mix, at which point things become pretty broken. I mention that a bit in https://github.com/grpc/grpc-java/issues/8770#issuecomment-999840527 , and note that in that case I assumed the server wasn't shutting down. If the server is shutting down the client won't know about the GOAWAY if there's a proxy. So the issue is "that's not how the technology works" and I'm not confident we can do that much better than what we have.

ejona86 commented 1 year ago

As far as I understand, the underlying layer (I'm using 1.47.1) does not handle transparent reconnection on an incoming GOAWAY frame (unlike the Go implementation), and ClientCallStreamObserver becomes unusable.

On GOAWAY, ClientCallStreamObserver is still usable. After the grace period, though, the RPC will be killed. Go should behave the same way here. Both Java and Go will create a new connection to the server for the next RPC. But neither notifies the RPC when GOAWAY is received, because that's not an error.

slogic commented 1 year ago

My assumption that the Go implementation handles GOAWAY came from reading the https://lukexng.medium.com/grpc-keepalive-maxconnectionage-maxconnectionagegrace-6352909c57b8 article. Admittedly, that example has no bidirectional streaming, and its client has no code to handle a cancel event.

The expectation is either 1) RPCs end within a "reasonable" amount of time and the server is willing to wait that period of time or 2) the RPC could be very long-lived and you accept it will get killed.

So the proper design (to overcome the lack of rebalancing onto new server pods) would be to implement reconnection on the client side, with an interval matching the server's maxConnectionAge?

ejona86 commented 1 year ago

That medium post uses a Grace Time >= RPC duration (30s for both). That is the "expectation (1)" approach. Although I'll note the grace time would have been better at 31+ seconds, because MaxAge doesn't increase the guaranteed time an RPC is allowed to take. From the post:

The call will take 30s and with MaxAge (10s) + Grace (30s), the call should have enough time to process.

That is only true for the first RPC. If you do one RPC that causes a connection to be created, wait 7 seconds and do another RPC, then that second RPC will only have 3 + 30 seconds. And if you send an RPC right at 10 seconds, then it has either 30s (if it lands on the old connection just before the GOAWAY) or 10+30s (if it goes to a new connection).
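The arithmetic above can be written down directly. A sketch, assuming a single server with no proxy in between and ignoring connection setup time and any jitter the server applies to the max age; `guaranteedTime` is a hypothetical helper, not a gRPC API.

```java
import java.time.Duration;

public class MaxAgeBudget {
    /**
     * Worst-case time an RPC may run before maxConnectionAgeGrace kills it, for an RPC
     * started `sinceConnect` after the connection it would use was created.
     */
    static Duration guaranteedTime(Duration sinceConnect, Duration maxAge, Duration grace) {
        if (sinceConnect.compareTo(maxAge) < 0) {
            // Lands on the existing connection: GOAWAY at maxAge, hard close at maxAge + grace.
            return maxAge.minus(sinceConnect).plus(grace);
        }
        // GOAWAY already sent: the RPC goes to a fresh connection whose age starts now.
        return maxAge.plus(grace);
    }

    public static void main(String[] args) {
        Duration maxAge = Duration.ofSeconds(10);
        Duration grace = Duration.ofSeconds(30);
        System.out.println(guaranteedTime(Duration.ZERO, maxAge, grace).getSeconds());         // 40
        System.out.println(guaranteedTime(Duration.ofSeconds(7), maxAge, grace).getSeconds()); // 33
        System.out.println(guaranteedTime(Duration.ofMillis(9_999), maxAge, grace).toMillis()); // 30001
    }
}
```

The minimum over all start times is just the grace period, which is why the grace time, not MaxAge, has to cover the longest RPC.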

would be to implement reconnection on the client side, with an interval matching the server's maxConnectionAge?

I would close the RPC after a period on server-side and have the client re-create the RPC (with backoff if an error occurred). That way the server is in control of both the RPC lifetime and the max age grace time.
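A minimal sketch of closing the RPC after a period on the server side, assuming a plain JDK timer. `Completer` is a hypothetical stand-in for the response StreamObserver passed to your service method, and the `done` flag stands for whatever state your handler already uses to avoid ending the call twice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class StreamLifetimeLimiter {
    /** Hypothetical stand-in for the response StreamObserver's onCompleted(). */
    interface Completer {
        void onCompleted();
    }

    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "stream-lifetime");
        t.setDaemon(true); // do not keep the JVM alive just for this timer
        return t;
    });

    /** Ends the stream at most once; a no-op if it already ended for another reason. */
    static Runnable closeOnce(Completer responses, AtomicBoolean done) {
        return () -> {
            // StreamObserver is not thread-safe: real code must serialize this with other onNext() calls.
            if (done.compareAndSet(false, true)) {
                responses.onCompleted(); // the client sees a clean completion and re-creates the RPC
            }
        };
    }

    /** Arms the timer; cancel the returned future if the stream ends earlier. */
    ScheduledFuture<?> limit(Completer responses, AtomicBoolean done, long maxLifetime, TimeUnit unit) {
        return timer.schedule(closeOnce(responses, done), maxLifetime, unit);
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        StreamLifetimeLimiter limiter = new StreamLifetimeLimiter();
        CountDownLatch completed = new CountDownLatch(1);
        limiter.limit(() -> {
            System.out.println("onCompleted after max lifetime");
            completed.countDown();
        }, new AtomicBoolean(false), 100, TimeUnit.MILLISECONDS);
        completed.await();
        limiter.shutdown();
    }
}
```

Adding some random jitter to the lifetime keeps many clients from re-creating their streams at the same moment.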

You might find https://kccna18.sched.com/event/GrWo/using-grpc-for-long-lived-and-streaming-rpcs-eric-anderson-google useful, especially slides 7-8. It links to the YouTube recording. The slides are available to follow along, but aren't really a replacement for the talk video.

ejona86 commented 1 year ago

Seems like this is resolved. If not, comment, and it can be reopened.

slogic commented 1 year ago

I would close the RPC after a period on server-side and have the client re-create the RPC (with backoff if an error occurred). That way the server is in control of both the RPC lifetime and the max age grace time.

If I need to manually close the RPC from the server side (in my case I need to be very careful not to break an in-progress data transfer from the client side), then why does maxConnectionAge exist at all?!

ejona86 commented 1 year ago

maxConnectionAge causes new RPCs to use a new connection. You can use the maxConnectionAgeGrace to limit the RPC lifetime after the max age is reached. See gRFC A9 for reasoning.

slogic commented 1 year ago

What do you mean by closing the RPC? Something put into the exchange contract? From my point of view this is boilerplate code, because what I need is exactly to force the client to create a new connection. I tried to implement this with maxConnectionAge, but the client has no idea its connection is going to be dropped soon (on receiving the GOAWAY frame). This is quite straightforward, no?

ejona86 commented 1 year ago

"Close the RPC" would mean the server calls onError() or onCompleted(). That ends the RPC and then the client can re-create it.
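On the client, the re-create step might look like this sketch: a clean onCompleted() from the server triggers an immediate new RPC, while onError() backs off exponentially. The class and method names are hypothetical, and re-creating the stream is the application's job here. The constants (1 s initial, x1.6, 120 s cap, +/-20% jitter) are borrowed from gRPC's connection-backoff document as a reasonable default, not values grpc-java applies to re-created RPCs.

```java
import java.util.concurrent.ThreadLocalRandom;

public class StreamRecreator {
    private static final long INITIAL_BACKOFF_MS = 1_000;
    private static final long MAX_BACKOFF_MS = 120_000;
    private static final double MULTIPLIER = 1.6;

    private long nextBackoffMs = INITIAL_BACKOFF_MS;

    /** Server ended the stream with onCompleted(): re-create right away and reset the backoff. */
    long delayAfterCompleted() {
        nextBackoffMs = INITIAL_BACKOFF_MS;
        return 0;
    }

    /** Stream failed with onError(): wait a jittered, exponentially growing delay. */
    long delayAfterError() {
        long base = nextBackoffMs;
        nextBackoffMs = Math.min(MAX_BACKOFF_MS, (long) (nextBackoffMs * MULTIPLIER));
        // +/-20% jitter so many clients do not retry in lockstep
        double jitter = ThreadLocalRandom.current().nextDouble(0.8, 1.2);
        return (long) (base * jitter);
    }

    public static void main(String[] args) {
        StreamRecreator r = new StreamRecreator();
        System.out.println("after error:     " + r.delayAfterError() + " ms");
        System.out.println("after error:     " + r.delayAfterError() + " ms");
        System.out.println("after completed: " + r.delayAfterCompleted() + " ms");
    }
}
```

Resetting on a clean completion matters: a server that rotates streams on purpose should not push well-behaved clients into ever-longer waits.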

but the client has no idea its connection is going to be dropped soon (on receiving the GOAWAY frame).

GOAWAY means "stop creating new RPCs on the connection." It does not mean the connection is going to be dropped soon. The connection can live for a month after the GOAWAY, as long as an RPC created before the GOAWAY is still running and the server's grace time isn't exceeded.
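To make the "connection can live for a month" point concrete, here is a toy model (not gRPC code; all names hypothetical) of when a connection actually closes after GOAWAY, assuming the GOAWAY is sent at maxConnectionAge and RPCs started afterwards go to a new connection. With no grace period the connection stays open until the last pre-GOAWAY RPC finishes; only a configured maxConnectionAgeGrace puts an upper bound on it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

public class GoAwayModel {
    /** Hypothetical RPC record: start and end times in seconds since the connection was created. */
    static final class Rpc {
        final long start;
        final long end;

        Rpc(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Time at which the connection closes, given the GOAWAY time and an optional grace period. */
    static long closeTime(List<Rpc> rpcs, long goAwayAt, OptionalLong grace) {
        long lastEnd = goAwayAt; // with no in-flight RPCs, the connection can close right after GOAWAY
        for (Rpc rpc : rpcs) {
            if (rpc.start < goAwayAt) { // only RPCs created before the GOAWAY keep this connection alive
                lastEnd = Math.max(lastEnd, rpc.end);
            }
        }
        // Without a grace period, the connection waits for every pre-GOAWAY RPC to finish.
        return grace.isPresent() ? Math.min(lastEnd, goAwayAt + grace.getAsLong()) : lastEnd;
    }

    public static void main(String[] args) {
        List<Rpc> rpcs = Arrays.asList(new Rpc(0, 5), new Rpc(8, 3600));
        System.out.println(closeTime(rpcs, 10, OptionalLong.empty())); // 3600: waits for the long RPC
        System.out.println(closeTime(rpcs, 10, OptionalLong.of(30)));  // 40: grace expires first
    }
}
```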

slogic commented 1 year ago

It is not reliable (and somewhat rude) for the server to close the connection (while operating normally), because the client may still be transferring data.

GOAWAY means "stop creating new RPCs on the connection."

OK, nice, this is what I need: just give me a chance to catch it on the client side and the design I have in mind would work. This whole discussion is about that.

It does not mean the connection is going to be dropped soon.

But I am going to drop it (maxConnectionAge is set on the server side), and the client is notified of this intention via GOAWAY.

ejona86 commented 1 year ago

It is not reliable (and somewhat rude) for the server to close the connection (while operating normally), because the client may still be transferring data.

You have not yet explained your use-case, like why you have a long-lived RPC and why it is tied to the connection. If it doesn't work for the server to close the stream occasionally then you are left with 1) have the client close the stream occasionally or 2) live with the stream and over-provision.

It does not mean the connection is going to be dropped soon.

But I am going to drop it (maxConnectionAge is set on the server side), and the client is notified of this intention via GOAWAY.

maxConnectionAge does not mean the connection will get killed. maxConnectionAgeGrace is required for that. In both cases there is a GOAWAY. The client cannot distinguish between these cases.

slogic commented 1 year ago

You have not yet explained your use-case, like why you have a long-lived RPC

Should I? Well, say it's a very large chunk of data. Or it could be many intensive RPCs (we're talking about a streaming contract).

maxConnectionAge does not mean the connection will get killed.

By the way, I suggest updating the documentation (https://grpc.github.io/grpc-java/javadoc/io/grpc/ServerBuilder.html#maxConnectionAge-long-java.util.concurrent.TimeUnit-) with the default values and an explanation that maxConnectionAge won't work without setting maxConnectionAgeGrace, but GOAWAY frames will still be sent, right?

In both cases there is a GOAWAY. The client cannot distinguish between these cases.

I don't care. I'm expecting the client to react to the first GOAWAY (any further incoming GOAWAY events would be processed idempotently; in my case, other consumers can do whatever they want).

ejona86 commented 1 year ago

You have not yet explained your use-case, like why you have a long-lived RPC

Should I? Well, say it's a very large chunk of data. Or it could be many intensive RPCs (we're talking about a streaming contract).

I think you should. If it is large chunks of data, then you need resumption ability. If it is many intensive RPCs, then generally you'd use separate RPCs so they can go to different backends and spread load. I can understand those approaches may not work, but we'd need to understand what you're doing in order to help.
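A toy sketch of what "resumption ability" could mean for a large transfer, assuming the receiver reports how many bytes it has stored (in a real protocol that ack would travel in-band, e.g. as a response message). Each `sendOnce` call stands for one RPC that the server may end at any point; the next RPC restarts from the last acknowledged offset instead of from zero. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ResumableTransfer {
    /** Receiver side: appends chunks and reports how many bytes it has stored. */
    static final class Receiver {
        private final ByteArrayOutputStream stored = new ByteArrayOutputStream();

        /** Accepts a chunk only if it starts at the current offset; returns the new ack offset. */
        long accept(long offset, byte[] chunk) {
            if (offset == stored.size()) {
                stored.write(chunk, 0, chunk.length);
            }
            return stored.size();
        }

        byte[] data() {
            return stored.toByteArray();
        }
    }

    /**
     * Sends `data` from `resumeFrom` in chunkSize pieces over one simulated stream that
     * breaks after maxChunks chunks. Returns the last acknowledged offset.
     */
    static long sendOnce(byte[] data, long resumeFrom, int chunkSize, int maxChunks, Receiver rx) {
        long acked = resumeFrom;
        for (int i = 0; i < maxChunks && acked < data.length; i++) {
            int from = (int) acked;
            int to = Math.min(data.length, from + chunkSize);
            acked = rx.accept(from, Arrays.copyOfRange(data, from, to));
        }
        return acked;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        Arrays.fill(data, (byte) 7);
        Receiver rx = new Receiver();
        long acked = 0;
        int streams = 0;
        while (acked < data.length) {
            // Each iteration is a fresh RPC, e.g. after the server ended the previous one.
            acked = sendOnce(data, acked, 1_024, 3, rx);
            streams++;
        }
        System.out.println(streams + " streams, " + rx.data().length + " bytes"); // 4 streams, 10000 bytes
    }
}
```

With offsets acknowledged this way, the server can end a stream at any time without the client losing more than its unacknowledged chunks.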

I assume somewhere in the mix is an L4/TCP load balancer, since you are so focused on TCP connections.

ejona86 commented 1 year ago

Closing for now. It can be reopened if there's more discussion to be had.