grpc / grpc-java

The Java gRPC implementation. HTTP/2 based RPC
https://grpc.io/docs/languages/java/
Apache License 2.0

server awaitTermination() doesn't handle graceful shutdown for open streams #11229

Open o-shevchenko opened 6 months ago

o-shevchenko commented 6 months ago

What version of gRPC-Java are you using?

1.63.0

What is your environment?

RHEL Docker image, JDK 17. We use https://github.com/grpc-ecosystem/grpc-spring, which uses awaitTermination to shut down the server gracefully.

What did you expect to see?

The gRPC server should support graceful shutdown while streams are open. We use gRPC streaming to read and write data via our microservice. We expect to be able to use the K8s graceful shutdown period to postpone the pod kill until in-flight reads/writes finish and all streams are closed, instead of having the connection dropped.

What did you see instead?

Even though we configured graceful shutdown for both the gRPC server and the K8s pod, the gRPC server still terminates immediately after SIGTERM, even though we invoke awaitTermination().

Steps to reproduce the bug

  1. Run gRPC server in K8s pod
  2. Open gRPC stream and read data
  3. Trigger pod shutdown or just kill Java process. You can use kubectl delete pod or execute kill -TERM PID for Java process inside your pod (it should have PID 1 if you started your Java app as the main process)
  4. The shutdown hook is triggered, and we invoke awaitTermination(), but the gRPC server is terminated immediately even if we still read data via stream.

See issue: https://github.com/grpc-ecosystem/grpc-spring/issues/1110
See a similar problem described here: https://fedor.medium.com/shutting-down-grpc-services-gracefully-961a95b08f8

sergiitk commented 6 months ago

Could you please try to reproduce this with v1.63.1 or v1.64.0? v1.63.0 contained a few bugs that were fixed in v1.64.0 and backported to v1.63.1: https://github.com/grpc/grpc-java/releases/tag/v1.63.1.

o-shevchenko commented 6 months ago

Thanks for the reply, @sergiitk! Yes, I can reproduce it with version 1.63.1 as well.

kannanjgithub commented 6 months ago

Adding a shutdown hook that calls shutdown() and awaitTermination() on the gRPC server is the correct way to produce a graceful shutdown, as you have already elucidated. We have had discussions in the past on whether to provide this ability in the gRPC server but decided against it since we are a library, not a framework, and we don't control main.

o-shevchenko commented 5 months ago

Thanks, @kannanjgithub. But I'm not sure the issue came across from the description. We already invoke awaitTermination(), but it doesn't work as expected. It ignores open streams and just kills the server even if the client is still reading data. We are forced to add extra logic to our shutdown hooks to check the server's open streams before invoking awaitTermination(). Could you comment on whether this is expected behaviour? Thanks!
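For illustration only, the kind of extra shutdown-hook logic mentioned above could be sketched as below. This is not the code used in this report: OpenStreamTracker and its methods are hypothetical names, the sketch uses only the JDK, and it assumes the application itself calls streamOpened() and streamClosed() from its own stream handlers.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of gRPC): counts open streams so a shutdown
// hook can wait for them before shutting the server down.
final class OpenStreamTracker {
    private int open = 0;

    // Assumed to be called by the application when a stream starts.
    synchronized void streamOpened() {
        open++;
    }

    // Assumed to be called by the application when a stream completes,
    // fails, or is cancelled.
    synchronized void streamClosed() {
        if (open > 0) {
            open--;
        }
        if (open == 0) {
            notifyAll(); // wake up any thread blocked in awaitNoOpenStreams
        }
    }

    synchronized int openStreams() {
        return open;
    }

    // Blocks until no streams are open or the timeout elapses.
    // Returns true if the tracker became idle in time, false otherwise.
    synchronized boolean awaitNoOpenStreams(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (open > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        OpenStreamTracker tracker = new OpenStreamTracker();
        tracker.streamOpened();
        // Simulate a client that finishes reading shortly after shutdown starts.
        new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tracker.streamClosed();
        }).start();
        boolean idle = tracker.awaitNoOpenStreams(5, TimeUnit.SECONDS);
        System.out.println("idle=" + idle);
    }
}
```

In a shutdown hook, a call such as awaitNoOpenStreams(...) would run before server.shutdown() and awaitTermination(); the timeout should stay below the pod's termination grace period.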

o-shevchenko commented 5 months ago

Added more details, @kannanjgithub:

  1. Run gRPC server in K8s pod
  2. Open gRPC stream and read data
  3. Trigger pod shutdown or just kill Java process. You can use kubectl delete pod or execute kill -TERM PID for Java process inside your pod (it should have PID 1 if you started your Java app as the main process)
  4. Shutdown hook is triggered, and we invoke awaitTermination(), BUT the gRPC server is terminated immediately even if we still read data via stream.

kannanjgithub commented 5 months ago

We find it surprising that awaitTermination could have stopped working, since it works in the example code. Can you provide a test setup and share the GCP project with us to help debug the issue?

ejona86 commented 1 day ago

I think I may know what's going on here. I think the last RPCs were cancelled (or deadline exceeded). gRPC then enqueued a callback to an executor and terminated because there were no more RPCs. But your application hasn't necessarily finished its processing in those callbacks.

The easiest way to solve this also follows a best practice: pass your own executor to gRPC via serverBuilder.executor() to run callbacks, so that you can limit the maximum number of threads. If you pass your own ExecutorService, then after gRPC's awaitTermination() returns true, you can wait for callbacks to complete.

// Just an example executor. By default gRPC uses Executors.newCachedThreadPool()
ExecutorService myExecutor = Executors.newFixedThreadPool(10);
Server server = ServerBuilder.forPort(blah)
  ...
  .executor(myExecutor)
  .build();

// In shutdown hook
server.shutdown();
server.awaitTermination(10, TimeUnit.SECONDS);
server.shutdownNow();
server.awaitTermination(10, TimeUnit.SECONDS);
// Now wait for all callbacks to complete. If you have server.awaitTermination()
// in your main(), you could do this there instead. It just needs to happen on a
// non-daemon thread.
myExecutor.shutdown();
myExecutor.awaitTermination(10, TimeUnit.SECONDS);
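
The final step above, waiting for the executor after the server has terminated, can be tried in isolation. The following is a minimal sketch under stated assumptions: it uses only the JDK, so there is no gRPC server involved; the sleeping task stands in for an application callback that is still running after the last RPC ended; and ExecutorDrain and drain are hypothetical names, not gRPC APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not part of gRPC): drains an application-owned
// executor once the server itself has reported termination.
final class ExecutorDrain {
    private ExecutorDrain() {}

    // Stops accepting new tasks and waits for submitted ones to finish.
    // Returns true if everything completed within the timeout; otherwise
    // interrupts the remaining tasks and returns false.
    static boolean drain(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // queued and running tasks keep going
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            executor.shutdownNow(); // timed out: interrupt what is left
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService callbackExecutor = Executors.newFixedThreadPool(2);
        AtomicBoolean callbackFinished = new AtomicBoolean(false);

        // Stand-in for a callback still doing work after its RPC was cancelled.
        callbackExecutor.execute(() -> {
            try {
                Thread.sleep(200);
                callbackFinished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean drained = drain(callbackExecutor, 5, TimeUnit.SECONDS);
        System.out.println("drained=" + drained
            + ", callbackFinished=" + callbackFinished.get());
    }
}
```

In a real shutdown hook, drain(myExecutor, ...) would replace the last two lines of the snippet above, after the server's own awaitTermination() calls; the timeouts together should fit within the pod's termination grace period.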