grpc-ecosystem / grpc-spring

Spring Boot starter module for gRPC framework.
https://grpc-ecosystem.github.io/grpc-spring/
Apache License 2.0

GRPC server graceful shutdown #1110

Closed: o-shevchenko closed this issue 1 month ago

o-shevchenko commented 1 month ago

The context: We deploy our service in K8s and provide a gRPC streaming API, so the server can hold connections open for a long time. We need continuous deployment (CD) to roll out new versions of the service, but we want to prevent K8s from killing the service while it still has an open gRPC stream.

The question: Is there support for gracefully shutting down the service only once there are no open connections? I see this: https://github.com/grpc-ecosystem/grpc-spring/blob/master/grpc-server-spring-boot-starter/src/main/java/net/devh/boot/grpc/server/serverfactory/GrpcServerLifecycle.java#L58 But I don't see where the state of the service itself is checked.

ST-DDT commented 1 month ago

Have you tried this config:

https://github.com/grpc-ecosystem/grpc-spring/blob/e52df51897a4f5bebbe0e82d5fefe3d419d86c08/grpc-server-spring-boot-starter/src/main/java/net/devh/boot/grpc/server/config/GrpcServerProperties.java#L111

o-shevchenko commented 1 month ago

Thanks, I haven't tried it yet. I will test it with K8s and let you know the result.

o-shevchenko commented 1 month ago

Looks like it works. At least now I see that K8s can't kill it for the configured period of time. In addition to shutdownGracePeriod = -1, I set terminationGracePeriodSeconds to 24h (just for testing). I also tried adjusting various settings:

grpc:
  server:
    port: 6565
    reflection-service-enabled: true
    shutdown-grace-period: -1
    enable-keep-alive: true
    keep-alive-time: 86400
    keep-alive-timeout: 86400
    permit-keep-alive-without-calls: true
    permit-keep-alive-time: 86400

But after 5 minutes the app gets killed anyway. I can't find the setting that is responsible for that.

[SpringApplicationShutdownHook] [trace_id=, span_id=]n.d.b.g.s.s.GrpcServerLifecycle          : Completed gRPC server shutdown

Looks like it's a Spring setting. I will experiment with it more.

ST-DDT commented 1 month ago

You could add a log line or a debugger breakpoint here:

https://github.com/grpc-ecosystem/grpc-spring/blob/e52df51897a4f5bebbe0e82d5fefe3d419d86c08/grpc-server-spring-boot-starter/src/main/java/net/devh/boot/grpc/server/serverfactory/GrpcServerLifecycle.java#L154

To check if the waiting gets interrupted somehow.
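
For local experiments, the shutdown-then-wait sequence can be approximated without gRPC at all. The following is a minimal sketch, assuming a `java.util.concurrent.ExecutorService` as a stand-in for the gRPC server (`io.grpc.Server` offers `shutdown()`, `awaitTermination(long, TimeUnit)` and `shutdownNow()` with the same meaning). The class `GracefulStop` and its `Outcome` enum are hypothetical names, not part of grpc-spring. Treating a negative grace period as "wait without a deadline" and zero as "do not wait" is an assumption about how `shutdown-grace-period` is interpreted. The return value says whether the wait drained, timed out, or was interrupted:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: graceful stop that reports why the wait ended. */
final class GracefulStop {

    enum Outcome { DRAINED, TIMED_OUT, INTERRUPTED, NOT_WAITED }

    static Outcome stop(ExecutorService server, Duration gracePeriod) {
        long millis = gracePeriod.toMillis();
        server.shutdown(); // refuse new work, let in-flight work finish
        try {
            if (millis == 0) {
                return Outcome.NOT_WAITED; // assumption: zero means "do not wait"
            }
            if (millis < 0) {
                // assumption: negative means "wait without a deadline"
                while (!server.awaitTermination(1, TimeUnit.MINUTES)) {
                    // still busy, keep waiting
                }
                return Outcome.DRAINED;
            }
            return server.awaitTermination(millis, TimeUnit.MILLISECONDS)
                    ? Outcome.DRAINED
                    : Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Outcome.INTERRUPTED;
        } finally {
            server.shutdownNow(); // cancel whatever is still running
        }
    }
}
```

Logging the returned `Outcome` together with a timestamp would show whether the wait ended on its own or was cut short.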

o-shevchenko commented 1 month ago

Thanks, I'm already looking into that logic. It's not easy to debug everything with K8s. I will try to add more logs at DEBUG level, or use Telepresence or something similar, to understand why the service gets killed after 5 minutes.

ST-DDT commented 1 month ago

Depending on your setup, debugging in K8s is easy. Just expose an additional port or tunnel/port-forward(?) into the container and then connect as usual.

o-shevchenko commented 1 month ago

The connection is closed from the K8s side. When I run the server without K8s and send kill -TERM to the Java process, it waits and closes all connections properly. With K8s, the connection is closed, the service shuts down, and K8s kills the container. I need to check the ingress timeouts. Or maybe I need to adjust the keep-alive settings as well.

ST-DDT commented 1 month ago

Thanks for the update

o-shevchenko commented 1 month ago

When running a Java process inside a Docker container, sending a SIGTERM signal (kill -TERM 1) results in immediate termination rather than a graceful shutdown. This does not occur when running the same Java process locally and sending it the same signal.

kubectl exec -it pod_id -- /bin/bash
kill -TERM 1

Locally, it works fine, but when the server runs inside a Docker container, graceful shutdown doesn't work, and I can't understand why localServer.awaitTermination(); immediately kills the server. I don't see any InterruptedException when I connect via port 5005.

o-shevchenko commented 1 month ago

I'm running out of ideas. Do you have any ideas on further investigation or narrowing down the scope? Thanks!

ST-DDT commented 1 month ago

Sorry, unfortunately not.

o-shevchenko commented 1 month ago

I think localServer.awaitTermination(); doesn't work as I expect :( . I'm working on implementing a custom shutdown hook that checks the number of active streams before terminating. This article describes a similar problem: https://fedor.medium.com/shutting-down-grpc-services-gracefully-961a95b08f8 Just FYI. Thanks for the help.
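
The core of such a hook could look roughly like the sketch below. It is a sketch under assumptions, not the implementation from this thread or from the linked article: `ActiveStreamTracker` and its methods are hypothetical names, and only the JDK is used. In a real service, `streamOpened()` and `streamClosed()` would have to be called from wherever streams start and end (for example from an `io.grpc.ServerInterceptor`), which is not shown here.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: counts open streams so a shutdown hook can wait for them. */
final class ActiveStreamTracker {

    private final Object lock = new Object();
    private int active; // guarded by lock

    /** Call when a stream starts. */
    void streamOpened() {
        synchronized (lock) {
            active++;
        }
    }

    /** Call when a stream completes, fails, or is cancelled. */
    void streamClosed() {
        synchronized (lock) {
            if (active > 0) {
                active--;
            }
            if (active == 0) {
                lock.notifyAll(); // wake up a waiting shutdown hook
            }
        }
    }

    int activeStreams() {
        synchronized (lock) {
            return active;
        }
    }

    /** Blocks until no streams are open; returns false if the timeout expires first. */
    boolean awaitDrained(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            while (active > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
            return true;
        }
    }
}
```

A shutdown hook registered with `Runtime.getRuntime().addShutdownHook(...)` could call `awaitDrained(...)` before stopping the server. The timeout would need to stay below the pod's terminationGracePeriodSeconds, because K8s sends SIGKILL once that period expires.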

ST-DDT commented 1 month ago

Maybe also create an issue upstream in grpc-java and link it here. Maybe they can add a built-in variant as well, because I cannot imagine that you are the only one having this problem.

o-shevchenko commented 1 month ago

Yes, I expected this to be handled upstream already. Creating an issue on grpc-java is a good idea.

o-shevchenko commented 1 month ago

I've created an issue: https://github.com/grpc/grpc-java/issues/11229