zeromicro / go-zero

A cloud-native Go microservices framework with cli tool for productivity.
https://go-zero.dev
MIT License

Graceful Exit Failure When Using Gateway and RPC Services Simultaneously within ServiceGroup #4261

Open stonever opened 1 month ago

stonever commented 1 month ago

Describe the bug

I am encountering an issue where services fail to exit gracefully when both gateway and RPC services are used together within a ServiceGroup. The problem does not occur when the gateway service is removed from the group.

To Reproduce

Steps to reproduce the behavior:

    serviceGroup := service.NewServiceGroup()
    defer serviceGroup.Stop()
    rpcServer := zrpc.MustNewServer(c.RpcServer, func(grpcServer *grpc.Server) {
        pb.RegisterFlowServer(grpcServer, flowserver.NewFlowServer(svcCtx))
        reflection.Register(grpcServer)
    })
    serviceGroup.Add(rpcServer) 

    gw := gateway.MustNewServer(c.Gateway)
    serviceGroup.Add(gw)

    serviceGroup.Start()
    slog.Info("exiting")

Then stop the services by pressing CTRL+C.

The error: the process does not log the "exiting" message that would indicate a successful graceful shutdown.

Expected behavior

The "exiting" message is logged.

Environments (please complete the following information):

More description

Through debugging, I have identified two key issues that prevent graceful shutdown:

  1. Gateway WaitGroup blockage: The gateway service appears to block on a WaitGroup because not all of its shutdown listeners (registered via proc) have completed. The gateway therefore waits indefinitely for these listeners to finish, preventing a graceful exit.

  2. RPC service connection hang: The RPC service cannot exit gracefully because of a persistent connection that does not close. This connection is associated with the gateway service, suggesting a possible deadlock in which each service waits on the other to complete its shutdown. This cyclic dependency prevents both services from terminating.

These findings suggest a coordination issue between the gateway and RPC services during shutdown, possibly due to improper handling of shutdown signals or of synchronization primitives such as WaitGroups.
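The first symptom can be illustrated in isolation. The sketch below is not go-zero's proc code; `runListeners` is a hypothetical helper, written only for this example, that runs shutdown callbacks concurrently, waits for them on a sync.WaitGroup, and gives up after a deadline. It shows how a single listener that never returns keeps the whole wait blocked, which is the behavior described above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runListeners is a hypothetical helper (not part of go-zero). It runs every
// shutdown listener in its own goroutine and waits for all of them, giving up
// after timeout. It returns true only if every listener returned in time.
func runListeners(timeout time.Duration, listeners ...func()) bool {
	var wg sync.WaitGroup
	for _, fn := range listeners {
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			f()
		}(fn)
	}

	done := make(chan struct{})
	go func() {
		wg.Wait() // blocks for as long as any listener has not returned
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	// stuck stays open during the wait; it simulates a listener that is
	// itself waiting on another service to shut down first.
	stuck := make(chan struct{})

	ok := runListeners(100*time.Millisecond,
		func() {},          // a listener that finishes immediately
		func() { <-stuck }, // a listener that waits on something else
	)
	fmt.Println("all listeners finished:", ok)

	close(stuck) // release the simulated listener before exiting
}
```

Wrapping a suspect wait in a deadline like this is one way to confirm which group of listeners is holding up the exit; it is a diagnostic aid, not a fix for the underlying dependency.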

kevwan commented 1 month ago

I used the following code but couldn't reproduce the problem.

func main() {
    flag.Parse()

    var c config.Config
    conf.MustLoad(*configFile, &c)

    group := service.NewServiceGroup()
    gw := gateway.MustNewServer(c.Gateway)
    group.Add(gw)

    ctx := svc.NewServiceContext(c)
    s := zrpc.MustNewServer(c.RpcServerConf, func(grpcServer *grpc.Server) {
        pb.RegisterGreetServer(grpcServer, server.NewGreetServer(ctx))
        reflection.Register(grpcServer)
    })
    group.Add(s)

    fmt.Printf("Starting rpc server at %s...\n", c.ListenOn)
    group.Start()
}

Would you please give me the full code on this issue?

stonever commented 1 month ago

awesomeProject.zip

I wrote a minimal demo; please take a look. @kevwan

Log:

    ^C{"@timestamp":"2024-07-25T15:28:12.737+08:00","caller":"proc/shutdown.go:58","content":"Got signal 2, shutting down...","level":"info"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"service/servicegroup.go:53","content":"Shutting down services in group","level":"info"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(gateway) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(flow.rpc) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
    {"@timestamp":"2024-07-25T15:28:42.738+08:00","caller":"proc/shutdown.go:65","content":"Still alive after 30s, going to force kill the process...","level":"info"}

kevwan commented 1 month ago

Thanks for your demo code!

I found that in zrpc, calling server.GracefulStop() blocks, while calling server.Stop() works.
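For context, in grpc-go GracefulStop blocks until all pending RPCs have finished, whereas Stop closes listeners and connections immediately. A pattern often used in gRPC servers generally (not necessarily the fix go-zero will adopt) is to bound the graceful stop with a timeout and fall back to a hard stop. The sketch below uses only the standard library: `stopWithTimeout` is a hypothetical helper, and the `graceful` and `force` callbacks stand in for server.GracefulStop and server.Stop.

```go
package main

import (
	"fmt"
	"time"
)

// stopWithTimeout is a hypothetical helper, not a go-zero or grpc-go API.
// It runs graceful in a goroutine; if graceful has not returned within
// timeout, it calls force instead of waiting any longer.
// It returns true when graceful finished on its own.
func stopWithTimeout(graceful, force func(), timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		graceful()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		force()
		return false
	}
}

func main() {
	// release simulates the connection that keeps the graceful stop waiting.
	release := make(chan struct{})

	graceful := func() { <-release }   // stands in for a GracefulStop that hangs
	force := func() { close(release) } // stands in for Stop tearing the connection down

	ok := stopWithTimeout(graceful, force, 100*time.Millisecond)
	fmt.Println("graceful stop completed:", ok)
}
```

With a *grpc.Server value named server, the call would look like stopWithTimeout(server.GracefulStop, server.Stop, 5*time.Second); the five-second budget is an arbitrary assumption. This only caps how long shutdown can hang. It does not explain why the connection between the gateway and the RPC server stays open.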

I'm digging into it.

stonever commented 3 weeks ago

@kevwan Any update? Thx

kevwan commented 2 weeks ago

Still working on it. I'll get back to you when I have more progress.