zeromicro / go-zero

A cloud-native Go microservices framework with cli tool for productivity.
https://go-zero.dev
MIT License
29.42k stars 3.97k forks

Graceful Exit Failure When Using Gateway and RPC Services Simultaneously within ServiceGroup #4261

Open stonever opened 4 months ago

stonever commented 4 months ago

Describe the bug

I am encountering an issue where services fail to exit gracefully when both gateway and RPC services are used together within a ServiceGroup. The problem does not occur when the gateway service is removed from the group.

To Reproduce

Steps to reproduce the behavior:

    serviceGroup := service.NewServiceGroup()
    defer serviceGroup.Stop()
    rpcServer := zrpc.MustNewServer(c.RpcServer, func(grpcServer *grpc.Server) {
        pb.RegisterFlowServer(grpcServer, flowserver.NewFlowServer(svcCtx))
        reflection.Register(grpcServer)
    })
    serviceGroup.Add(rpcServer) 

    gw := gateway.MustNewServer(c.Gateway)
    serviceGroup.Add(gw)

    serviceGroup.Start()
    slog.Info("exiting")

Then attempt to stop the services gracefully by pressing CTRL+C.

  1. The error is

    The process never logs the "exiting" message that follows serviceGroup.Start(), which would indicate that graceful shutdown completed.

Expected behavior

The "exiting" message is logged.

Environments (please complete the following information):

More description

Through debugging, I have identified two key issues that prevent graceful shutdown:

  1. Gateway WaitGroup Blockage: The gateway service appears to block on a waitgroup because not all of its shutdown listeners (registered via proc) have completed their shutdown procedures. This results in the gateway waiting indefinitely for these listeners to finish, preventing a graceful exit.

  2. RPC Service Connection Hang: The RPC service is unable to exit gracefully due to a persistent connection that refuses to close. This connection is associated with the gateway service, suggesting a possible deadlock scenario where each service is waiting on the other to complete its shutdown procedure. This cyclic dependency prevents both services from terminating properly.

These insights indicate that there might be a coordination issue between the gateway and RPC services during the shutdown process, possibly due to improper handling of shutdown signals or synchronization primitives like waitgroups.

kevwan commented 4 months ago

I used the following code and couldn't reproduce this problem.

    func main() {
        flag.Parse()

        var c config.Config
        conf.MustLoad(*configFile, &c)

        group := service.NewServiceGroup()
        gw := gateway.MustNewServer(c.Gateway)
        group.Add(gw)

        ctx := svc.NewServiceContext(c)
        s := zrpc.MustNewServer(c.RpcServerConf, func(grpcServer *grpc.Server) {
            pb.RegisterGreetServer(grpcServer, server.NewGreetServer(ctx))
            reflection.Register(grpcServer)
        })
        group.Add(s)

        fmt.Printf("Starting rpc server at %s...\n", c.ListenOn)
        group.Start()
    }

Would you please give me the full code on this issue?

stonever commented 4 months ago

awesomeProject.zip

I wrote a minimal demo; please take a look. @kevwan

Log:

    ^C{"@timestamp":"2024-07-25T15:28:12.737+08:00","caller":"proc/shutdown.go:58","content":"Got signal 2, shutting down...","level":"info"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"service/servicegroup.go:53","content":"Shutting down services in group","level":"info"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(gateway) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
    {"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(flow.rpc) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
    {"@timestamp":"2024-07-25T15:28:42.738+08:00","caller":"proc/shutdown.go:65","content":"Still alive after 30s, going to force kill the process...","level":"info"}

kevwan commented 3 months ago

Thanks for your demo code!

I found that in zrpc, calling server.GracefulStop() blocks, while calling server.Stop() works.

I'm digging into it.

stonever commented 3 months ago

@kevwan Any update? Thx

kevwan commented 3 months ago

Still working on it. I'll get back to you when I have more progress.