Describe the bug
After doing kubectl rollout restart deploy extauth -n gloo-system we start receiving 403. In the logs we can see there is a request that got in after the shutdown is started.
Attached screen recording.
Expected behavior
I expect a way to have 0 downtime with a rollout restart of extauth.
Additional context
Add any other context about the problem here, e.g.
Gloo Edge version = gloo-ee-1.8.13
Kubernetes version: 1.20.9
I've found a way to make it better, but it requires sidecar mode and an additional patch, so not ideal for customers
cat <<EOF > gloo-values.yaml
gloo:
gatewayProxies:
gatewayProxy:
kubeResourceOverride:
spec:
template:
spec:
topologySpreadConstraints:
- maxSkew: 1 # Try to be spreaded across different zones
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
gateway-proxy-id: gateway-proxy # Mind this is a default value, customized installations may need different matchers
gloo: gateway-proxy
podTemplate:
probes: true # not needed for shutdown
gracefulShutdown:
enabled: true # force the healthcheck/fail, then wait 5s, and then start shutdown
sleepTimeSeconds: 5 # wait 5s to start shutdown, and extauth will wait 6s
antiAffinity: true # not in the same host
kind:
deployment:
replicas: 3 # replicas in different zones, if not possible at least not in the same node
global:
extensions:
extAuth:
envoySidecar: true
standaloneDeployment: false # notice you can have both true, and choose mode in settings. Easier if we set this false
EOF
helm upgrade gloo glooe/gloo-ee -n gloo-system --values gloo-values.yaml --set="license_key=${LICENSE_KEY}"
# This is not possible with Helm, so we patch the ext-auth container to wait for 5s before starting the shutdown process. After that time, we can assume the gw container to be already dead
kubectl -n gloo-system patch deployment gateway-proxy --type json -p='[{"op": "add", "path": "/spec/template/spec/containers/1/lifecycle", "value": {"preStop":{"exec":{"command":[ "/bin/sh", "-c", "sleep 6"]}}} }]'
This is the log extract of an extauth pod in debug mode in the video:
{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"server/context.go:35","msg":"terminated signal received, cancelling server context","version":"1.8.13","version":"undefined"}
{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"server/server.go:138","msg":"server context cancelled, waiting for 15 seconds and failing health checks before initiating graceful shutdown","version":"1.8.13","version":"undefined"}
{"level":"error","ts":"2021-09-24T14:26:24.983Z","caller":"runner/xds.go:162","msg":"failed to start xDS client","version":"1.8.13","version":"undefined","error":"rpc error: code = Canceled desc = context canceled","stacktrace":"github.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:162\ngithub.com/solo-io/go-utils/contextutils.(*exponentioalBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/contextutils/backoff.go:62\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:169\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Start.func1\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/server/server.go:119"}
{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"runner/xds.go:155","msg":"closed grpc connection","version":"1.8.13","version":"undefined","address":"gloo:9977"}
{"level":"debug","ts":1632493585.2811286,"logger":"ext-auth","msg":"received signal that caller context has been canceled. Sending header map[x-envoy-immediate-health-check-fail:[]] <nil>","version":"undefined"}
{"level":"debug","ts":1632493585.2811806,"logger":"ext-auth.ext-auth-service","msg":"Received auth request","version":"undefined","request":{"attributes":{"source":{"address":{"Address":{"SocketAddress":{"address":"10.132.0.93","PortSpecifier":{"PortValue":9811}}}}},"destination":{"address":{"Address":{"SocketAddress":{"address":"10.4.1.91","PortSpecifier":{"PortValue":8080}}}}},"context_extensions":{"config_id":"default.passthrough-auth","source_name":"gloo-system.gateway-proxy-listener-::-8080-fresha_customers","source_type":"virtual_host"},"request":{"time":1632493585.262539,"http":{"body":"","host":"34.77.238.250","fragment":"","method":"GET","path":"/get","scheme":"http","size":0,"query":"","protocol":"HTTP/1.1","id":"9225329065070051255","headers":{":path":"/get",":scheme":"http",":authority":"34.77.238.250","accept-encoding":"gzip","x-request-id":"e9bd2e8d-6f4b-4dc4-991b-485f956a7afe","x-forwarded-proto":"http",":method":"GET","x-vegeta-seq":"158","user-agent":"Go-http-client/1.1","authorization":"[redacted]"}}}}}}
{"level":"warn","ts":1632493585.2886324,"logger":"ext-auth.ext-auth-service","msg":"failed to get client from manager","version":"undefined","error":"grpc connection is shutting down","errorVerbose":"grpc connection is shutting down\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6265\n\tpassthrough.init:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_client_manager.go:18"}
{"level":"error","ts":1632493585.288717,"logger":"ext-auth.ext-auth-service","msg":"Error during authService execution","version":"undefined","authService":"config_0","error":"failed to get client to grpc service: grpc connection is shutting down","errorVerbose":"failed to get client to grpc service\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55\ngrpc connection is shutting down\n\tgrpc.(*Server).handleStream:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\n\tgrpc.(*Server).processUnaryRPC:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\n\tv3._Authorization_Check_Handler:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\n\tgrpc.chainUnaryServerInterceptors.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\n\tzap.UnaryServerInterceptor.func1:/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\n\tgrpc.getChainUnaryHandler.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\n\thealthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1:/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\n\tv3._Authorization_Check_Handler.func1:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\n\tservice.(*authServer).Check:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\n\tchain.(*authServiceChain).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\n\tchain.(*identityAuthorizerImpl).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:334\n\tpassthrough.(*grpcAuthService).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:90\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55","stacktrace":"github.com/solo-io/ext-auth-service/pkg/chain.(*authServiceChain).Authorize\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\ngithub.com/solo-io/ext-auth-service/pkg/service.(*authServer).Check\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler.func1\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\ngithub.com/solo-io/go-utils/healthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}
{"level":"error","ts":1632493585.2954314,"logger":"ext-auth.ext-auth-service","msg":"Error while authorizing request","version":"undefined","request":{"attributes":{"source":{"address":{"Address":{"SocketAddress":{"address":"10.132.0.93","PortSpecifier":{"PortValue":9811}}}}},"destination":{"address":{"Address":{"SocketAddress":{"address":"10.4.1.91","PortSpecifier":{"PortValue":8080}}}}},"context_extensions":{"config_id":"default.passthrough-auth","source_name":"gloo-system.gateway-proxy-listener-::-8080-fresha_customers","source_type":"virtual_host"},"request":{"time":1632493585.262539,"http":{"body":"","host":"34.77.238.250","fragment":"","method":"GET","path":"/get","scheme":"http","size":0,"query":"","protocol":"HTTP/1.1","id":"9225329065070051255","headers":{"authorization":"[redacted]","x-request-id":"e9bd2e8d-6f4b-4dc4-991b-485f956a7afe","x-forwarded-proto":"http",":method":"GET","x-vegeta-seq":"158",":path":"/get",":scheme":"http",":authority":"34.77.238.250","accept-encoding":"gzip","user-agent":"Go-http-client/1.1"},"error":"failed to get client to grpc service: grpc connection is shutting down","errorVerbose":"failed to get client to grpc service\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55\ngrpc connection is shutting down\n\tgrpc.(*Server).handleStream:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\n\tgrpc.(*Server).processUnaryRPC:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\n\tv3._Authorization_Check_Handler:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\n\tgrpc.chainUnaryServerInterceptors.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\n\tzap.UnaryServerInterceptor.func1:/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\n\tgrpc.getChainUnaryHandler.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\n\thealthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1:/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\n\tv3._Authorization_Check_Handler.func1:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\n\tservice.(*authServer).Check:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\n\tchain.(*authServiceChain).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\n\tchain.(*identityAuthorizerImpl).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:334\n\tpassthrough.(*grpcAuthService).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:90\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55"}}}},"stacktrace":"github.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler.func1\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\ngithub.com/solo-io/go-utils/healthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}
{"level":"debug","ts":1632493585.29575,"logger":"ext-auth","msg":"finished unary call with code OK","version":"undefined","grpc.start_time":"2021-09-24T14:26:25Z","grpc.request.deadline":"2021-09-24T14:26:25Z","system":"grpc","span.kind":"server","grpc.service":"envoy.service.auth.v3.Authorization","grpc.method":"Check","grpc.code":"OK","grpc.time_ms":32.018001556396484}
{"level":"info","ts":"2021-09-24T14:26:39.984Z","caller":"server/server.go:147","msg":"started graceful shutdown","version":"1.8.13","version":"undefined"}
{"level":"warn","ts":"2021-09-24T14:26:39.985Z","caller":"server/server.go:132","msg":"error serving healthListener: accept tcp [::]:8082: use of closed network connection","version":"1.8.13","version":"undefined"}
Describe the bug After doing
kubectl rollout restart deploy extauth -n gloo-system
we start receiving 403. In the logs we can see there is a request that got in after the shutdown is started. Attached screen recording.To Reproduce Steps to reproduce the behavior:
Expected behavior I expect a way to have 0 downtime with a rollout restart of extauth.
Additional context Add any other context about the problem here, e.g.
I've found a way to make it better, but it requires sidecar mode and an additional patch, so not ideal for customers
This is the log extract of an extauth pod in debug mode in the video:
https://user-images.githubusercontent.com/35881711/134691523-ef7a9d37-ad0f-4eab-8ec4-6b1944c18837.mov
Engineering Context
ExtAuth server health check: https://github.com/solo-io/ext-auth-service/blob/814ff47178ed4b1910f58c38de3d8a682189e6b5/pkg/server/server.go#L68
Comment outlining intended behavior of server health checks: https://github.com/solo-io/ext-auth-service/blob/814ff47178ed4b1910f58c38de3d8a682189e6b5/pkg/server/server.go#L141
Recent PR in this area that introduced support for Kube Readiness Probes: https://github.com/solo-io/ext-auth-service/pull/187