solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0
4.09k stars 441 forks source link

ExtAuth pods accept traffic while they are dying #5376

Closed jmunozro closed 2 years ago

jmunozro commented 3 years ago

Describe the bug After doing kubectl rollout restart deploy extauth -n gloo-system we start receiving 403. In the logs we can see there is a request that got in after the shutdown is started. Attached screen recording.

To Reproduce Steps to reproduce the behavior:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: extauth-grpcservice
spec:
  selector:
    matchLabels:
      app: grpc-extauth
  replicas: 1
  template:
    metadata:
      labels:
        app: grpc-extauth
    spec:
      containers:
        - name: grpc-extauth
          image: gcr.io/solo-test-236622/jesus-passthrough-grpc-service:0.1.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9001
---
apiVersion: v1
kind: Service
metadata:
  name: example-grpc-auth-service
  labels:
      app: grpc-extauth
spec:
  ports:
  - port: 9001
    protocol: TCP
  selector:
      app: grpc-extauth
---
apiVersion: enterprise.gloo.solo.io/v1
kind: AuthConfig
metadata:
  name: passthrough-auth
spec:
  configs:
  - passThroughAuth:
      grpc:
        # Address of the grpc auth server to query
        address: example-grpc-auth-service.default.svc.cluster.local:9001
        # Set a connection timeout to external service, default is 5 seconds
        connectionTimeout: 3s
---
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: customers
spec:
  virtualHost:
    domains:
    - '*'
    options:
      extauth:
        configRef:
          name: passthrough-auth
          namespace: default
      headerManipulation:
        requestHeadersToAdd:
        - append: false
          header:
            key: env
            value: staging
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-httpbin-8000
            namespace: gloo-system
EOF

Expected behavior I expect a way to have 0 downtime with a rollout restart of extauth.

Additional context Add any other context about the problem here, e.g.

I've found a way to make it better, but it requires sidecar mode and an additional patch, so not ideal for customers

cat <<EOF > gloo-values.yaml
gloo:
  gatewayProxies:
    gatewayProxy:
      kubeResourceOverride:
        spec:
          template:
            spec:
              topologySpreadConstraints:
                - maxSkew: 1 # Try to be spreaded across different zones
                  topologyKey: topology.kubernetes.io/zone
                  whenUnsatisfiable: ScheduleAnyway
                  labelSelector:
                    matchLabels:
                      gateway-proxy-id: gateway-proxy # Mind this is a default value, customized installations may need different matchers
                      gloo: gateway-proxy
      podTemplate:
        probes: true # not needed for shutdown
        gracefulShutdown:
          enabled: true # force the healthcheck/fail, then wait 5s, and then start shutdown
          sleepTimeSeconds: 5 # wait 5s to start shutdown, and extauth will wait 6s
      antiAffinity: true # not in the same host
      kind:
        deployment:
          replicas: 3 # replicas in different zones, if not possible at least not in the same node
global:
  extensions:
    extAuth:
      envoySidecar: true
      standaloneDeployment: false # notice you can have both true, and choose mode in settings. Easier if we set this false

EOF
helm upgrade gloo glooe/gloo-ee -n gloo-system --values gloo-values.yaml --set="license_key=${LICENSE_KEY}"

# This is not possible with Helm, so we patch the ext-auth container to wait for 5s before starting the shutdown process. After that time, we can assume the gw container to be already dead
kubectl -n gloo-system patch deployment gateway-proxy --type json -p='[{"op": "add", "path": "/spec/template/spec/containers/1/lifecycle", "value": {"preStop":{"exec":{"command":[ "/bin/sh", "-c", "sleep 6"]}}} }]'

This is the log extract of an extauth pod in debug mode in the video:

{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"server/context.go:35","msg":"terminated signal received, cancelling server context","version":"1.8.13","version":"undefined"}
{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"server/server.go:138","msg":"server context cancelled, waiting for 15 seconds and failing health checks before initiating graceful shutdown","version":"1.8.13","version":"undefined"}
{"level":"error","ts":"2021-09-24T14:26:24.983Z","caller":"runner/xds.go:162","msg":"failed to start xDS client","version":"1.8.13","version":"undefined","error":"rpc error: code = Canceled desc = context canceled","stacktrace":"github.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:162\ngithub.com/solo-io/go-utils/contextutils.(*exponentioalBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/contextutils/backoff.go:62\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:169\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Start.func1\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/server/server.go:119"}
{"level":"info","ts":"2021-09-24T14:26:24.983Z","caller":"runner/xds.go:155","msg":"closed grpc connection","version":"1.8.13","version":"undefined","address":"gloo:9977"}
{"level":"debug","ts":1632493585.2811286,"logger":"ext-auth","msg":"received signal that caller context has been canceled. Sending header map[x-envoy-immediate-health-check-fail:[]] <nil>","version":"undefined"}
{"level":"debug","ts":1632493585.2811806,"logger":"ext-auth.ext-auth-service","msg":"Received auth request","version":"undefined","request":{"attributes":{"source":{"address":{"Address":{"SocketAddress":{"address":"10.132.0.93","PortSpecifier":{"PortValue":9811}}}}},"destination":{"address":{"Address":{"SocketAddress":{"address":"10.4.1.91","PortSpecifier":{"PortValue":8080}}}}},"context_extensions":{"config_id":"default.passthrough-auth","source_name":"gloo-system.gateway-proxy-listener-::-8080-fresha_customers","source_type":"virtual_host"},"request":{"time":1632493585.262539,"http":{"body":"","host":"34.77.238.250","fragment":"","method":"GET","path":"/get","scheme":"http","size":0,"query":"","protocol":"HTTP/1.1","id":"9225329065070051255","headers":{":path":"/get",":scheme":"http",":authority":"34.77.238.250","accept-encoding":"gzip","x-request-id":"e9bd2e8d-6f4b-4dc4-991b-485f956a7afe","x-forwarded-proto":"http",":method":"GET","x-vegeta-seq":"158","user-agent":"Go-http-client/1.1","authorization":"[redacted]"}}}}}}
{"level":"warn","ts":1632493585.2886324,"logger":"ext-auth.ext-auth-service","msg":"failed to get client from manager","version":"undefined","error":"grpc connection is shutting down","errorVerbose":"grpc connection is shutting down\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6242\n\truntime.doInit:/usr/local/go/src/runtime/proc.go:6265\n\tpassthrough.init:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_client_manager.go:18"}
{"level":"error","ts":1632493585.288717,"logger":"ext-auth.ext-auth-service","msg":"Error during authService execution","version":"undefined","authService":"config_0","error":"failed to get client to grpc service: grpc connection is shutting down","errorVerbose":"failed to get client to grpc service\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55\ngrpc connection is shutting down\n\tgrpc.(*Server).handleStream:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\n\tgrpc.(*Server).processUnaryRPC:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\n\tv3._Authorization_Check_Handler:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\n\tgrpc.chainUnaryServerInterceptors.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\n\tzap.UnaryServerInterceptor.func1:/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\n\tgrpc.getChainUnaryHandler.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\n\thealthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1:/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\n\tv3._Authorization_Check_Handler.func1:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\n\tservice.(*authServer).Check:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\n\tchain.(*authServiceChain).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\n\tchain.(*identityAuthorizerImpl).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:334\n\tpassthrough.(*grpcAuthService).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:90\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55","stacktrace":"github.com/solo-io/ext-auth-service/pkg/chain.(*authServiceChain).Authorize\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\ngithub.com/solo-io/ext-auth-service/pkg/service.(*authServer).Check\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler.func1\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\ngithub.com/solo-io/go-utils/healthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}
{"level":"error","ts":1632493585.2954314,"logger":"ext-auth.ext-auth-service","msg":"Error while authorizing request","version":"undefined","request":{"attributes":{"source":{"address":{"Address":{"SocketAddress":{"address":"10.132.0.93","PortSpecifier":{"PortValue":9811}}}}},"destination":{"address":{"Address":{"SocketAddress":{"address":"10.4.1.91","PortSpecifier":{"PortValue":8080}}}}},"context_extensions":{"config_id":"default.passthrough-auth","source_name":"gloo-system.gateway-proxy-listener-::-8080-fresha_customers","source_type":"virtual_host"},"request":{"time":1632493585.262539,"http":{"body":"","host":"34.77.238.250","fragment":"","method":"GET","path":"/get","scheme":"http","size":0,"query":"","protocol":"HTTP/1.1","id":"9225329065070051255","headers":{"authorization":"[redacted]","x-request-id":"e9bd2e8d-6f4b-4dc4-991b-485f956a7afe","x-forwarded-proto":"http",":method":"GET","x-vegeta-seq":"158",":path":"/get",":scheme":"http",":authority":"34.77.238.250","accept-encoding":"gzip","user-agent":"Go-http-client/1.1"},"error":"failed to get client to grpc service: grpc connection is shutting down","errorVerbose":"failed to get client to grpc service\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55\ngrpc connection is shutting down\n\tgrpc.(*Server).handleStream:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\n\tgrpc.(*Server).processUnaryRPC:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\n\tv3._Authorization_Check_Handler:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\n\tgrpc.chainUnaryServerInterceptors.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\n\tzap.UnaryServerInterceptor.func1:/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\n\tgrpc.getChainUnaryHandler.func1:/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\n\thealthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1:/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\n\tv3._Authorization_Check_Handler.func1:/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\n\tservice.(*authServer).Check:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/service/extauth.go:147\n\tchain.(*authServiceChain).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:103\n\tchain.(*identityAuthorizerImpl).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/chain/chain.go:334\n\tpassthrough.(*grpcAuthService).Authorize:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:90\n\tpassthrough.glob..func1:/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.16.12/pkg/config/passthrough/grpc_auth_service.go:55"}}}},"stacktrace":"github.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler.func1\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:668\ngithub.com/solo-io/go-utils/healthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.21.18/healthchecker/grpc.go:69\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1058\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2/logging/zap/server_interceptors.go:31\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1044\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.9.9-0.20210511190911-87d352569d55/envoy/service/auth/v3/external_auth.pb.go:670\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1217\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:1540\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.36.0/server.go:878"}
{"level":"debug","ts":1632493585.29575,"logger":"ext-auth","msg":"finished unary call with code OK","version":"undefined","grpc.start_time":"2021-09-24T14:26:25Z","grpc.request.deadline":"2021-09-24T14:26:25Z","system":"grpc","span.kind":"server","grpc.service":"envoy.service.auth.v3.Authorization","grpc.method":"Check","grpc.code":"OK","grpc.time_ms":32.018001556396484}
{"level":"info","ts":"2021-09-24T14:26:39.984Z","caller":"server/server.go:147","msg":"started graceful shutdown","version":"1.8.13","version":"undefined"}
{"level":"warn","ts":"2021-09-24T14:26:39.985Z","caller":"server/server.go:132","msg":"error serving healthListener: accept tcp [::]:8082: use of closed network connection","version":"1.8.13","version":"undefined"}

https://user-images.githubusercontent.com/35881711/134691523-ef7a9d37-ad0f-4eab-8ec4-6b1944c18837.mov

Engineering Context

ExtAuth server health check: https://github.com/solo-io/ext-auth-service/blob/814ff47178ed4b1910f58c38de3d8a682189e6b5/pkg/server/server.go#L68

Comment outlining intended behavior of server health checks: https://github.com/solo-io/ext-auth-service/blob/814ff47178ed4b1910f58c38de3d8a682189e6b5/pkg/server/server.go#L141

Recent PR in this area that introduced support for Kube Readiness Probes: https://github.com/solo-io/ext-auth-service/pull/187

jmunozro commented 2 years ago

tested and working fine in gloo 1.10.0-beta3

ben-taussig-solo commented 2 years ago

A fix for this issue has been merged to gloo enterprise v1.10.0-beta3 and gloo enterprise v1.9.4. As such, I am closing the issue.