solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Faulty RemoteJwks first try will never be retried #7803

Closed Ati59 closed 5 months ago

Ati59 commented 1 year ago

Gloo Edge Version

1.13.x (latest stable)

Kubernetes Version

None

Describe the bug

If the RemoteJwks URL is not reachable when the extauth service launches, it logs a "failed to fetch JWKS" error and never retries fetching the JWKS, even if refreshInterval is set.

Then every call through a route using the AuthConfig will get a 403 error.

On ext-auth logs :

{"level":"error","ts":"2023-02-07T13:30:10.094Z","caller":"jwks/utils.go:24","msg":"failed to fetch JWKS","version":"1.12.40","error":"Get \"http://172.18.2.2:8080/realms/master/protocol/openid-connect/certs\": dia
l tcp 172.18.2.2:8080: connect: connection refused","stacktrace":"github.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwksWithClient\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/config/
utils/jwks/utils.go:24\ngithub.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwks\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/config/utils/jwks/utils.go:16\ngithub.com/solo-io/ext-auth-
service/pkg/config/oauth/token_validation/jwt/jwks.NewRemoteJwksSource\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/config/oauth/token_validation/jwt/jwks/remote.go:71\ngithub.com/solo-io/ext-auth
-service/pkg/config.(*authServiceFactory).NewOAuth2JwtAccessToken\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/config/factory.go:218\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(
*extAuthConfigTranslator).authConfigToService\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:273\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigT
ranslator).getConfigs\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:98\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).Translate\n\t/
go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:82\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/sol
o-projects/projects/extauth/pkg/config/generator.go:86\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runn
er/xds.go:116\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.12.37/projects/gloo/pkg/api/v1/enterprise/options/ext
auth/v1/ext_auth_discovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.30.7/pkg/api/v1/control-plane/client/clien
t.go:137\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:145\ngithub.com/solo-io/go-utils/conte
xtutils.(*exponentialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.22.1/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/githu
b.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:154\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func2\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/server/server.go:14
9"}
{"level":"error","ts":"2023-02-07T13:30:10.098Z","caller":"config/generator.go:114","msg":"Errors encountered while processing new server configuration","version":"1.12.40","error":"1 error occurred:\n\t* failed to
 get auth service for auth config with id [gloo-system.accesstoken-auth]; this configuration will be ignored: failed to fetch JWKS: Get \"http://172.18.2.2:8080/realms/master/protocol/openid-connect/certs\": dial t
cp 172.18.2.2:8080: connect: connection refused\n\n","stacktrace":"github.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/solo-projects/projects
/extauth/pkg/config/generator.go:114\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:116\ngit
hub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.12.37/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1/ext_auth_d
iscovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.30.7/pkg/api/v1/control-plane/client/client.go:137\ngithub.c
om/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:145\ngithub.com/solo-io/go-utils/contextutils.(*exponent
ialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.22.1/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo
-projects/projects/extauth/pkg/runner/xds.go:154\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func2\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/server/server.go:149"}
{"level":"error","ts":1675776616.4879637,"logger":"ext-auth.ext-auth-service","msg":"Auth Server does not contain auth configuration with the given ID","version":"undefined","x-request-id":"b9568388-3cb7-4e1b-8b0d-
135454e91eb4","requestContext":{"AuthConfigId":"gloo-system.accesstoken-auth","SourceType":"virtual_host","SourceName":"gloo-system.gateway-proxy-listener-::-8080-gloo-system_vs"},"stacktrace":"github.com/envoyprox
y/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler.func1\n\t/go/pkg/mod/github.com/envoyproxy/go-control-plane@v0.10.3/envoy/service/auth/v3/external_auth.pb.go:699\ngithub.com/solo-io/go-utils/h
ealthchecker.GrpcUnaryServerHealthCheckerInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.22.1/healthchecker/grpc.go:69\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.go
lang.org/grpc@v1.49.0/server.go:1135\ngithub.com/solo-io/ext-auth-service/pkg/server.requestIdInterceptor.func1\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.21.6/pkg/server/logging.go:86\ngoogle.golang.org
/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grp
c-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:31\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1138\ngoogle.golang.org/grp
c.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1140\ngithub.com/envoyproxy/go-control-plane/envoy/service/auth/v3._Authorization_Check_Handler\n\t/go/pkg/mod/github.com/envoy
proxy/go-control-plane@v0.10.3/envoy/service/auth/v3/external_auth.pb.go:701\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1301\ngoogle.golang.org/grpc.(*
Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1642\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:938"}

On gateway-proxy logs :

[2023-02-07T13:30:59.213Z] "HEAD /get HTTP/1.1" 403 UAEX 0 0 52 - "-" "curl/7.85.0" "22916f1f-ce5e-4385-ad0d-55fcfea928f3" "httpbin.domain.local" "-"
[2023-02-07T13:31:00.414Z] "HEAD /get HTTP/1.1" 403 UAEX 0 0 11 - "-" "curl/7.85.0" "0e781a03-9441-491f-a4a7-685dd7475c38" "httpbin.domain.local" "-"
[2023-02-07T13:31:02.367Z] "HEAD /get HTTP/1.1" 403 UAEX 0 0 6 - "-" "curl/7.85.0" "c4c7eb49-473b-45a3-b601-7983f04d0d8a" "httpbin.domain.local" "-"

Steps to reproduce the bug

  1. Define an AuthConfig as-is :
    apiVersion: enterprise.gloo.solo.io/v1
    kind: AuthConfig
    metadata:
      name: accesstoken-auth
      namespace: gloo-system
    spec:
      configs:
      - oauth2:
          accessTokenValidation:
            jwt:
              remoteJwks:
                url: ${KEYCLOAK_URL}/realms/master/protocol/openid-connect/certs
                refreshInterval: "10"

And a VirtualService that is using it :

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: vs
  namespace: gloo-system
spec:
  virtualHost:
    domains:
      - '*'
    routes:
      - matchers:
          - prefix: /
        routeAction:
          single:
            upstream:
              name: httpbin-httpbin-8000
              namespace: gloo-system
    options:
      extauth:
        configRef:
          name: accesstoken-auth
          namespace: gloo-system
  2. Make keycloak unavailable (I changed the selector to take all endpoints out of the service)
  3. Restart the ext-auth service; it should fail to get the JWKS, and you should see a "failed to fetch JWKS" error in the ext-auth logs
  4. Wait for refreshInterval
  5. Try to curl your exposed app; it should end up with UAEX errors on the gateway-proxy pod and "Auth Server does not contain auth configuration with the given ID" errors on the ext-auth one
  6. If you make keycloak available again and then restart the ext-auth pod, it will fix the issue

Expected Behavior

The ext-auth pod should retry fetching the JWKS based on the refreshInterval value, so that we can get through the authentication process and end up with a 200 without having to restart the ext-auth pod. When the first try is faulty, it looks like the "refresh loop" is not launched at all.

Additional Context

No response

Related Issues


SantoDE commented 1 year ago

This might be a dupe

Ati59 commented 1 year ago

We have a KCS article on zendesk for this. It is probably the "dupe" ;)

sam-heilbron commented 1 year ago

Similar issue: https://github.com/solo-io/gloo/issues/7528

sam-heilbron commented 1 year ago

Additional context: https://solo-io-corp.slack.com/archives/C02LQ0JCNLF/p1694578006362599 We support an on demand cache refresh policy (https://docs.solo.io/gloo-edge/latest/reference/api/github.com/solo-io/gloo/projects/gloo/api/v1/enterprise/options/extauth/v1/extauth.proto.sk/#jwksondemandcacherefreshpolicy) in OIDC that could be re-used in this service (would require code changes)

DuncanDoyle commented 7 months ago

Can still reproduce this on GE 1.15.14. The problem seems to be that when the ExtAuth server loads with an AuthConfig that points to a JWKS endpoint that is not available/reachable, the config does not get loaded. And when the config does not get loaded, the refreshInterval does not get triggered at all. So the only way to get out of that situation is to reload the AuthConfig or restart the ExtAuth server.

{"level":"info","ts":"2024-03-01T10:22:17.510Z","caller":"runner/xds.go:115","msg":"{\"auth_config_ref_name\":\"gloo-system.oauth-auth\",\"configs\":[{\"AuthConfig\":{\"Oauth2\":{\"OauthType\":{\"AccessTokenValidationConfig\":{\"ValidationType\":{\"Jwt\":{\"JwksSourceSpecifier\":{\"RemoteJwks\":{\"url\":\"http://keycloak.example.com/realms/master/protocol/openid-connect/certs\",\"refresh_interval\":{\"seconds\":10}}}}},\"ScopeValidation\":null}}}}}]}","version":"1.15.14"}
{"level":"error","ts":"2024-03-01T10:22:17.539Z","caller":"jwks/utils.go:24","msg":"failed to fetch JWKS","version":"1.15.14","error":"request failed with code 503 Service Unavailable","stacktrace":"github.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwksWithClient\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/utils/jwks/utils.go:24\ngithub.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwks\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/utils/jwks/utils.go:16\ngithub.com/solo-io/ext-auth-service/pkg/config/oauth/token_validation/jwt/jwks.NewRemoteJwksSource\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/oauth/token_validation/jwt/jwks/remote.go:71\ngithub.com/solo-io/ext-auth-service/pkg/config.(*authServiceFactory).NewOAuth2JwtAccessTokenAuthService\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/factory.go:318\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).authConfigToService\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:310\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).getConfigs\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:103\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).Translate\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:87\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/generator.go:86\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.fu
nc1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.15.23/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1/ext_auth_discovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.33.0/pkg/api/v1/control-plane/client/client.go:137\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:148\ngithub.com/solo-io/go-utils/contextutils.(*exponentialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.24.6/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:157\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func3\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/server/server.go:160"}
{"level":"error","ts":"2024-03-01T10:22:17.541Z","caller":"config/generator.go:114","msg":"Errors encountered while processing new server configuration","version":"1.15.14","error":"1 error occurred:\n\t* failed to get auth service for auth config with id [gloo-system.oauth-auth]; this configuration will be ignored: failed to fetch JWKS: request failed with code 503 Service Unavailable\n\n","stacktrace":"github.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/generator.go:114\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.15.23/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1/ext_auth_discovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.33.0/pkg/api/v1/control-plane/client/client.go:137\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:148\ngithub.com/solo-io/go-utils/contextutils.(*exponentialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.24.6/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:157\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func3\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/server/server.go:160"}

Note that when you bring down Keycloak after the AuthConfig has already been loaded, the ExtAuth server will start giving errors that it can't refresh, but when you bring Keycloak back up again, it's able to refresh again.

The main question seems to be if we want to accept AuthConfigs that point to non-reachable endpoints. We can't really determine whether the AuthConfig is incorrect, or whether there is an issue with the target endpoint.

DuncanDoyle commented 7 months ago

Reproducer: https://github.com/DuncanDoyle/ge-gloo-7803

nfuden commented 7 months ago

We should have a first-time start-up version of extauth that forces AuthConfigs to keep retrying instead of failing as they normally would when we are applying new configuration.

sync-by-unito[bot] commented 6 months ago

➤ Hanh Vu commented:

ETA of 4/12 for design review. Implementation ETA is unknown.

sheidkamp commented 6 months ago

The outcome of the design review was that the ideal approach for handling the situation where the auth service is updated with a non-responding JWKs URL is to keep the new AuthService in a pending state until it can retrieve the URLs.

This requires changes in how we generate/translate/communicate the new AuthConfigs. The plan is to implement these structural changes in a separate PR and then add the JWKs specific changes on top of that.

This will require 3 rounds of PRs in the main branches:

sheidkamp commented 6 months ago

@DuncanDoyle - The changes needed to implement this are the type of structural changes that we usually don't like to implement in backports.

In this case we are making non-trivial modifications to the ExtAuth pod's xds event loop, and the alternative would involve breaking changes to the exported Generator or Translator interfaces that would normally only accompany a major version update. How big of an ask would it be to make these changes 1.17 only?

@kcbabo - tagging you too while Duncan is on vacation.

sync-by-unito[bot] commented 5 months ago

➤ Nathan F Solo commented:

Because there are some interstitial PRs, we are pushing the final due date. Keith Babo

sheidkamp commented 5 months ago

First solo-projects PR (SP1 from https://github.com/solo-io/gloo/issues/7803#issuecomment-2059244877) merged, EXT1 in review

sheidkamp commented 5 months ago

The Ext Auth changes have been merged, functional changes for the last PR are in place, spiffing up the e2e tests.

sheidkamp commented 5 months ago

This has been merged to solo-projects main, all PRs are complete, and the scenario in the reproducer is succeeding.

nfuden commented 5 months ago

Since this materially changes our extauth service's behavior, this has been merged to main and will not be backported to 1.15. Therefore I have removed the 1.15 tag for now.