Closed Ati59 closed 5 months ago
This might be a dupe
We have a KCS article on zendesk for this. It is probably the "dupe" ;)
Similar issue: https://github.com/solo-io/gloo/issues/7528
Additional context: https://solo-io-corp.slack.com/archives/C02LQ0JCNLF/p1694578006362599 We support an on demand cache refresh policy (https://docs.solo.io/gloo-edge/latest/reference/api/github.com/solo-io/gloo/projects/gloo/api/v1/enterprise/options/extauth/v1/extauth.proto.sk/#jwksondemandcacherefreshpolicy) in OIDC that could be re-used in this service (would require code changes)
Can still reproduce this on GE 1.15.14. The problem seems to be that when the ExtAuth server loads with an AuthConfig that points to a JWKS endpoint that is not available/reachable, the config does not get loaded. And when the config does not get loaded, the refreshInterval does not get triggered at all. So the only way to get out of that situation is to reload the AuthConfig or restart the ExtAuth server.
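A minimal Go sketch of the alternative behavior, where the refresh loop is started even if the initial fetch fails. This is not the ext-auth-service code: `fetchFunc`, `remoteKeys`, `startRefreshing`, `flakyFetch` and `demo` are hypothetical names, and the fetch is simulated instead of calling a real JWKS endpoint.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fetchFunc is a hypothetical stand-in for the real JWKS HTTP fetch.
type fetchFunc func(ctx context.Context) (string, error)

// remoteKeys holds the last successfully fetched key set (simplified to a string).
type remoteKeys struct {
	mu   sync.RWMutex
	keys string
	ok   bool
}

func (r *remoteKeys) set(k string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.keys, r.ok = k, true
}

func (r *remoteKeys) get() (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.keys, r.ok
}

// startRefreshing tries one fetch immediately, then keeps refreshing on every
// tick until ctx is cancelled. A failed initial fetch does not stop the loop.
func startRefreshing(ctx context.Context, fetch fetchFunc, interval time.Duration) *remoteKeys {
	r := &remoteKeys{}
	refresh := func() {
		if k, err := fetch(ctx); err == nil {
			r.set(k)
		} // on error: keep the previous keys (if any) and retry on the next tick
	}
	refresh()
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				refresh()
			}
		}
	}()
	return r
}

// flakyFetch simulates an endpoint that fails the first `failures` calls.
func flakyFetch(failures int) fetchFunc {
	var mu sync.Mutex
	calls := 0
	return func(ctx context.Context) (string, error) {
		mu.Lock()
		defer mu.Unlock()
		calls++
		if calls <= failures {
			return "", errors.New("request failed with code 503 Service Unavailable")
		}
		return "jwks-v1", nil
	}
}

// demo starts against an unavailable endpoint and waits (up to 2s) for recovery.
func demo() (readyAtStart bool, keys string, readyLater bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	r := startRefreshing(ctx, flakyFetch(2), 5*time.Millisecond)
	_, readyAtStart = r.get()
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if keys, readyLater = r.get(); readyLater {
			break
		}
		time.Sleep(5 * time.Millisecond)
	}
	return
}

func main() {
	atStart, keys, later := demo()
	fmt.Printf("ready at startup: %v, ready after retries: %v (keys=%q)\n", atStart, later, keys)
}
```

In this sketch the caller still has to decide what to do with requests that arrive before the first successful fetch. That is the open question about accepting AuthConfigs that point to non-reachable endpoints.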
{"level":"info","ts":"2024-03-01T10:22:17.510Z","caller":"runner/xds.go:115","msg":"{\"auth_config_ref_name\":\"gloo-system.oauth-auth\",\"configs\":[{\"AuthConfig\":{\"Oauth2\":{\"OauthType\":{\"AccessTokenValidationConfig\":{\"ValidationType\":{\"Jwt\":{\"JwksSourceSpecifier\":{\"RemoteJwks\":{\"url\":\"http://keycloak.example.com/realms/master/protocol/openid-connect/certs\",\"refresh_interval\":{\"seconds\":10}}}}},\"ScopeValidation\":null}}}}}]}","version":"1.15.14"}
{"level":"error","ts":"2024-03-01T10:22:17.539Z","caller":"jwks/utils.go:24","msg":"failed to fetch JWKS","version":"1.15.14","error":"request failed with code 503 Service Unavailable","stacktrace":"github.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwksWithClient\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/utils/jwks/utils.go:24\ngithub.com/solo-io/ext-auth-service/pkg/config/utils/jwks.FetchJwks\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/utils/jwks/utils.go:16\ngithub.com/solo-io/ext-auth-service/pkg/config/oauth/token_validation/jwt/jwks.NewRemoteJwksSource\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/oauth/token_validation/jwt/jwks/remote.go:71\ngithub.com/solo-io/ext-auth-service/pkg/config.(*authServiceFactory).NewOAuth2JwtAccessTokenAuthService\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/config/factory.go:318\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).authConfigToService\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:310\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).getConfigs\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:103\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*extAuthConfigTranslator).Translate\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/translator.go:87\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/generator.go:86\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.15.23/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1/ext_auth_discovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.33.0/pkg/api/v1/control-plane/client/client.go:137\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:148\ngithub.com/solo-io/go-utils/contextutils.(*exponentialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.24.6/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:157\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func3\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/server/server.go:160"}
{"level":"error","ts":"2024-03-01T10:22:17.541Z","caller":"config/generator.go:114","msg":"Errors encountered while processing new server configuration","version":"1.15.14","error":"1 error occurred:\n\t* failed to get auth service for auth config with id [gloo-system.oauth-auth]; this configuration will be ignored: failed to fetch JWKS: request failed with code 503 Service Unavailable\n\n","stacktrace":"github.com/solo-io/solo-projects/projects/extauth/pkg/config.(*configGenerator).GenerateConfig\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/config/generator.go:114\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1.1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:121\ngithub.com/solo-io/gloo/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1.applyExtAuthConfig.func1\n\t/go/pkg/mod/github.com/solo-io/gloo@v1.15.23/projects/gloo/pkg/api/v1/enterprise/options/extauth/v1/ext_auth_discovery_service_xds.sk.go:111\ngithub.com/solo-io/solo-kit/pkg/api/v1/control-plane/client.(*client).Start\n\t/go/pkg/mod/github.com/solo-io/solo-kit@v0.33.0/pkg/api/v1/control-plane/client/client.go:137\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run.func1\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:148\ngithub.com/solo-io/go-utils/contextutils.(*exponentialBackoff).Backoff\n\t/go/pkg/mod/github.com/solo-io/go-utils@v0.24.6/contextutils/backoff.go:70\ngithub.com/solo-io/solo-projects/projects/extauth/pkg/runner.(*configSource).Run\n\t/go/src/github.com/solo-io/solo-projects/projects/extauth/pkg/runner/xds.go:157\ngithub.com/solo-io/ext-auth-service/pkg/server.Server.Run.func3\n\t/go/pkg/mod/github.com/solo-io/ext-auth-service@v0.44.0-patch2/pkg/server/server.go:160"}
Note that when you bring down Keycloak after the AuthConfig has already been loaded, the ExtAuth server will start giving errors that it can't refresh, but when you bring Keycloak back up again, it's able to refresh again.
The main question seems to be if we want to accept AuthConfigs that point to non-reachable endpoints. We can't really determine whether the AuthConfig is incorrect, or whether there is an issue with the target endpoint.
Reproducer: https://github.com/DuncanDoyle/ge-gloo-7803
We should have a first-time startup mode of extauth that forces AuthConfigs to keep retrying instead of failing as they normally would when we are applying new configuration.
➤ Hanh Vu commented:
ETA of 4/12 for design review. Implementation ETA is unknown.
Outcome of design review was that the ideal approach for handling the situation where the auth service is updated with a non-responding JWKs URL is to keep the new AuthService in a pending state until it can retrieve the URLs.
This requires changes in how we generate/translate/communicate the new AuthConfigs. The plan is to implement these structural changes in a separate PR and then add the JWKs specific changes on top of that.
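A rough Go sketch of what a "pending" auth service could look like, purely to illustrate the idea from the design review. It does not reflect the actual Generator/Translator changes: `registry`, `authService`, `markReady` and the error values are hypothetical, and the JWKS is reduced to a set of key IDs. The point is that the config is registered rather than ignored, so a lookup by ID succeeds, while requests are still denied until the keys arrive.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errUnknownConfig = errors.New("no auth configuration with the given ID")
	errPending       = errors.New("auth configuration is pending: JWKS not yet retrieved")
	errUnknownKey    = errors.New("token signed with an unknown key")
)

// authService is pending while keyIDs is nil and ready once keys are stored.
type authService struct {
	mu     sync.RWMutex
	keyIDs map[string]struct{}
}

// markReady stores the retrieved key IDs and moves the service out of pending.
func (s *authService) markReady(ids ...string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.keyIDs = make(map[string]struct{}, len(ids))
	for _, id := range ids {
		s.keyIDs[id] = struct{}{}
	}
}

// authorize denies every request while pending, then checks the key ID.
func (s *authService) authorize(keyID string) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.keyIDs == nil {
		return errPending
	}
	if _, ok := s.keyIDs[keyID]; !ok {
		return errUnknownKey
	}
	return nil
}

// registry maps AuthConfig IDs to services, including pending ones.
type registry struct {
	mu       sync.RWMutex
	services map[string]*authService
}

func newRegistry() *registry {
	return &registry{services: make(map[string]*authService)}
}

// register adds a config in the pending state instead of dropping it.
func (r *registry) register(configID string) *authService {
	r.mu.Lock()
	defer r.mu.Unlock()
	svc := &authService{}
	r.services[configID] = svc
	return svc
}

func (r *registry) authorize(configID, keyID string) error {
	r.mu.RLock()
	svc, ok := r.services[configID]
	r.mu.RUnlock()
	if !ok {
		return errUnknownConfig
	}
	return svc.authorize(keyID)
}

func main() {
	reg := newRegistry()
	svc := reg.register("gloo-system.oauth-auth")
	fmt.Println("while pending:", reg.authorize("gloo-system.oauth-auth", "kid-1"))
	svc.markReady("kid-1") // e.g. called once a background fetch succeeds
	fmt.Println("after keys arrive:", reg.authorize("gloo-system.oauth-auth", "kid-1"))
}
```

A background fetch loop would call `markReady` once the JWKS endpoint responds. Whether a pending config should deny requests or answer with a different status is a design choice this sketch does not settle.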
This will require 3 rounds of PRs in the main branches:
@DuncanDoyle - The changes needed to implement this are the type of structural changes that we usually don't like to implement in backports.
In this case we are making non-trivial modifications to the ExtAuth pod's xds event loop, and the alternative would involve breaking changes to the exported Generator or Translator interfaces that would normally only accompany a major version update. How big of an ask would it be to make these changes 1.17 only?
@kcbabo - tagging you too while Duncan is on vacation.
➤ Nathan F Solo commented:
Because there are some interstitial PRs, we are pushing out the final due date. Keith Babo
First solo-projects PR (SP1 from https://github.com/solo-io/gloo/issues/7803#issuecomment-2059244877) merged, EXT1 in review
The Ext Auth changes have been merged, functional changes for the last PR are in place, spiffing up the e2e tests.
This has been merged to solo-projects main, all PRs are complete, and the scenario in the reproducer is succeeding.
Since this materially changes our extauth service's behavior, this has been merged to main and will not be backported to 1.15. Therefore I have removed the 1.15 tag for now.
Gloo Edge Version
1.13.x (latest stable)
Kubernetes Version
None
Describe the bug
If the RemoteJwks url is not reachable when the extauth service launches, it throws the error "failed to fetch JWKS" but never retries fetching the JWKS, even if refreshInterval is set.
Then every call through a route using the AuthConfig gets a 403 error.
On ext-auth logs :
On gateway-proxy logs :
Steps to reproduce the bug
AuthConfig as-is :
And a VirtualService that is using it :
"failed to fetch JWKS" error on ext-auth logs
refreshInterval
UAEX errors on gateway-proxy pod and "Auth Server does not contain auth configuration with the given ID" errors on ext-auth one
(6. If you make keycloak available again and then restart the ext-auth pod, it will fix the issue)
Expected Behavior
Ext-auth pod should retry fetching the JWKS based on the refreshInterval value, so we can get through the authentication process and end up with 200 without having to restart the ext-auth pod. When the first try fails, it looks like the "refresh loop" is not launched at all.
Additional Context
No response
Related Issues