hashicorp / vault-plugin-auth-jwt

A Vault plugin to allow authentication via JWT (and OIDC) tokens
Mozilla Public License 2.0

JWT validation fails after EKS OIDC provider's key rotation #181

Open nomatterz opened 3 years ago

nomatterz commented 3 years ago

I'm using AWS EKS (1.21) with an external Vault (1.7.2). I've configured the JWT auth method in Vault:

$VAULT_BINARY_DIR/vault auth enable -path=${ config.vault_path } jwt

$VAULT_BINARY_DIR/vault policy write "${ config.cluster_name }" -<<EOF
path "secret/clusters/${ config.cluster_name }/*" {
  capabilities = ["create", "read", "list"]
}
EOF

$VAULT_BINARY_DIR/vault write \
  auth/${ config.vault_path }/role/${ config.vault_role } \
  role_type=jwt \
  bound_audiences=${ config.bound_audiences } \
  user_claim=sub \
  ttl=24h \
  policies="${ config.cluster_name }"

$VAULT_BINARY_DIR/vault write auth/${ config.vault_path }/config \
        oidc_discovery_url=${ config.oidc_url } \
        bound_issuer=${ config.oidc_url }

Everything works fine until the OIDC provider rotates its keys. After rotation, I get the following error when logging in to Vault:

400. Errors: error validating token: error verifying token signature: failed to verify id token signature

After disabling the auth method in Vault and reconfiguring it with exactly the same parameters, the problem is fixed immediately.

Is there a way for the auth-jwt plugin to handle this key rotation automatically, without manual reconfiguration?

austingebauer commented 3 years ago

Hi, @nomatterz. Thanks for reporting this issue. The auth-jwt plugin should handle key rotation without problem. The JWT validation code uses https://github.com/coreos/go-oidc/blob/v3/oidc/jwks.go#L97-L131 under the hood, which refreshes its cached keys from the remote JWKS if it cannot find a kid match. You can see that the log line you provided is coming from this line, where the remote JWKS was just fetched. Do you happen to know if the JWT has more than one signature?
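One way to check the kid-matching behaviour described above from outside Vault is to compare the token's `kid` with the keys the issuer currently publishes. The following is a minimal sketch, not part of the plugin: `jwks_kids` and `jwks_has_kid` are hypothetical helper names, and `JWKS_URI` and `JWT_KID` in the usage comment are placeholders (the JWKS location is normally the `jwks_uri` field of `<issuer>/.well-known/openid-configuration`).

```shell
# Hypothetical helpers: inspect a JWKS document without needing jq.

# Print every "kid" found in a JWKS document read from stdin, one per line.
jwks_kids() {
  grep -o '"kid"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Succeed (exit 0) if the given kid appears in the JWKS document on stdin.
jwks_has_kid() {
  jwks_kids | grep -xF -- "$1" >/dev/null
}

# Usage against a live issuer (JWKS_URI and JWT_KID are placeholders):
#   curl -s "$JWKS_URI" | jwks_has_kid "$JWT_KID" \
#     && echo "kid is published" || echo "kid is not published"
```

If the token's `kid` is published but Vault still rejects the signature, that would suggest the validation is running against stale keys rather than a token signed by an unknown key.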

It would help if you could provide any other important details about AWS EKS and how the auth-jwt plugin is used in your environment, so that we can try to reproduce the issue. Thanks!

nomatterz commented 3 years ago

Hi @austingebauer, thank you for your attention. Each AWS EKS cluster can have an associated OIDC provider. AWS does not provide many details about it (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html), or at least I'm not aware of any. Each service account in Kubernetes uses a token with this OIDC provider as the issuer:

"alg":"RS256","kid":"45fcc2bf6e85be963978f53fedfefbf55a34cbd2"
---
{
  "aud": [
    "https://kubernetes.default.svc"
  ],
  "exp": 1664897947,
  "iat": 1633361947,
  "iss": "https://oidc.eks.<aws_region>.amazonaws.com/id/<id_here>",
  "kubernetes.io": {
    "namespace": "test-vault",
    "pod": {
      "name": "test-vault-jwt-67bbd464d7-nk5z2",
      "uid": "389768dc-8077-445d-928e-50e888348182"
    },
    "serviceaccount": {
      "name": "default",
      "uid": "7ec38762-b7a1-47c8-a36d-ffb4694c8755"
    },
    "warnafter": 1633365554
  },
  "nbf": 1633361947,
  "sub": "system:serviceaccount:test-vault:default"
}
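A header and claims dump like the one above can be produced directly from the service account token, which makes it easy to see which `kid` signed a token before and after a suspected rotation. This is a rough sketch under assumptions: `jwt_segment` and `jwt_kid` are hypothetical helper names, and `base64 -d` is assumed to be available (GNU coreutils or a recent macOS).

```shell
# Hypothetical helpers: decode parts of a JWT without verifying its signature.

# Decode one base64url-encoded JWT segment (1 = header, 2 = payload).
jwt_segment() {
  token=$1
  index=$2
  # Take the requested dot-separated segment and map base64url to base64.
  seg=$(printf '%s' "$token" | cut -d. -f"$index" | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Extract the "kid" value from the decoded header.
jwt_kid() {
  jwt_segment "$1" 1 |
    sed -n 's/.*"kid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage inside a pod:
#   jwt_kid "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
```

Recording the `kid` while logins succeed and again once they start failing would show whether the failures line up with a new signing key.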

I've configured the JWT auth method in Vault for a specific EKS cluster:

[root@vault ~]# vault read auth/testing/role/testing
Key                        Value
---                        -----
allowed_redirect_uris      <nil>
bound_audiences            [https://kubernetes.default.svc]
bound_claims               <nil>
bound_claims_type          string
bound_subject              n/a
claim_mappings             <nil>
clock_skew_leeway          0
expiration_leeway          0
groups_claim               n/a
max_age                    0
not_before_leeway          0
oidc_scopes                <nil>
policies                   [testing general]
role_type                  jwt
token_bound_cidrs          []
token_explicit_max_ttl     0s
token_max_ttl              0s
token_no_default_policy    false
token_num_uses             0
token_period               0s
token_policies             [testing general]
token_ttl                  24h
token_type                 default
ttl                        24h
user_claim                 sub
verbose_oidc_logging       true

[root@vault ~]# vault read auth/testing/config
Key                       Value
---                       -----
bound_issuer              https://oidc.eks.<aws_region>.amazonaws.com/id/<id_here>
default_role              n/a
jwks_ca_pem               n/a
jwks_url                  n/a
jwt_supported_algs        []
jwt_validation_pubkeys    []
namespace_in_state        true
oidc_client_id            n/a
oidc_discovery_ca_pem     n/a
oidc_discovery_url        https://oidc.eks.<aws_region>.amazonaws.com/id/<id_here>
oidc_response_mode        n/a
oidc_response_types       []
provider_config           map[]

I'm using the Banzai Cloud mutating webhook to authenticate to Vault, get a token, and fetch secrets (https://banzaicloud.com/docs/bank-vaults/mutating-webhook/). But that's not so important, because I also test with the Vault client (exec'ing into a Kubernetes pod), and this works fine:

export JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
./vault.exe write -tls-skip-verify auth/testing/login role='testing' jwt="$JWT"
Key                  Value
---                  -----
token                <token>
token_accessor       <token_accessor>
token_duration       24h
token_renewable      true
token_policies       ["testing" "default" "general"]
identity_policies    []
policies             ["testing" "default" "general"]
token_meta_role      testing

After the initial configuration everything works fine. But after some time (I suspect the OIDC provider rotates its keys) I receive this error:

400. Errors: error validating token: error verifying token signature: failed to verify id token signature

There is no log output about this failure in Vault itself.

After disabling and reconfiguring the auth method, everything works fine again.