gravitational / teleport


Kubernetes `static_jwks` joining requires configuring with a different JWKS when on EKS with IAM OIDC provider enabled #37183

Open strideynet opened 7 months ago

strideynet commented 7 months ago

Applies To

https://goteleport.com/docs/machine-id/deployment/kubernetes/

Details

We currently tell users to run `curl http://localhost:8080/openid/v1/jwks` to fetch the JWKS for their cluster. This works on most clusters, but on AWS EKS clusters with the IAM OIDC provider enabled, it leads to the following error when joining:

Original Error: *trace.RawTrace reviewing kubernetes token with static_jwks
    validating jwt signature
        go-jose/go-jose: unsupported key type/format
Stack Trace:

To resolve, they must instead use the JWKS from https://oidc.eks.REGION.amazonaws.com/id/CLUSTER_ID/keys - this is because the JWKS returned by the Kubernetes API server are not the ones actually used when signing a Service Account JWT, instead credentials from AWS IAM are used.
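As a rough diagnostic sketch (not part of Teleport), the snippet below decodes a Service Account JWT without verifying it and reads the `iss` claim and the `kid` header, which shows which issuer actually signed the token. The helper names `inspect_token` and `eks_jwks_url` are hypothetical, and the sketch assumes the `iss` claim on such a cluster is the `https://oidc.eks.REGION.amazonaws.com/id/CLUSTER_ID` URL, so that appending `/keys` gives the JWKS location mentioned above.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def inspect_token(token: str) -> dict:
    """Return the UNVERIFIED issuer and key ID of a JWT (no signature check)."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    return {"iss": payload.get("iss"), "kid": header.get("kid")}


def eks_jwks_url(issuer: str) -> str:
    # Assumption taken from the issue text: the keys live at <issuer>/keys.
    return issuer.rstrip("/") + "/keys"
```

If the `iss` value printed for a pod's token points at `oidc.eks...` rather than the API server, that is a hint that the JWKS from `/openid/v1/jwks` may not match the signing key.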

We should make sure we add guidance that covers this special case - and potentially a troubleshooting log entry since this is quite subtle. I'll see if I can work out a way of doing this that is more general and works across all cases.

Reproduction ClusterConfig:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: foo
  region: us-east-1
  version: "1.28"

iam:
  withOIDC: true

addons:
  - name: aws-ebs-csi-driver
    attachPolicyARNs:
      - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy

managedNodeGroups:
  - name: foo-ng
    instanceType: m5.4xlarge
    minSize: 1
    maxSize: 40

How will we know this is resolved?

When following the steps in the guide succeeds on an EKS cluster with the IAM OIDC provider enabled.

Related Issues

tigrato commented 7 months ago

We also need to check whether EKS Pod Identity is affected by this as well: https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html

bothra90 commented 6 months ago

> To resolve, they must instead use the JWKS from https://oidc.eks.region.amazonaws.com/id/CLUSTER_ID/keys - this is because the JWKS returned by the Kubernetes API server are not the ones actually used when signing a Service Account JWT, instead credentials from AWS IAM are used.

We've discovered that this is unfortunately not sufficient. EKS rotates the OIDC keys every 7 days (see: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html). That means the jwks in the token need to be updated every time the OIDC private key rotates from underneath it. I've so far not been able to find a way to avoid this.
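One way to notice that a rotation has made the static keys stale is to compare the `kid` in a fresh Service Account token's header with the `kid` values in the JWKS stored in the join token. This is only a sketch under that assumption (that both the token and the JWKS entries carry a `kid`); the function names are hypothetical and nothing here is Teleport's own logic.

```python
import base64
import json
from typing import Optional


def token_kid(token: str) -> Optional[str]:
    """Read the key ID from a JWT header without verifying the signature."""
    header_b64 = token.split(".")[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded)).get("kid")


def jwks_covers_token(static_jwks: str, token: str) -> bool:
    """True if the static JWKS document lists the key ID that signed the token."""
    kids = {key.get("kid") for key in json.loads(static_jwks).get("keys", [])}
    kid = token_kid(token)
    return kid is not None and kid in kids
```

A `False` result after a rotation would suggest the JWKS in the join token needs to be refreshed.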

bothra90 commented 6 months ago

FWIW, if we could specify a url to fetch the keys from instead of the static list of keys, it would resolve this issue. This is what EKS recommends as well.

strideynet commented 6 months ago

> We've discovered that this is unfortunately not sufficient. EKS rotates the OIDC keys every 7 days (see: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html). That means the jwks in the token need to be updated every time the OIDC private key rotates from underneath it. I've so far not been able to find a way to avoid this.

Achk, that's certainly a problem.

> FWIW, if we could specify a url to fetch the keys from instead of the static list of keys, it would resolve this issue. This is what EKS recommends as well.

This makes sense to me. When we added the static_jwks subtype, we were initially thinking about cases where the Teleport auth server cannot reach the issuer. We can likely add a jwks_uri subtype for cases like this. Do you want to go ahead and raise a feature request for this?
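To illustrate the general idea of fetching keys from a URL rather than pinning a static list, here is a minimal sketch of a time-based JWKS cache. It is not how Teleport implements or would implement a jwks_uri subtype, which is only a proposal in this thread; the `CachedJWKS` class, its parameters, and the one-hour default TTL are all hypothetical choices for the example.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_jwks(url: str) -> dict:
    # Plain HTTPS GET of a JWKS document; real code would also handle errors.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


class CachedJWKS:
    """Re-fetch the JWKS from a URL once the cached copy is older than the TTL."""

    def __init__(
        self,
        url: str,
        ttl_seconds: float = 3600.0,  # hypothetical default, not from the source
        fetch: Callable[[str], dict] = fetch_jwks,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._url = url
        self._ttl = ttl_seconds
        self._fetch = fetch
        self._clock = clock
        self._cached: Optional[dict] = None
        self._fetched_at = 0.0

    def keys(self) -> list:
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch(self._url)
            self._fetched_at = now
        return self._cached.get("keys", [])
```

Because the keys are re-read periodically, a rotation on the issuer side is picked up without anyone editing the join token; the `fetch` and `clock` parameters are injectable mainly so the behavior can be tested offline.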

bothra90 commented 5 months ago

@strideynet: I created the feature request like you asked, but curious if/how it's going to be prioritized. Our team is also happy to contribute a PR if that would help and if you're willing to mentor/review.

strideynet commented 5 months ago

@bothra90

> but curious if/how it's going to be prioritized.

Unfortunately, I don't have a timeline on this yet. It's not likely a huge priority since IAM joining is a workaround in these environments.

> Our team is also happy to contribute a PR if that would help and if you're willing to mentor/review.

More than happy to mentor a PR on this. You can reach me at noah @ goteleport.com - or reach out to me on the slack.

pksukumar commented 3 months ago

@strideynet

> It's not likely a huge priority since IAM joining is a workaround in these environments.

This looks like a priority item for us, because we are trying to install a standalone Teleport operator, and it uses the same Kubernetes join method with static_jwks. https://goteleport.com/docs/management/dynamic-resources/teleport-operator-standalone/#step-24-create-the-operator-join-token

We want to install the Teleport operator in standalone mode because installing the operator using the Teleport chart reports the following issue if we enable multiple replicas of Teleport for HA: https://github.com/gravitational/teleport/issues/21826

Now we are not able to install the operator either in standalone mode or as a sidecar. Please look into this as a priority.

Thanks, Sukumar

GavinFrazar commented 1 month ago

> @strideynet
>
> > It's not likely a huge priority since IAM joining is a workaround in these environments.
>
> This looks like a priority item for us, because we are trying to install a standalone Teleport operator, and it uses the same Kubernetes join method with static_jwks. https://goteleport.com/docs/management/dynamic-resources/teleport-operator-standalone/#step-24-create-the-operator-join-token
>
> We want to install the Teleport operator in standalone mode because installing the operator using the Teleport chart reports the following issue if we enable multiple replicas of Teleport for HA: #21826
>
> Now we are not able to install the operator either in standalone mode or as a sidecar. Please look into this as a priority.
>
> Thanks, Sukumar

It sounds like you are self-hosted, yes? If your Teleport cluster is v12+ you can use the in_cluster kubernetes join method instead of static jwks: https://goteleport.com/docs/reference/join-methods/#kubernetes-in-cluster

And in v15+ of the teleport cluster helm chart we have decoupled the operator from the HA deployments, so you wouldn't need to deploy it separately.

aprohorov-callsign commented 1 month ago

So many workarounds instead of one solution =) Just saying... Faced with the same unpleasant issue.