gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Unclear instructions on how to set up with EKS and AWS auto discover #46634

Open braun1928 opened 1 month ago

braun1928 commented 1 month ago

Setting up Teleport on an EKS 1.30 cluster with Helm was fairly easy. I created DynamoDB tables for events and config, an S3 bucket, and an IAM role, and added the role to an Access Entry (Pod Identity rather than IRSA, since the SDK is up to date). The pods are running, apparently correctly. I used the UI to generate the script that creates the OIDC provider and the role for the AWS auto-discovery features, and attached a few extra policies to be safe. But no matter what I do, choosing the RDS region in the UI only produces errors:

rpc error: code = Unknown desc = operation error EC2: DescribeVpcs, get identity: get credentials: failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: xxxxxx-2222-4444-5555-eddeffe00e000, InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
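For anyone hitting the same message: as far as I understand, STS returns InvalidIdentityToken with "Couldn't retrieve verification key" when it cannot fetch the signing keys published by the OIDC issuer. Below is a minimal sketch, not an official Teleport tool, that follows the standard OIDC discovery document to its `jwks_uri` and counts the published keys. It assumes the issuer is the cluster's public address (an assumption on my part); the function names are hypothetical.

```python
import json
import urllib.request


def discovery_url(issuer):
    """Build the standard OIDC discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def http_get(url, timeout=10):
    """Fetch a URL and return the body as text (raises on network or HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def count_signing_keys(issuer, fetch=http_get):
    """Follow the discovery document to its jwks_uri and count the published keys.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    config = json.loads(fetch(discovery_url(issuer)))
    jwks = json.loads(fetch(config["jwks_uri"]))
    return len(jwks.get("keys", []))
```

Calling `count_signing_keys("https://teleport.something.redacted")` from a machine outside the VPC approximates what STS has to do. If that call times out or fails, for example because the load balancer is internal as in the values.yaml in this issue, that would be consistent with the error above, though it does not prove it is the cause.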

I can't tell where this comes from or which component is making the API call. Even with the chart's log level set to DEBUG, I can't figure out what is trying to assume the role and failing. After a few attempts I also deployed the teleport-kube-agent chart, although the docs don't make clear whether it is required for this process. That didn't help at all; the logs only show: The Instance connector is still not available, process-wide services such as session uploading will not function pid:7.1 service/service.go:3066
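The error text does carry some of the call chain. The AWS SDK for Go v2 formats wrapped errors as "operation error <Service>: <Operation>", so the nested calls can be pulled out of the message. This is a small sketch with hypothetical helper names, using the error quoted above (truncated after the status code):

```python
import re

# Matches the "operation error <Service>: <Operation>" wrapper used by the AWS SDK for Go v2.
OPERATION = re.compile(r"operation error ([^:,]+): (\w+)")
STATUS = re.compile(r"StatusCode: (\d+)")

# Error text copied from this issue, cut off after the HTTP status code.
ERROR = (
    "rpc error: code = Unknown desc = operation error EC2: DescribeVpcs, "
    "get identity: get credentials: failed to retrieve credentials, "
    "operation error STS: AssumeRoleWithWebIdentity, "
    "exceeded maximum number of attempts, 3, "
    "https response error StatusCode: 400"
)


def call_chain(message):
    """Return (service, operation) pairs in order of appearance, outermost first."""
    return OPERATION.findall(message)


def status_code(message):
    """Return the HTTP status code in the message, or None if there is none."""
    match = STATUS.search(message)
    return int(match.group(1)) if match else None


print(call_chain(ERROR))   # [('EC2', 'DescribeVpcs'), ('STS', 'AssumeRoleWithWebIdentity')]
print(status_code(ERROR))  # 400
```

Read this way, the outer call is EC2 DescribeVpcs, and it fails while fetching credentials through STS AssumeRoleWithWebIdentity. That points at the OIDC role assumption rather than at the EC2 or RDS permissions, but it still doesn't say which Teleport process issued the call.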

Maybe my whole approach is wrong, but I don't see why I couldn't run Teleport in EKS and let it discover RDS resources.

braun1928 commented 1 month ago

Info I forgot to add yesterday: I'm running Teleport 16.2.1 from the corresponding Helm chart. The EKS cluster has Istio, but it isn't injecting anything into that namespace. I'm using an Ingress with an ACM certificate, since there's no cert-manager in this cluster. The load balancer it creates uses the correct certificate.

values.yaml contents, based on an example from the docs:

clusterName: teleport.something.redacted
proxyListenerMode: multiplex
chartMode: aws
aws:
  region: us-east-1
  sessionRecordingBucket: redacted-teleport-sessions
  backendTable: prefix-teleport-state
  auditLogTable: prefix-teleport-events
  auditLogMirrorOnStdout: false
  dynamoAutoScaling: false
highAvailability:
  replicaCount: 2

log:
  level: DEBUG

proxy:
  service:
    type: ClusterIP

ingress:
  enabled: true
  spec:
    ingressClassName: alb

annotations:
  ingress:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=350
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
    alb.ingress.kubernetes.io/success-codes: 200,301,302
# If you are running Kubernetes 1.23 or above, disable PodSecurityPolicies
podSecurityPolicy:
  enabled: false

I also tested the teleport-kube-agent Helm chart with a static database configuration to see if that would work at all, but the same Instance connector message appeared. Its values.yaml is as follows:

roles: db
proxyAddr: teleport.something.redacted
authToken: <token created in cli>
databases:
  - name: mongodb
    protocol: mongodb
    uri: mongodb+srv://my-atlas-uri

The DEBUG logs were of almost no help. I'm probably going to deploy on regular EC2 instances to see if everything works there, but IMO this should work fine in EKS, which is easier and faster than bare instances.