gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

AWS Redis ElastiCache 7.0.7 Engine Version failing to connect: ERROR: signal: aborted (core dumped) #33564

Closed dmitry-mightydevops closed 9 months ago

dmitry-mightydevops commented 11 months ago

Expected behavior:

tsh db connect should work properly with ElastiCache.

Current behavior:

It fails with:

➜ tsh db connect project-prod-backend-redis-elasticache-us-east-1-1112223334445
redis-cli: redis-cli.c:585: cliAddArgument: Assertion `flags->element[j]->type == REDIS_REPLY_STATUS' failed.
ERROR: signal: aborted (core dumped)

Full set of operations:

➜ tsh login --proxy=teleport.projectapp.net:443                               

> Profile URL:        https://teleport.projectapp.net:443
  Logged in as:       dmitry-mightydevops
  Cluster:            teleport.projectapp.net
  Roles:              administrators
  Logins:             administrators
  Kubernetes:         enabled
  Kubernetes groups:  system:masters
  Valid until:        2023-10-17 20:55:56 +0700 +07 [valid for 3h27m0s]
  Extensions:         login-ip, permit-port-forwarding, permit-pty, private-key-policy

✗  tsh db ls Name=project-prod-backend-redis
Name                              Description                                         Allowed Users Labels                                                                                                                                             Connect 
--------------------------------- --------------------------------------------------- ------------- -------------------------------------------------------------------------------------------------------------------------------------------------- ------- 
project-prod-backend-redis        ElastiCache cluster in us-east-1 (primary endpoint) [*]           Name=project-prod-backend-redis,account-id=1112223334445,component=backend,created_at=06/12/2023,created_by=DmitrySemenov,endpoint-type=primary,...         
project-prod-backend-redis-reader ElastiCache cluster in us-east-1 (reader endpoint)  [*]           Name=project-prod-backend-redis,account-id=1112223334445,component=backend,created_at=06/12/2023,created_by=DmitrySemenov,endpoint-type=reader,e...         

took 4s 
➜ tsh db login --db-user projectuser-iam project-prod-backend-redis                          
Enter an OTP code from a device:
Connection information for database "project-prod-backend-redis-elasticache-us-east-1-1112223334445" has been saved.

You can now connect to it using the following command:

  tsh db connect project-prod-backend-redis-elasticache-us-east-1-1112223334445

You can view the connect command for the native database CLI client:

  tsh db config --format=cmd project-prod-backend-redis-elasticache-us-east-1-1112223334445

took 14s 
➜ tsh db connect project-prod-backend-redis-elasticache-us-east-1-1112223334445
redis-cli: redis-cli.c:585: cliAddArgument: Assertion `flags->element[j]->type == REDIS_REPLY_STATUS' failed.
ERROR: signal: aborted (core dumped)

Bug details:

➜ teleport version                                                            
Teleport v14.0.3 git:api/v14.0.3-0-g4d1c2ac go1.21.3

➜ k ice image           
PODNAME                         CONTAINER         PULL          IMAGE
teleport-0                      teleport          IfNotPresent  public.ecr.aws/gravitational/teleport-distroless:14.0.2
teleport-auth-6c8d4d7d49-t2rm9  teleport          IfNotPresent  public.ecr.aws/gravitational/teleport-distroless:14.0.2
teleport-auth-6c8d4d7d49-t2rm9  operator          IfNotPresent  public.ecr.aws/gravitational/teleport-operator:14.0.2
teleport-proxy-6497d88f87-n4nt8 wait-auth-update  IfNotPresent  public.ecr.aws/gravitational/teleport-distroless:14.0.2
teleport-proxy-6497d88f87-n4nt8 teleport          IfNotPresent  public.ecr.aws/gravitational/teleport-distroless:14.0.2

IAM user is created via terraform:

resource "random_password" "auth_token" {
  length           = 20
  special          = false
  override_special = "!-_=+"

  keepers = {
    elasticache_instance = local.labels.elasticache_cluster_name
  }
}

resource "aws_elasticache_user" "user" {
  user_id       = local.labels.elasticache_user
  user_name     = local.labels.elasticache_user
  access_string = "on ~* +@all"
  engine        = upper(var.engine)

  authentication_mode {
    type      = "password"
    passwords = [random_password.auth_token.result]
  }
}

resource "aws_elasticache_user" "iam_user" {
  user_id       = local.labels.elasticache_iam_user
  user_name     = local.labels.elasticache_iam_user
  access_string = "on ~* +@all"
  engine        = upper(var.engine)

  authentication_mode {
    type = "iam"
  }
}

resource "aws_elasticache_user_group" "users" {
  user_group_id = local.labels.elasticache_user_group
  user_ids = [
    aws_elasticache_user.user.user_id,
    aws_elasticache_user.iam_user.user_id,
    "default"
  ]
  engine = upper(var.engine)
}

dmitry-mightydevops commented 11 months ago

Same issue with redis-cli 7.2.1

➜ redis-cli --version
redis-cli 7.2.1 (git:a38b05a8-dirty)

➜ tsh db connect project-prod-backend-redis
MFA is required to access database "project-prod-backend-redis"
Enter an OTP code from a device:
redis-cli: redis-cli.c:568: cliAddCommandDocArg: Assertion `flags->element[j]->type == REDIS_REPLY_STATUS' failed.
ERROR: signal: aborted (core dumped)

greedy52 commented 10 months ago

@dmitry-mightydevops I cannot repro this in my setup (ElastiCache 7.0.7, redis-cli 7.0.11/7.2.2).

Could you verify the version of the Database Service/agent? What's the output of tctl get db_server/project-prod-backend-redis?

For reference: https://github.com/gravitational/teleport/issues/19240

greedy52 commented 10 months ago

@dmitry-mightydevops any update? Thanks!

dmitry-mightydevops commented 10 months ago

@greedy52

here you go:

 tctl get db_server/project-prod-backend-redis
kind: db_server
metadata:
  expires: "2023-11-17T19:47:55Z"
  id: 1699465955700391216
  name: project-prod-backend-redis-elasticache-us-east-1-111111
spec:
  database:
    kind: db
    metadata:
      description: ElastiCache cluster in us-east-1 (primary endpoint)
      labels:
        Name: project-prod-backend-redis
        account-id: "111111"
        component: backend
        created_at: 06/12/2023
        created_by: DmitrySemenov
        endpoint-type: primary
        engine-version: 7.0.7
        environment: prod        
        project: project
        region: us-east-1
        teleport.dev/cloud: AWS
        teleport.dev/origin: cloud
        teleport.internal/discovered-name: project-prod-backend-redis
        terraform: "true"
      name: project-prod-backend-redis-elasticache-us-east-1-111111
    spec:
      ad:
        domain: ""
        spn: ""
      aws:
        account_id: "111111"
        elasticache:
          endpoint_type: primary
          replication_group_id: project-prod-backend-redis
          transit_encryption_enabled: true
          user_group_ids:
          - projectuser-group
        iam_policy_status: IAM_POLICY_STATUS_UNSPECIFIED
        memorydb: {}
        opensearch: {}
        rds:
          iam_auth: false
        rdsproxy: {}
        redshift: {}
        redshift_serverless: {}
        region: us-east-1
        secret_store: {}
      azure:
        redis: {}
      gcp: {}
      mongo_atlas: {}
      mysql: {}
      oracle:
        audit_user: ""
      protocol: redis
      tls:
        mode: 0
      uri: master.project-prod-backend-redis.bbh62t.use1.cache.amazonaws.com:6379
    status:
      aws:
        account_id: "111111"
        elasticache:
          endpoint_type: primary
          replication_group_id: project-prod-backend-redis
          transit_encryption_enabled: true
          user_group_ids:
          - projectuser-group
        iam_policy_status: IAM_POLICY_STATUS_FAILED
        memorydb: {}
        opensearch: {}
        rds:
          iam_auth: false
        rdsproxy: {}
        redshift: {}
        redshift_serverless: {}
        region: us-east-1
        secret_store: {}
      azure:
        redis: {}
      ca_cert: |
        -----BEGIN CERTIFICATE-----
        MIIDQTCCAimgAwIBAgITBmyfz5m/jAo54vB4ikPmljZbyjANBgkqhkiG9w0BAQsF
        ....
        -----END CERTIFICATE-----
      mysql: {}
    version: v3
  host_id: f03526de-dc86-49b2-99f3-f1c798261484
  hostname: teleport-0
  rotation:
    current_id: ""
    last_rotated: "0001-01-01T00:00:00Z"
    schedule:
      standby: "0001-01-01T00:00:00Z"
      update_clients: "0001-01-01T00:00:00Z"
      update_servers: "0001-01-01T00:00:00Z"
    started: "0001-01-01T00:00:00Z"
  version: 14.1.0
version: v3

and the error:

➜ tsh db connect project-prod-backend-redis    
MFA is required to access database "project-prod-backend-redis-elasticache-us-east-1-111111"
Enter an OTP code from a device:
redis-cli: redis-cli.c:585: cliAddArgument: Assertion `flags->element[j]->type == REDIS_REPLY_STATUS' failed.
ERROR: signal: aborted (core dumped)

greedy52 commented 10 months ago

This is a regression. The previous fix was reverted during https://github.com/gravitational/teleport/pull/30294.

Will make a new fix next week.

In the meantime, add -command to your access string so that the server does not send a COMMAND DOCS response to redis-cli, e.g.:

on ~* +@all -command
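
Applied to the Terraform from the original report, the workaround might look like the sketch below. It reuses the aws_elasticache_user.iam_user resource and the local/var names exactly as posted in the issue; only the access_string value changes. This is an illustration of the suggestion above, not a tested configuration, and the same change would presumably be needed for any other ElastiCache user that connects through tsh.

```hcl
# Workaround sketch: the IAM user from the issue report, with "-command"
# appended to the access string so the COMMAND command is denied by the ACL.
resource "aws_elasticache_user" "iam_user" {
  user_id       = local.labels.elasticache_iam_user
  user_name     = local.labels.elasticache_iam_user
  access_string = "on ~* +@all -command" # was: "on ~* +@all"
  engine        = upper(var.engine)

  authentication_mode {
    type = "iam"
  }
}
```

The workaround is presumably only needed while the Database Service runs a release without the fix.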

dmitry-mightydevops commented 10 months ago

@greedy52 thank you! Please let me know when it is ready and I will test. What information in the tctl output made it clear that this was an error on the Teleport side?

dmitry-mightydevops commented 9 months ago

@greedy52 is the fix released? If so, in which Teleport version?

greedy52 commented 9 months ago

@dmitry-mightydevops The issue was automatically closed when the fix got merged to master. It's not released yet. The backport to the v14 release is https://github.com/gravitational/teleport/pull/35162. I will update here once it is released.

dmitry-mightydevops commented 9 months ago

Thank you!

greedy52 commented 9 months ago

The fix is now released at https://github.com/gravitational/teleport/releases/tag/v14.2.1. Note that the Teleport server side (Database Service) has to be updated. Client-side (tsh) update is not required.