gravitational / teleport


Can't connect to an ECS container #23041

Closed. GlauberrBatista closed this issue 1 year ago.

GlauberrBatista commented 1 year ago

Expected behavior: Run tsh aws ecs execute-command --cluster <cluster_name> --task $(tsh aws ecs list-tasks --cluster <cluster_name> --service-name <service_name> --output text --query "taskArns[0]") --interactive --command "/bin/bash" and open an SSM session inside a container on ECS.

Current behavior:

$ tsh aws ecs execute-command --cluster <cluster_name> --task $(tsh aws ecs list-tasks --cluster <cluster_name> --service-name <service_name> --output text --query "taskArns[0]") --interactive --command "/bin/bash"

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.

Starting session with SessionId: ecs-execute-command-<command_id>
2023/03/14 09:27:04 http: TLS handshake error from 127.0.0.1:57218: remote error: tls: bad certificate
2023/03/14 09:27:04 http: TLS handshake error from 127.0.0.1:57220: remote error: tls: bad certificate
2023/03/14 09:27:04 http: TLS handshake error from 127.0.0.1:57222: remote error: tls: bad certificate
2023/03/14 09:27:04 http: TLS handshake error from 127.0.0.1:57224: remote error: tls: bad certificate
2023/03/14 09:27:05 http: TLS handshake error from 127.0.0.1:57226: remote error: tls: bad certificate
2023/03/14 09:27:06 http: TLS handshake error from 127.0.0.1:57228: remote error: tls: bad certificate
2023/03/14 09:27:08 http: TLS handshake error from 127.0.0.1:57230: remote error: tls: bad certificate

Bug details:

Server logs

2023-03-14T13:26:49Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:215368069946933065539208652163541536313) auth/middleware.go:674
2023-03-14T13:26:49Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:94962230939463013271192222214477524278) auth/middleware.go:674
2023-03-14T13:26:49Z DEBU [PROXY:SER] Dialing from: "@web-proxy" to: "@local-node". trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:252
2023-03-14T13:26:49Z DEBU [PROXY:SER] Tunnel dialing to 37664fe4-fc09-480d-9772-4cab2bca0313.<teleport_domain>. trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:373
2023-03-14T13:26:49Z DEBU [PROXY:SER] Connecting to <teleport_ip>:36144 through tunnel. trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:707
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Transport request: teleport-transport. leaseID:1 target:<teleport_domain>:3024 cluster:<teleport_domain> reversetunnel/agent.go:570
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Received out-of-band proxy transport request for @local-node [37664fe4-fc09-480d-9772-4cab2bca0313.<teleport_domain>]. cluster:<teleport_domain> reversetunnel/transport.go:199
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Handing off connection to a local "app" service. cluster:<teleport_domain> reversetunnel/transport.go:274
2023-03-14T13:26:49Z DEBU [PROXY:SER] Succeeded dialing from: "@web-proxy" to: "@local-node". trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:258
2023-03-14T13:26:49Z DEBU [PROXY:SER] Dialing from: "@web-proxy" to: "@local-node". trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:252
2023-03-14T13:26:49Z DEBU [PROXY:SER] Tunnel dialing to 37664fe4-fc09-480d-9772-4cab2bca0313.<teleport_domain>. trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:373
2023-03-14T13:26:49Z DEBU [PROXY:SER] Connecting to <teleport_ip>:36144 through tunnel. trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:707
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Transport request: teleport-transport. leaseID:1 target:<teleport_domain>:3024 cluster:<teleport_domain> reversetunnel/agent.go:570
2023-03-14T13:26:49Z WARN [APP:SERVI] Failed to handle client connection. error:[
ERROR REPORT:
Original Error: *errors.errorString EOF
Stack Trace:
        github.com/gravitational/teleport/lib/srv/app/server.go:893 github.com/gravitational/teleport/lib/srv/app.(*Server).getConnectionInfo
        github.com/gravitational/teleport/lib/srv/app/server.go:690 github.com/gravitational/teleport/lib/srv/app.(*Server).handleConnection
        github.com/gravitational/teleport/lib/srv/app/server.go:657 github.com/gravitational/teleport/lib/srv/app.(*Server).HandleConnection
        github.com/gravitational/teleport/lib/reversetunnel/transport.go:275 github.com/gravitational/teleport/lib/reversetunnel.(*transport).start
        github.com/gravitational/teleport/lib/reversetunnel/agent.go:581 github.com/gravitational/teleport/lib/reversetunnel.(*agent).handleDrainChannels.func2
        runtime/asm_amd64.s:1598 runtime.goexit
User Message: TLS handshake failed
        EOF] app/server.go:664
2023-03-14T13:26:49Z WARN [APP:SERVI] Failed to close client connection. error:[
ERROR REPORT:
Original Error: trace.aggregate EOF
Stack Trace:
        github.com/gravitational/teleport/api@v0.0.0/utils/sshutils/chconn.go:113 github.com/gravitational/teleport/api/utils/sshutils.(*ChConn).Close
        github.com/gravitational/teleport/lib/srv/app/server.go:665 github.com/gravitational/teleport/lib/srv/app.(*Server).HandleConnection
        github.com/gravitational/teleport/lib/reversetunnel/transport.go:275 github.com/gravitational/teleport/lib/reversetunnel.(*transport).start
        github.com/gravitational/teleport/lib/reversetunnel/agent.go:581 github.com/gravitational/teleport/lib/reversetunnel.(*agent).handleDrainChannels.func2
        runtime/asm_amd64.s:1598 runtime.goexit
User Message: EOF] app/server.go:666
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Received out-of-band proxy transport request for @local-node [37664fe4-fc09-480d-9772-4cab2bca0313.<teleport_domain>]. cluster:<teleport_domain> reversetunnel/transport.go:199
2023-03-14T13:26:49Z DEBU [PROXY:AGE] Handing off connection to a local "app" service. cluster:<teleport_domain> reversetunnel/transport.go:274
2023-03-14T13:26:49Z DEBU [PROXY:SER] Succeeded dialing from: "@web-proxy" to: "@local-node". trace.fields:map[cluster:<teleport_domain>] reversetunnel/localsite.go:258
2023-03-14T13:26:49Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:215368069946933065539208652163541536313) auth/middleware.go:674
2023-03-14T13:26:49Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:94962230939463013271192222214477524278) auth/middleware.go:674
2023-03-14T13:26:49Z DEBU [APP:SERVI] Created app session chunk 0b648167-e95c-49cb-acfe-4e4da545ab65 app/session.go:107
2023-03-14T13:26:49Z DEBU [APP:SERVI] Creating tracker for session chunk 0b648167-e95c-49cb-acfe-4e4da545ab65 app/session.go:344
2023-03-14T13:26:49Z DEBU [APP:SERVI] Using async streamer for session chunk 0b648167-e95c-49cb-acfe-4e4da545ab65. app/session.go:314
2023-03-14T13:26:49Z INFO [AUDIT]     app.session.chunk app_name:aws app_public_addr:aws.<teleport_domain> app_uri:https://console.aws.amazon.com/ec2/v2/home aws_role_arn:arn:aws:iam::<aws_account>:role/OwnerRole cluster_name:<teleport_domain> code:T2008I ei:0 event:app.session.chunk namespace:default server_id:37664fe4-fc09-480d-9772-4cab2bca0313 session_chunk_id:0b648167-e95c-49cb-acfe-4e4da545ab65 sid:b3678e62-2796-4b7b-9132-de443a24e590 time:2023-03-14T13:26:49.735Z uid:f0972ceb-b781-42be-8d82-77fee8c9746e user:GlauberrBatista events/emitter.go:263
2023-03-14T13:26:49Z INFO [AUDIT]     app.session.chunk app_name:aws app_public_addr:aws.<teleport_domain> app_uri:https://console.aws.amazon.com/ec2/v2/home aws_role_arn:arn:aws:iam::<aws_account>:role/OwnerRole cluster_name:<teleport_domain> code:T2008I ei:0 event:app.session.chunk namespace:default server_id:37664fe4-fc09-480d-9772-4cab2bca0313 session_chunk_id:0b648167-e95c-49cb-acfe-4e4da545ab65 sid:b3678e62-2796-4b7b-9132-de443a24e590 time:2023-03-14T13:26:49.735Z uid:f0972ceb-b781-42be-8d82-77fee8c9746e user:GlauberrBatista events/emitter.go:263
2023-03-14T13:26:49Z INFO [APP:WEB]   Round trip: POST /, code: 200, duration: 105.71963ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:<teleport_domain> forward/fwd.go:182
2023-03-14T13:26:50Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:215368069946933065539208652163541536313) auth/middleware.go:674
2023-03-14T13:26:50Z DEBU [AUTH]      ClientCertPool -> cert(<teleport_domain> issued by <teleport_domain>:94962230939463013271192222214477524278) auth/middleware.go:674
2023-03-14T13:26:51Z INFO [APP:WEB]   Round trip: POST /, code: 200, duration: 401.835727ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:<teleport_domain> forward/fwd.go:182
2023-03-14T13:26:51Z INFO [APP:WEB]   Round trip: POST /, code: 200, duration: 37.823901ms tls:version: 304, tls:resume:false, tls:csuite:1301, tls:server:<teleport_domain> forward/fwd.go:182
GlauberrBatista commented 1 year ago

I upgraded to version 12 and later to version 13, but the issue is still happening.

Extra logs from tsh client:

2023-06-19T10:13:02-03:00 DEBU             Started forwarding request for "ecs.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023-06-19T10:13:02-03:00 INFO [CA]        Generating TLS certificate SERIALNUMBER=27943757550190476698131196842897322965,CN=ecs.us-east-1.amazonaws.com,O=Teleport dns_names:[ecs.us-east-1.amazonaws.com] key_usage:5 not_after:2023-06-19 23:19:01 +0000 UTC tlsca/ca.go:1111
2023-06-19T10:13:03-03:00 DEBU             Stopped forwarding request for "ecs.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:363
2023-06-19T10:13:03-03:00 DEBU             Started forwarding request for "ecs.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023-06-19T10:13:04-03:00 DEBU             Started forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023-06-19T10:13:04-03:00 INFO [CA]        Generating TLS certificate SERIALNUMBER=27943757550190476698131196842897322965,CN=ssmmessages.us-east-1.amazonaws.com,O=Teleport dns_names:[ssmmessages.us-east-1.amazonaws.com] key_usage:5 not_after:2023-06-19 23:19:01 +0000 UTC tlsca/ca.go:1111
2023/06/19 10:13:04 http: TLS handshake error from 127.0.0.1:57291: remote error: tls: bad certificate
2023-06-19T10:13:04-03:00 DEBU             Stopped forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:363
2023-06-19T10:13:04-03:00 DEBU             Started forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023-06-19T10:13:04-03:00 DEBU             Stopped forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:363
2023/06/19 10:13:04 http: TLS handshake error from 127.0.0.1:57293: remote error: tls: bad certificate
2023-06-19T10:13:05-03:00 DEBU             Started forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023/06/19 10:13:05 http: TLS handshake error from 127.0.0.1:57296: remote error: tls: bad certificate
2023-06-19T10:13:05-03:00 DEBU             Stopped forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:363
2023-06-19T10:13:05-03:00 DEBU             Started forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:357
2023/06/19 10:13:05 http: TLS handshake error from 127.0.0.1:57298: remote error: tls: bad certificate
2023-06-19T10:13:05-03:00 DEBU             Stopped forwarding request for "ssmmessages.us-east-1.amazonaws.com:443". alpnproxy/forward_proxy.go:363
...

edit: add extra logs

TeleLos commented 1 year ago

Try adding --endpoint-url to your tsh command: tsh aws ecs execute-command --endpoint-url. These commands require endpoint URL mode.

GlauberrBatista commented 1 year ago

@TeleLos thank you for those new instructions. However, it's still not working as intended.

What I did:

$ tsh proxy aws --endpoint-url
Started AWS proxy which serves as an AWS endpoint URL at https://127.0.0.1:57090.
To avoid port randomization, you can choose the listening port using the --port flag.

In addition to the endpoint URL, use the following credentials to connect to the proxy:
  export AWS_ACCESS_KEY_ID=<generated_key>
  export AWS_SECRET_ACCESS_KEY=<generated_secret>
  export AWS_CA_BUNDLE=<path_to_pem>/aws-localca.pem

Then, in another window:

$ tsh aws ecs execute-command --cluster <cluster_name> --endpoint-url https://127.0.0.1:57090 --task <task_id> --interactive --command "/bin/bash"

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.

An error occurred (403) when calling the ExecuteCommand operation:
ERROR: exit status 254

I was able to make it work the following way (using the same proxy I created earlier):

$ AWS_ACCESS_KEY_ID=<generated_key> AWS_SECRET_ACCESS_KEY=<generated_secret> AWS_CA_BUNDLE=<path_to_pem>/aws-localca.pem aws ecs execute-command --cluster <cluster_name> --endpoint-url https://127.0.0.1:57090 --task <task_id> --interactive --command "/bin/bash"

That doesn't seem to be the way it's supposed to work, since I'm calling the AWS CLI directly. Am I missing something?

edit: I'm using Teleport 13 now.

Teleport v13.0.3 git:v13.0.3-0-ge5db71f go1.20.4
Proxy version: 13.4.1
greedy52 commented 1 year ago

Hi @GlauberrBatista.

Sorry for the confusion. Could you try these out?

Alternative 1:

$ export NO_PROXY=ssmmessages.us-east-1.amazonaws.com
$ tsh aws ecs execute-command --cluster <cluster_name> --task $(tsh aws ecs list-tasks --cluster <cluster_name> --service-name <service_name> --output text --query "taskArns[0]") --interactive --command "/bin/bash"

Alternative 2:

$ tsh aws --endpoint-url -- ecs execute-command --cluster <cluster_name> --task $(tsh aws ecs list-tasks --cluster <cluster_name> --service-name <service_name> --output text --query "taskArns[0]") --interactive --command "/bin/bash"
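If Alternative 2 works, one way to avoid retyping the extra flags every time is a small wrapper function (hypothetical, not part of tsh; it assumes tsh is on PATH and you are logged in to the aws app):

```shell
# Hypothetical convenience wrapper: forwards any AWS CLI invocation
# through tsh's endpoint-URL proxy mode (Alternative 2 above).
tsh_aws() {
  tsh aws --endpoint-url -- "$@"
}

# Usage:
#   tsh_aws ecs execute-command --cluster <cluster_name> --task <task_id> \
#     --interactive --command "/bin/bash"
```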
GlauberrBatista commented 1 year ago

Hi @greedy52,

Thank you for the response. It works both ways. Thank you very much! Is that documented elsewhere? I didn't check the documentation recently, but I don't recall seeing that anywhere.

I'm closing this issue now. 🙌

greedy52 commented 1 year ago

@GlauberrBatista

Thank you for the response. It works both ways. Thank you very much!

Awesome. For the first one (NO_PROXY), you can probably just leave it in your bashrc and forget about it, but you would need more entries if the pod is running in other regions.
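For example, a bashrc snippet covering several regions might look like the sketch below. It assumes the only endpoint that needs to bypass the proxy is ssmmessages.<region>.amazonaws.com, as in the logs above; the region list is illustrative.

```shell
# Build a NO_PROXY value covering the ssmmessages endpoint in each region
# you use. Adjust the region list to match your deployments.
regions="us-east-1 us-west-2 eu-west-1"
no_proxy_entries=""
for r in $regions; do
  # Append a comma separator only when the list is already non-empty.
  no_proxy_entries="${no_proxy_entries:+$no_proxy_entries,}ssmmessages.$r.amazonaws.com"
done
export NO_PROXY="$no_proxy_entries"
echo "$NO_PROXY"  # prints the comma-separated endpoint list
```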

I will make a fix to automatically use --endpoint-url when tsh detects it's running ecs execute-command, so you won't need any of these in the future. We did something similar for tsh aws ssm start-session but missed this one.

I will update the documentation as well.

Thanks so much for reporting and trying things out. I will update this ticket when the fix is ready.