telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

Unable to connect from swapped container to AWS endpoints #1374

Closed ptemmer closed 1 month ago

ptemmer commented 4 years ago

As per @ark3's instructions on Slack, I'm opening a ticket to further investigate the following issue:

When swapping a deployment using --docker-run, the TLS handshake fails when I run curl -v https://route53.amazonaws.com (or other amazonaws.com subdomains). Several other HTTPS domains that I've tried work fine (e.g., google.com). The strange thing is that in the original deployment (using the same container image) it works as expected. Connectivity is not the issue, as I'm able to connect to the hostname on port 443 with netcat.
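The netcat check only proves that the TCP connection opens; the failure is in the step after it. A minimal Python sketch that separates the two stages and reports which one fails, assuming Python 3 is available in the swapped container (the function name `probe` is hypothetical, and the 10-second timeout is an arbitrary choice):

```python
import socket
import ssl


def probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return 'tcp' if the TCP connect fails, 'tls' if the handshake fails, else 'ok'."""
    # Stage 1: plain TCP connect, roughly what `nc host 443` checks.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp"
    # Stage 2: TLS handshake on top of the already-open connection.
    ctx = ssl.create_default_context()
    try:
        ctx.wrap_socket(sock, server_hostname=host).close()
        return "ok"
    except OSError:  # ssl.SSLError, resets and timeouts are all OSError subclasses
        return "tls"
    finally:
        sock.close()  # no-op if wrap_socket already took ownership of the socket
```

Against route53.amazonaws.com, the behaviour reported here would correspond to "tls" from the swapped container and "ok" from the original deployment.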

telepresence.log with further info: https://gist.github.com/ptemmer/87ccdf8dd71fe282f676977322cdc2e4

telepresence --version
0.104

Kubernetes cluster: Azure AKS 1.14

kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-10T21:53:58Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"126eb499523e5fffc0138e8e2e031787e5ab1943", GitTreeState:"clean", BuildDate:"2020-04-03T17:45:13Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

This issue has been reproduced on the machines of two other colleagues, both using Telepresence 0.105 running on Ubuntu.

ark3 commented 4 years ago

Notes from Slack:

Lukasz Pakula Jul 3rd at 4:35 AM

Hi, could someone from the Telepresence team look into my issue? I created a sandbox environment to replicate it.

Working as expected:

telepresence --run sh -c "for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do curl https://telepresence-test.eu.auth0.com/api/v2/; done"

Failing:

telepresence --docker-run --rm -it pstauffer/curl sh -c "for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do curl https://telepresence-test.eu.auth0.com/api/v2/; done"

It's failing randomly with curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection
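To turn "failing randomly" into a count, the same 15-attempt loop can tally curl's exit codes (35 is curl's TLS connect error). A minimal Python sketch, assuming curl is on the PATH wherever the script runs (the pstauffer/curl image may not ship Python, so the environment is an assumption); the function name and the 180-second per-attempt timeout, chosen to outlast the roughly 2-minute hang described below, are hypothetical:

```python
import subprocess
from collections import Counter
from typing import Sequence


def run_attempts(cmd: Sequence[str], attempts: int = 15, timeout: float = 180.0) -> Counter:
    """Run `cmd` repeatedly and tally exit codes; hung attempts count as 'timeout'."""
    codes: Counter = Counter()
    for _ in range(attempts):
        try:
            result = subprocess.run(
                list(cmd),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            codes[result.returncode] += 1
        except subprocess.TimeoutExpired:
            codes["timeout"] += 1
    return codes
```

Example: print(run_attempts(["curl", "-sS", "-o", "/dev/null", "https://telepresence-test.eu.auth0.com/api/v2/"])). A healthy run tallies only exit code 0.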

Lukasz Pakula 4 days ago

I don't experience this issue with all URLs; the GitHub API, for example, is OK.

GitHub allows TLS 1.3 and 1.2, and disallows older TLS.

This particular auth0 endpoint disallows TLS 1.3, but allows the older versions.
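One way to check which TLS versions an endpoint accepts is to attempt a handshake pinned to each version in turn. A minimal Python sketch, assuming the local OpenSSL build supports TLS 1.3; TLS 1.0 and 1.1 are left out because many current OpenSSL builds refuse them at the default security level. The names `pinned_context` and `accepted_versions` are hypothetical:

```python
import socket
import ssl

VERSIONS = {
    "TLSv1.2": ssl.TLSVersion.TLSv1_2,
    "TLSv1.3": ssl.TLSVersion.TLSv1_3,
}


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Client context that will negotiate exactly `version` and nothing else."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def accepted_versions(host: str, port: int = 443, timeout: float = 10.0) -> list:
    """Names of the TLS versions `host` completes a handshake with."""
    accepted = []
    for name, version in VERSIONS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with pinned_context(version).wrap_socket(sock, server_hostname=host):
                    accepted.append(name)
        except OSError:  # refused, reset, timed out, or handshake rejected
            pass
    return accepted
```

Given the observations above, the auth0 endpoint would be expected to list only "TLSv1.2" and the GitHub API both versions; running it from the local container and from the proxy pod gives a like-for-like comparison.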

The request that fails first hangs for 2 minutes. After that hang, something times out and sshuttle gets an EPIPE, which causes it to drop the connection. That yields the error 35 from curl.
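EPIPE is the error a process gets when it writes to a stream connection whose other end has already gone away. The sketch below is not sshuttle's code; it only reproduces that errno with a local socket pair, assuming Linux or macOS. CPython ignores SIGPIPE, so the error surfaces as BrokenPipeError instead of terminating the process. The function name is hypothetical:

```python
import errno
import socket


def write_after_peer_close() -> int:
    """Return the errno seen when writing to a stream whose peer has closed (0 if none)."""
    local, peer = socket.socketpair()
    peer.close()  # stands in for the far side dropping the connection
    try:
        local.sendall(b"payload")
    except BrokenPipeError as exc:
        return exc.errno  # errno.EPIPE
    finally:
        local.close()
    return 0
```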

The hang occurs between the proxy pod and the remote endpoint. I can't reproduce this issue myself, and Lukasz did not have a chance to try curl directly from the proxy pod (to rule out some weird interaction with sshuttle).

Lukasz worked around this by moving the cluster into a different zone (London instead of Ireland?). Other auth0 endpoints (e.g., ...us.auth0.com) also did not exhibit this problem.

Tyson Holub 1 month ago

I'm experiencing the same issue with https://developer.intuit.com, which also resolves to an amazonaws.com host. The issue only occurs when using Telepresence with minikube. It does not occur when using Telepresence with GKE, nor when using exec into the Docker image or the minikube pod.

Tyson is using a minikube cluster. The intuit endpoint does not have TLS 1.3 enabled either.

I'm not (yet) sure that the TLS thing is relevant. Clearly those endpoints work outside of Telepresence, and I'm not sure how Telepresence could break them.

ark3 commented 4 years ago

Pieter was able to try a failing curl directly from the proxy pod and it worked. So it seems the issue is between the local container and the proxy pod, somehow.

pietert 4 minutes ago

It works from the proxy container

pietert 3 minutes ago

/usr/src/app # curl -v https://route53.amazonaws.com
* Rebuilt URL to: https://route53.amazonaws.com/
*   Trying 54.239.31.67...
* TCP_NODELAY set
* Connected to route53.amazonaws.com (54.239.31.67) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-SHA
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: CN=route53.amazonaws.com
*  start date: Apr  9 00:00:00 2020 GMT
*  expire date: Mar 15 12:00:00 2021 GMT
*  subjectAltName: host "route53.amazonaws.com" matched cert's "route53.amazonaws.com"
*  issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*  SSL certificate verify ok.
> GET / HTTP/1.1

ark3 commented 4 years ago

Forcing curl to use TLS 1.2 (and thus avoid attempting TLS 1.3's new handshake with a host that doesn't support TLS 1.3) does not fix this. The connection attempt still hangs and then fails when tried from the local container.

*   Trying 54.239.31.187...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x563ec7852f50)
* Connected to route53.amazonaws.com (54.239.31.187) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to route53.amazonaws.com:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to route53.amazonaws.com:443

This trace is also from pietert on Slack (@ptemmer)
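Both failing traces stop right after the client's "Client hello": no "Server hello" is logged before SSL_ERROR_SYSCALL, which fits the server's reply never arriving. A small, hypothetical Python helper for spotting that pattern in saved curl -v output (the regex assumes curl's "* TLSvX.Y (IN|OUT), ..." line format shown in these traces):

```python
import re
from typing import Optional

# Matches lines like "* TLSv1.2 (OUT), TLS handshake, Client hello (1):"
HANDSHAKE = re.compile(r"^\* (TLSv1\.[0-3]) \((IN|OUT)\), (.+)$")


def last_handshake_step(trace: str) -> Optional[str]:
    """Return the last TLS record line curl logged, or None if there is none."""
    last = None
    for line in trace.splitlines():
        m = HANDSHAKE.match(line.strip())
        if m:
            last = f"{m.group(1)} ({m.group(2)}), {m.group(3)}"
    return last


def stalled_after_client_hello(trace: str) -> bool:
    """True if the handshake never got past the client's hello before a syscall error."""
    step = last_handshake_step(trace)
    return step is not None and "Client hello" in step and "SSL_ERROR_SYSCALL" in trace
```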

ark3 commented 4 years ago

Yikes. This is #1220 or a variant thereof. I've been out of the loop. More on this tomorrow.

wasd171 commented 3 years ago
❯ telepresence --version
0.109

Kubernetes cluster: Azure AKS 1.17.16

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.16", GitCommit:"d88fadbd65c5e8bde22630d251766a634c7613b0", GitTreeState:"clean", BuildDate:"2020-12-18T15:59:27Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

I'm getting the same issue, but with Postmark:

  1. [OK] - Connecting from the dev container to https://postmarkapp.com

    ~ # curl https://postmarkapp.com -v
    *   Trying 159.203.80.76:443...
    * Connected to postmarkapp.com (159.203.80.76) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
    CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
    * TLSv1.3 (IN), TLS handshake, Certificate (11):
    * TLSv1.3 (IN), TLS handshake, CERT verify (15):
    * TLSv1.3 (IN), TLS handshake, Finished (20):
    * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.3 (OUT), TLS handshake, Finished (20):
    * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
    * ALPN, server accepted to use http/1.1
    * Server certificate:
    *  subject: C=US; ST=Pennsylvania; L=Philadelphia; O=Wildbit LLC; CN=*.postmarkapp.com
    *  start date: Nov 18 00:00:00 2020 GMT
    *  expire date: Dec 19 23:59:59 2021 GMT
    *  subjectAltName: host "postmarkapp.com" matched cert's "postmarkapp.com"
    *  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
    *  SSL certificate verify ok.
    > GET / HTTP/1.1
    > Host: postmarkapp.com
    > User-Agent: curl/7.69.1
    > Accept: */*
    > 
    * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
    * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
    * old SSL session ID is stale, removing
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Date: Mon, 08 Feb 2021 14:48:22 GMT
    < Server: Apache
    < Content-Security-Policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' ws://localhost:3000 *.postmarkapp.com fonts.googleapis.com app.vwo.com cdnjs.cloudflare.com *.cloudfront.net wildbit.sinter-collect.com createsend.com wildbit.createsend.com js.createsend1.com api.craftcms.com *.createsend.com *.typekit.net fast.fonts.net fast.fonts.com *.helpscout.net *.googletagmanager.com *.googleadservices.com  *.google-analytics.com *.google.com *.visualwebsiteoptimizer.com *.simplecast.com *.twitter.com *.ads-twitter.com t.co *.facebook.net *.hs-analytics.net *.hs-banner.com *.fullstory.com feed-proxy.craftcms.com *.gstatic.com *.getsitecontrol.com *.helpscoutdocs.com *.github.io *.twimg.com *.vimeo.com *.youtube.com api.usemessages.com tag.rightmessage.com js.hs-scripts.com *.wistia.com *.wistia.net *.akamaihd.net src.litix.io wss://*.pusher.com data: blob: https://api.keen.io https://*.rightmessage.com; img-src * data: blob:; frame-ancestors 'self' http://app.vwo.com https://*.rightmessage.com https://*.postmarkapp.com;
    < Cache-Control: max-age=0
    < Expires: Mon, 08 Feb 2021 14:48:22 GMT
    < Vary: Accept-Encoding
    < X-Content-Type-Options: nosniff
    < X-Frame-Options: sameorigin
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8
    < 
    ...
  2. [ERROR] - Connecting from the dev container to https://api.postmarkapp.com

    ~ # curl https://api.postmarkapp.com -v
    *   Trying 3.137.63.180:443...
    * Connected to api.postmarkapp.com (3.137.63.180) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
    CApath: none
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.postmarkapp.com:443 
    * Closing connection 0
    curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.postmarkapp.com:443 
  3. [ERROR] - Connecting from the dev container to http://api.postmarkapp.com

    ~ # curl http://api.postmarkapp.com -v
    *   Trying 3.137.63.180:80...
    * Connected to api.postmarkapp.com (3.137.63.180) port 80 (#0)
    > GET / HTTP/1.1
    > Host: api.postmarkapp.com
    > User-Agent: curl/7.69.1
    > Accept: */*
    > 
    * Empty reply from server
    * Connection #0 to host api.postmarkapp.com left intact
    curl: (52) Empty reply from server

Everything works when the pod is deployed to the cluster, but somehow the packets misbehave when they are sent through the Telepresence proxy :(
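Case 3 suggests the problem is not TLS-specific: a plain HTTP request to the same host also gets no bytes back. A minimal sketch of the same check without curl, assuming Python 3 is available in the dev container; "empty reply" here means the server closed the connection before sending any data, which is what curl reports as error 52. The function name is hypothetical:

```python
import socket


def http_probe(host: str, port: int = 80, timeout: float = 10.0) -> str:
    """Send a bare GET; return the status line, 'empty reply', or 'reset'."""
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            first = sock.recv(4096)
        except ConnectionResetError:
            return "reset"
    if not first:  # peer closed without sending anything
        return "empty reply"
    return first.split(b"\r\n", 1)[0].decode(errors="replace")
```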

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 7 days with no activity.