gravitational / teleport

The easiest and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Teleport Connect App failing to connect when using TLS routing and TLS L7 Load Balancer #32455

Closed. anthonysomerset closed this issue 1 year ago.

anthonysomerset commented 1 year ago

Expected behavior: The Teleport Connect GUI app should be able to connect to a Teleport cluster that uses TLS routing and sits behind an L7 load balancer that terminates TLS.

Current behavior: tsh and the web browser work fine, but the Teleport Connect GUI shows an "UNKNOWN: x509: certificate signed by unknown authority" error.

Bug details:

It should be noted that:

- The Web UI works.
- The CLI works fine.
- Reverse tunnel SSH works fine, including remote login via the CLI or the web interface.
- DB connections were working, but I need to recheck this, as I did use the GUI app before. UPDATE: confirmed still working via the CLI `tsh proxy db` command.
- Only the Teleport Connect GUI app fails.

If I review the audit log on the auth server, I can see that the GUI client logs in fine and that a cert.create event succeeds, so it looks like the failure happens in the post-login processes.

The traffic flow is basically:

Client > Azure App Gateway (HTTPS/TLS/443) > Internal L4 Load Balancer (TLS/443) > AKS Proxy Pod (TLS/3080)

The ALPN test mentioned in the FAQ at https://goteleport.com/docs/architecture/tls-routing/ works correctly:

$ curl -v https://teleport.domain.com/webapi/connectionupgrade -H "Connection: Upgrade" -H "Upgrade: alpn-ping" --no-alpn

*   Trying 40.x.x.90:443...
* Connected to teleport.domain.com (40.x.x.90) port 443 (#0)
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* Server certificate:
*  subject: CN=*.teleport.domain.com
*  start date: Aug 22 05:24:53 2023 GMT
*  expire date: Nov 20 05:24:52 2023 GMT
*  subjectAltName: host "teleport.domain.com" matched cert's "teleport.domain.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
> GET /webapi/connectionupgrade HTTP/1.1
> Host: teleport.domain.com
> User-Agent: curl/8.1.2
> Accept: */*
> Connection: Upgrade
> Upgrade: alpn-ping
> 
< HTTP/1.1 101 Switching Protocols
< Date: Sun, 24 Sep 2023 09:49:43 GMT
< Connection: upgrade
< Upgrade: alpn-ping
< X-Teleport-Upgrade: alpn-ping
< 
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
* Failure writing output to destination
* Closing connection 0

Deployed using the following Helm values:

chartMode: azure
clusterName: teleport.domain.com

authentication:
  secondFactor: optional
tls:
  existingSecretName: teleport-tls
azure:
  databaseHost: "xxxxxxxx.postgres.database.azure.com"
  databaseUser: "teleport-mi"
  backendDatabase: "prd_teleport"
  auditLogDatabase: "prd_teleport"
  auditLogMirrorOnStdout: true
  sessionRecordingStorageAccount: "xxxxxxxxx.blob.core.windows.net"
  clientID: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  databasePoolMaxConnections: 20

proxyListenerMode: "multiplex"

proxy:
  teleportConfig:
    version: v3
    # put your teleport.yaml proxy configuration here
    teleport:
      # The join_params section must be provided for the proxies to join the auth servers
      # By default, the chart creates a Kubernetes join token which you can use.
      join_params:
        method: kubernetes
        # The token name pattern is "<RELEASE-NAME>-proxy"
        # Change this if you change the Helm release name.
        token_name: "teleport-proxy"
      # The auth server domain pattern is "<RELEASE-NAME>-auth.<RELEASE-NAMESPACE>.svc.cluster.local:3025"
      # If you change the Helm release name or namespace you must adapt the `auth_server` value.
      auth_server: "teleport-auth.teleport.svc.cluster.local:3025"
      log:
        output: stderr
        severity: INFO
    proxy_service:
      enabled: true
#      proxy_protocol: false # NOT WORKING?
      public_addr: teleport.domain.com:443
      web_listen_addr: 0.0.0.0:3080
      https_keypairs:
        - key_file: /tls/wildcard-teleport-domain-com.key
          cert_file: /tls/wildcard-teleport-domain-com.crt
      https_keypairs_reload_interval: 1h
      trust_x_forwarded_for: true

# If you are running Kubernetes 1.23 or above, disable PodSecurityPolicies
podSecurityPolicy:
  enabled: false

highAvailability:
  replicaCount: 2
annotations:
  service:
    "external-dns.alpha.kubernetes.io/hostname": "teleport.domain.com,*.teleport.domain.com"
    "service.beta.kubernetes.io/azure-load-balancer-internal": "true"
    "external-dns.alpha.kubernetes.io/ttl": "60"
extraVolumes:
- name: tls-secrets-store-inline
  csi:
    driver: secrets-store.csi.k8s.io
    readOnly: true
    volumeAttributes:
      secretProviderClass: "azure-tls-teleport"
extraVolumeMounts:
- name: tls-secrets-store-inline
  mountPath: "/tls"
  readOnly: true

anthonysomerset commented 1 year ago

Heads up: I also tried the insecure mode referenced at the bottom of https://goteleport.com/docs/connect-your-client/teleport-connect/; it did not change the behaviour.

My observation is that:

1) Login and authentication to the cluster work just fine.

ravicious commented 1 year ago

I asked about this internally, and one of our engineers noticed that the stack trace indicates you might be running an older version of Connect.

Could you verify that you're indeed running v14.0.0? https://goteleport.com/docs/connect-your-client/teleport-connect/#submitting-an-issue

anthonysomerset commented 1 year ago

OK, now I feel stupid... somehow I still had 12.0.2 installed and had not updated it. This works perfectly with v14 of the Connect client.

anthonysomerset commented 1 year ago

Closing, as there is no actual bug :)

ravicious commented 1 year ago

No worries! I often find myself running the wrong version because Spotlight opens some dev build from a few months ago instead of the version from /Applications.

Unlike tsh, Connect doesn't have any warnings about incompatible versions. We should add some.
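The kind of warning suggested above can be illustrated with a small shell sketch that would have flagged the 12.0.2 client against the v14 cluster in this thread. The helpers `major_version` and `check_compat` are hypothetical, and the rule "warn when the client is more than one major version behind the cluster" is an assumption chosen for illustration, not Teleport's documented support policy.

```shell
#!/usr/bin/env bash
# Hypothetical client/cluster version check (illustration only).
# ASSUMED rule, not Teleport's documented policy: warn when the client
# is more than one major version behind the cluster.

# major_version prints the major component of "v14.0.0" or "12.0.2".
major_version() {
  local v="${1#v}"          # drop an optional leading "v"
  printf '%s\n' "${v%%.*}"  # keep everything before the first dot
}

# check_compat CLIENT_VERSION CLUSTER_VERSION
# Returns 0 if the versions look compatible under the assumed rule;
# otherwise prints a warning to stderr and returns 1.
check_compat() {
  local client_major cluster_major
  client_major=$(major_version "$1")
  cluster_major=$(major_version "$2")
  if (( client_major < cluster_major - 1 )); then
    printf 'WARNING: client %s is more than one major version behind cluster %s\n' "$1" "$2" >&2
    return 1
  fi
  return 0
}
```

For example, `check_compat 12.0.2 14.0.0` prints a warning and returns 1, while `check_compat 13.4.0 14.0.0` returns 0. The cluster version could be read from the proxy with something like `curl -s https://teleport.domain.com/webapi/ping | jq -r .server_version`, assuming the ping response carries a `server_version` field and `jq` is installed. Only major versions are compared, which keeps the sketch simple but ignores minor-version and newer-client cases.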