gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

The return of the /var/lib/teleport problem. | *trace.ConnectionProblemError https://teleport.cluster.local/v2/domain remote error: tls: internal error #8512

Closed benarent closed 2 months ago

benarent commented 2 years ago

Description

What happened:

I was building a large cluster for Teleport 8.0.0.alpha (although I've seen similar issues with Teleport 7.1.x / 7.2.x).

I've seen similar customer issues: https://github.com/gravitational/teleport/issues/6308# and https://github.com/gravitational/teleport/issues/5355, the OG issue https://github.com/gravitational/teleport/issues/2838, and a few others in community Slack. This issue has become worse for nodes recently.

The issue: a node has Teleport installed with systemd, but it's started before it has a valid config. Once it's provided a valid config and auth token, it keeps trying to join the local cluster instead of using the new config.

Starting with valid token, on a node that has already 'started' with Teleport

```
ubuntu@ip-10-0-0-126:~$ sudo teleport start --roles=node --token=xxxx --auth-server=teleport-8.asteroid.earth:443 -d
2021-10-07T04:55:04Z DEBU [SQLITE] Connected to: file:/var/lib/teleport/proc/sqlite.db?_busy_timeout=10000&_sync=OFF, poll stream period: 1s lite/lite.go:172
2021-10-07T04:55:04Z DEBU [SQLITE] Synchronous: 0, busy timeout: 10000 lite/lite.go:217
2021-10-07T04:55:04Z DEBU [KEYGEN] SSH cert authority started with no keys pre-compute. native/native.go:106
2021-10-07T04:55:04Z DEBU [PROC:1] Adding service to supervisor. service:register.node service/supervisor.go:201
2021-10-07T04:55:04Z DEBU [PROC:1] Adding service to supervisor. service:ssh.node service/supervisor.go:201
2021-10-07T04:55:04Z DEBU [PROC:1] Adding service to supervisor. service:ssh.shutdown service/supervisor.go:201
2021-10-07T04:55:04Z DEBU [PROC:1] Adding service to supervisor. service:common.rotate service/supervisor.go:201
2021-10-07T04:55:04Z DEBU [PROC:1] No signal pipe to import, must be first Teleport process. service/service.go:845
2021-10-07T04:55:04Z DEBU [PROC:1] Service has started. service:register.node service/supervisor.go:262
2021-10-07T04:55:04Z DEBU [PROC:1] Connected state: never updated. service/connect.go:103
2021-10-07T04:55:04Z INFO [PROC:1] Connecting to the cluster ip-10-0-0-21 with TLS client certificate. service/connect.go:132
2021-10-07T04:55:04Z DEBU [PROC:1] Attempting to connect to Auth Server directly. auth-addrs:[teleport-8.asteroid.earth:443] service/connect.go:808
2021-10-07T04:55:04Z DEBU [PROC:1] Service has started. service:ssh.node service/supervisor.go:262
2021-10-07T04:55:04Z DEBU [PROC:1] Service has started. service:ssh.shutdown service/supervisor.go:262
2021-10-07T04:55:04Z DEBU [PROC:1] Service has started. service:common.rotate service/supervisor.go:262
2021-10-07T04:55:04Z DEBU [PROC:1] Failed to connect to Auth Server directly. auth-addrs:[teleport-8.asteroid.earth:443] service/connect.go:814
2021-10-07T04:55:04Z DEBU [PROC:1] Attempting to discover reverse tunnel address. auth-addrs:[teleport-8.asteroid.earth:443] service/connect.go:823
2021-10-07T04:55:04Z DEBU Attempting GET teleport-8.asteroid.earth:443/webapi/find webclient/webclient.go:62
2021-10-07T04:55:04Z DEBU [PROC:1] Attempting to connect to Auth Server through tunnel. proxy-addr:teleport-8.asteroid.earth:443 service/connect.go:833
2021-10-07T04:55:04Z DEBU Attempting GET teleport-8.asteroid.earth:443/webapi/find webclient/webclient.go:62
2021-10-07T04:55:04Z DEBU Attempting GET teleport-8.asteroid.earth:443/webapi/find webclient/webclient.go:62
2021-10-07T04:55:04Z DEBU [HTTP:PROX] No valid environment variables found. proxy/proxy.go:314
2021-10-07T04:55:04Z DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:228
2021-10-07T04:55:04Z DEBU [HTTP:PROX] No valid environment variables found. proxy/proxy.go:314
2021-10-07T04:55:04Z DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:228
2021-10-07T04:55:04Z DEBU No CA for host teleport-8.asteroid.earth:443. sshutils/callback.go:83
2021-10-07T04:55:04Z WARN [PROC:1] Failed to close Auth Server tunnel client. error:[] service/connect.go:885
2021-10-07T04:55:04Z DEBU [PROC:1] Failed to connect to Auth Server directly. auth-addrs:[teleport-8.asteroid.earth:443] error:[
ERROR REPORT:
Original Error: *trace.ConnectionProblemError Get "https://teleport.cluster.local/v2/domain": remote error: tls: internal error
Stack Trace:
	/go/src/github.com/gravitational/teleport/lib/httplib/httplib.go:133 github.com/gravitational/teleport/lib/httplib.ConvertResponse
	/go/src/github.com/gravitational/teleport/lib/auth/clt.go:280 github.com/gravitational/teleport/lib/auth.(*Client).Get
	/go/src/github.com/gravitational/teleport/lib/auth/clt.go:371 github.com/gravitational/teleport/lib/auth.(*Client).GetDomainName
	/go/src/github.com/gravitational/teleport/lib/auth/clt.go:1515 github.com/gravitational/teleport/lib/auth.(*Client).GetLocalClusterName
	/go/src/github.com/gravitational/teleport/lib/service/connect.go:909 github.com/gravitational/teleport/lib/service.(*TeleportProcess).newClientDirect
	/go/src/github.com/gravitational/teleport/lib/service/connect.go:809 github.com/gravitational/teleport/lib/service.(*TeleportProcess).newClient
	/go/src/github.com/gravitational/teleport/lib/service/connect.go:133 github.com/gravitational/teleport/lib/service.(*TeleportProcess).connect
	/go/src/github.com/gravitational/teleport/lib/service/connect.go:83 github.com/gravitational/teleport/lib/service.(*TeleportProcess).connectToAuthService
	/go/src/github.com/gravitational/teleport/lib/service/connect.go:51 github.com/gravitational/teleport/lib/service.(*TeleportProcess).reconnectToAuthService
	/go/src/github.com/gravitational/teleport/lib/service/service.go:1975 github.com/gravitational/teleport/lib/service.(*TeleportProcess).registerWithAuthServer.func1
	/go/src/github.com/gravitational/teleport/lib/service/supervisor.go:494 github.com/gravitational/teleport/lib/service.(*LocalService).Serve
	/go/src/github.com/gravitational/teleport/lib/service/supervisor.go:263 github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve.func1
	/opt/go/src/runtime/asm_amd64.s:1371 runtime.goexit
```

Deleting /var/lib/teleport. Note: since it started without any config, it started as Auth + Node + Proxy (that's why /var/lib/teleport has the web proxy key and cert).

```
ubuntu@ip-10-0-0-126:~$ rm -rf /var/lib/teleport/
rm: cannot remove '/var/lib/teleport/host_uuid': Permission denied
rm: cannot remove '/var/lib/teleport/backend': Permission denied
rm: cannot remove '/var/lib/teleport/webproxy_key.pem': Permission denied
rm: cannot remove '/var/lib/teleport/proc': Permission denied
rm: cannot remove '/var/lib/teleport/webproxy_cert.pem': Permission denied
rm: cannot remove '/var/lib/teleport/log': Permission denied
ubuntu@ip-10-0-0-126:~$ sudo rm -rf /var/lib/teleport/
```

Starting 'fresh' and things work.

```
ubuntu@ip-10-0-0-126:~$ sudo teleport start --roles=node --token=xxx --auth-server=teleport-8.asteroid.earth:443 -d
2021-10-07T04:55:29Z INFO Generating new host UUID: 6b6acc4d-d9d1-4571-963d-f8cf10ba1aa2. service/service.go:638
2021-10-07T04:55:29Z DEBU [SQLITE] Connected to: file:/var/lib/teleport/proc/sqlite.db?_busy_timeout=10000&_sync=OFF, poll stream period: 1s lite/lite.go:172
2021-10-07T04:55:29Z DEBU [SQLITE] Synchronous: 0, busy timeout: 10000 lite/lite.go:217
2021-10-07T04:55:29Z DEBU [KEYGEN] SSH cert authority started with no keys pre-compute. native/native.go:106
2021-10-07T04:55:29Z DEBU [PROC:1] Adding service to supervisor. service:register.node service/supervisor.go:201
2021-10-07T04:55:29Z DEBU [PROC:1] Adding service to supervisor. service:ssh.node service/supervisor.go:201
2021-10-07T04:55:29Z DEBU [PROC:1] Adding service to supervisor. service:ssh.shutdown service/supervisor.go:201
2021-10-07T04:55:29Z DEBU [PROC:1] Adding service to supervisor. service:common.rotate service/supervisor.go:201
2021-10-07T04:55:29Z DEBU [PROC:1] No signal pipe to import, must be first Teleport process. service/service.go:845
2021-10-07T04:55:29Z DEBU [PROC:1] Service has started. service:register.node service/supervisor.go:262
2021-10-07T04:55:29Z INFO [PROC:1] Joining the cluster with a secure token. service/connect.go:353
2021-10-07T04:55:29Z DEBU [PROC:1] Generating new key pair for Node first-time-connect. service/connect.go:261
2021-10-07T04:55:29Z DEBU [PROC:1] Service has started. service:ssh.node service/supervisor.go:262
2021-10-07T04:55:29Z DEBU [PROC:1] Service has started. service:ssh.shutdown service/supervisor.go:262
2021-10-07T04:55:29Z DEBU [PROC:1] Service has started. service:common.rotate service/supervisor.go:262
2021-10-07T04:55:29Z DEBU [AUTH] Registering node to the cluster. auth-servers:[{teleport-8.asteroid.earth:443 tcp }] auth/register.go:136
2021-10-07T04:55:29Z DEBU [AUTH] The first specified auth server appears to be a proxy. auth/register.go:150
2021-10-07T04:55:29Z INFO [AUTH] Attempting registration via proxy server. auth/register.go:156
2021-10-07T04:55:29Z DEBU [CLIENT] HTTPS client init(proxyAddr=teleport-8.asteroid.earth:443, insecure=false) client/weblogin.go:221
2021-10-07T04:55:29Z DEBU [CLIENT] Attempting https://teleport-8.asteroid.earth:443/v1/webapi/host/credentials client/https_client.go:85
2021-10-07T04:55:29Z INFO [AUTH] Successfully registered via proxy server. auth/register.go:163
```
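Before wiping the directory, it may help to confirm that the node really is carrying state from an earlier auth/proxy start. The following is a minimal sketch, not a Teleport command: the function name `has_leftover_auth_state` is hypothetical, and the file names it checks are simply the ones that appear in the `rm` output in this report. Treating them as markers of an auth/proxy start is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Teleport): reports whether a data
# directory contains files that were present after the config-less start
# described in this issue. The list of names is an assumption taken from
# the rm output above, not from Teleport documentation.
has_leftover_auth_state() {
    dir="${1:-/var/lib/teleport}"
    for name in backend webproxy_key.pem webproxy_cert.pem; do
        if [ -e "$dir/$name" ]; then
            # Found at least one marker file or directory.
            return 0
        fi
    done
    # None of the marker names exist in the directory.
    return 1
}
```

A possible use is `if has_leftover_auth_state; then echo "stale auth/proxy state found"; fi`, run with enough privileges to look inside the data directory.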

What you expected to happen:

If Teleport is started without any config or tokens, a valid config provided later should be able to override that state, or we follow the proposal in #2838. Also, a better error message: right now it looks like a TLS issue rather than a joining issue.

benarent commented 2 years ago

I found the root issue for my system. I was starting Teleport with our default systemd unit (without --config). This started Teleport in proxy-auth-node mode, and even after I updated the config, it would use the cached credentials.
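One way to avoid that first config-less start is to refuse to launch until a config file exists. The following is a rough sketch under stated assumptions, not the unit Teleport ships: the function name `start_teleport_if_configured` is hypothetical, and `/etc/teleport.yaml` is assumed as the config path.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with Teleport): only start Teleport
# when a non-empty config file is present, so it never falls back to
# running as auth + proxy + node with default settings.
start_teleport_if_configured() {
    # Assumed config location; pass a different path as the first argument.
    config="${1:-/etc/teleport.yaml}"
    if [ ! -s "$config" ]; then
        echo "refusing to start: $config is missing or empty" >&2
        return 1
    fi
    # Replace the shell with Teleport, pointing it at the explicit config.
    exec teleport start --config="$config"
}
```

A systemd unit could call a script like this from its start command, so the service fails fast instead of creating state in /var/lib/teleport before the node has been configured.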

zmb3 commented 2 months ago

Looks like this particular issue was user error, and we already have #2838 to track the larger problem.