gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0
17.41k stars 1.74k forks source link

ssh service, combined with proxy, suffers from auth service restart #15837

Open dglushenok opened 2 years ago

dglushenok commented 2 years ago

Expected behavior: ssh service reconnects to restarted auth service quickly

Current behavior: almost 12 minutes between node entering degraded state and being recovered

Bug details:

We have following setup:

After restarting auth services (in case of updating configuration), proxy service starts to report non-200 codes for /readyz handle:

So, proxy becames unavailable for users for almost 12 minutes.

If I turn off ssh service at proxy - no problem observed, it quickly recovers from auth service restart. On ssh-only nodes no problem observed. It is somehow related to proxy role.

Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROC:1]    Sync rotation state cycle failed. Retrying in ~10s pid:392446.1 service/connect.go:678
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN             Inventory control stream failed: inventory control stream closed: error reading from server: EOF inventory/inventory.go:145
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 DEBU             Re-attempt control stream acquisition in ~2.410824443s. inventory/inventory.go:124
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [NODE:1:CA] Re-init the cache on error: watcher is closed: error reading from server: EOF. cache/cache.go:775
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 DEBU [NODE:1:CA] Reloading cache. cache/cache.go:779
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROXY:1:C] Re-init the cache on error: watcher is closed: error reading from server: EOF. cache/cache.go:775
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 DEBU [PROXY:1:C] Reloading cache. cache/cache.go:779
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 DEBU [PROC:1]    Broadcasting event. event:TeleportDegraded pid:392446.1 service/supervisor.go:383
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 INFO [PROC:1]    Detected Teleport component "node" is running in a degraded state. pid:392446.1 service/state.go:105
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROC:1]    Connection problem: sync rotation state: Get "https://teleport.cluster.local/v2/authorities/host/NPE?load_keys=false": dial tcp 172.23.8.70:3025: connect: connection refused. pid:392446.1 service/connect.go:779
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROC:1]    Sync rotation state cycle failed. Retrying in ~10s pid:392446.1 service/connect.go:678
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROC:1]    Connection problem: sync rotation state: Get "https://teleport.cluster.local/v2/authorities/host/NPE?load_keys=false": dial tcp 172.23.8.70:3025: connect: connection refused. pid:392446.1 service/connect.go:779
Aug 25 10:35:44 proxy-hostname teleport[392446]: 2022-08-25T10:35:44+03:00 WARN [PROC:1]    Sync rotation state cycle failed. Retrying in ~10s pid:392446.1 service/connect.go:678
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [WEB]       Callback start. auth:saml web/saml.go:92
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY]     Attempting to restart watch after waiting 2.84735411s. pid:392446.1 resource-kind:node services/watcher.go:206
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 WARN [PROXY]     Restart watch on error: watcher is closed: error reading from server: EOF. pid:392446.1 resource-kind:node services/watcher.go:213
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY]     Starting watch. pid:392446.1 resource-kind:node services/watcher.go:177
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [WEB]       Redirecting to web browser. auth:saml web/saml.go:126
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Attempting to restart watch after waiting 2.914732914s. pid:392446.1 resource-kind:proxy services/watcher.go:206
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 WARN [PROXY:1]   Restart watch on error: watcher is closed: error reading from server: EOF. pid:392446.1 resource-kind:proxy services/watcher.go:213
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Starting watch. pid:392446.1 resource-kind:proxy services/watcher.go:177
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   List of known proxies updated: ["3631ac3e-d9cd-4e4c-a400-5e3ae1bdf6cc" "8638a136-4dbb-403d-8d3c-51ac3c0eda9f" "96e1edc7-6867-425e-9ebf-70647d2cc37c" "e066751a-a312-44b3-baa3-a1942e724548"]. pid:392446.1 services/watcher.go:410
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [DISCOVERY] Sending node discovery request with proxies ["3631ac3e-d9cd-4e4c-a400-5e3ae1bdf6cc" "8638a136-4dbb-403d-8d3c-51ac3c0eda9f" "96e1edc7-6867-425e-9ebf-70647d2cc37c" "e066751a-a312-44b3-baa3-a1942e724548"] to 172.23.6.92:53416. reversetunnel/conn.go:239
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [DISCOVERY] Sending node discovery request with proxies ["3631ac3e-d9cd-4e4c-a400-5e3ae1bdf6cc" "8638a136-4dbb-403d-8d3c-51ac3c0eda9f" "96e1edc7-6867-425e-9ebf-70647d2cc37c" "e066751a-a312-44b3-baa3-a1942e724548"] to 172.23.7.92:55926. reversetunnel/conn.go:239
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [DISCOVERY] Sending app discovery request with proxies ["3631ac3e-d9cd-4e4c-a400-5e3ae1bdf6cc" "8638a136-4dbb-403d-8d3c-51ac3c0eda9f" "96e1edc7-6867-425e-9ebf-70647d2cc37c" "e066751a-a312-44b3-baa3-a1942e724548"] to 172.23.7.92:55920. reversetunnel/conn.go:239
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [DISCOVERY] Sending app discovery request with proxies ["3631ac3e-d9cd-4e4c-a400-5e3ae1bdf6cc" "8638a136-4dbb-403d-8d3c-51ac3c0eda9f" "96e1edc7-6867-425e-9ebf-70647d2cc37c" "e066751a-a312-44b3-baa3-a1942e724548"] to 172.23.6.92:53402. reversetunnel/conn.go:239
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [AUTH]      ClientCertPool -> cert(NPE issued by NPE:150697127953305817417387513129116094324) auth/middleware.go:680
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [AUTH]      ClientCertPool -> cert(NPE issued by NPE:332142151485100923591418454858366357672) auth/middleware.go:680
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [SSH:PROXY] Incoming connection 172.23.7.92:56214 -> 172.23.7.91:3024 version: SSH-2.0-Go, certtype: "host" pid:392446.1 sshutils/server.go:465
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Transport request: teleport-transport. pid:392446.1 reversetunnel/srv.go:635
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Received out-of-band proxy transport request for @remote-auth-server []. pid:392446.1 reversetunnel/transport.go:206
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Attempting to dial through tunnel with server ID "". pid:392446.1 reversetunnel/transport.go:377
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Attempting to dial directly "tp-auth.example.com:3025". pid:392446.1 reversetunnel/transport.go:394
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU             Skipping login no-login-b1ec2fa4-04fb-48d0-b474-3c4cc7f8b079, not a valid Unix login. services/role.go:260
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Returning direct dialed connection to "tp-auth.example.com:3025". pid:392446.1 reversetunnel/transport.go:400
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [PROXY:1]   Successfully dialed to @remote-auth-server "", start proxying. pid:392446.1 reversetunnel/transport.go:313
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [SSH:PROXY] Incoming connection 172.23.7.92:56210 -> 172.23.7.91:3024 version: SSH-2.0-Go, certtype: "host" pid:392446.1 sshutils/server.go:465
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:56210. pid:392446.1 sshutils/server.go:470
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [SSH:PROXY] Incoming connection 172.23.7.92:56216 -> 172.23.7.91:3024 version: SSH-2.0-Go, certtype: "host" pid:392446.1 sshutils/server.go:465
Aug 25 10:35:47 proxy-hostname teleport[392446]: 2022-08-25T10:35:47+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:56216. pid:392446.1 sshutils/server.go:470
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU [NODE:1]    Attempting to restart watch after waiting 3.570612947s. pid:392446.1 resource-kind:lock services/watcher.go:206
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 WARN [NODE:1]    Restart watch on error: watcher is closed: error reading from server: EOF. pid:392446.1 resource-kind:lock services/watcher.go:213
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU [NODE:1]    Starting watch. pid:392446.1 resource-kind:lock services/watcher.go:177
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU             Skipping login no-login-b1ec2fa4-04fb-48d0-b474-3c4cc7f8b079, not a valid Unix login. services/role.go:260
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU             Skipping login no-login-b1ec2fa4-04fb-48d0-b474-3c4cc7f8b079, not a valid Unix login. services/role.go:260
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU [SSH:PROXY] Incoming connection 172.23.7.92:56218 -> 172.23.7.91:3024 version: SSH-2.0-Go, certtype: "host" pid:392446.1 sshutils/server.go:465
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:56218. pid:392446.1 sshutils/server.go:470
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU             Skipping login no-login-b1ec2fa4-04fb-48d0-b474-3c4cc7f8b079, not a valid Unix login. services/role.go:260
Aug 25 10:35:48 proxy-hostname teleport[392446]: 2022-08-25T10:35:48+03:00 DEBU             Skipping login no-login-b1ec2fa4-04fb-48d0-b474-3c4cc7f8b079, not a valid Unix login. services/role.go:260
Aug 25 10:35:49 proxy-hostname teleport[392446]: 2022-08-25T10:35:49+03:00 DEBU [PROXY]     Attempting to restart watch after waiting 4.203926241s. pid:392446.1 resource-kind:cert_authority services/watcher.go:206
Aug 25 10:35:49 proxy-hostname teleport[392446]: 2022-08-25T10:35:49+03:00 WARN [PROXY]     Restart watch on error: watcher is closed: error reading from server: EOF. pid:392446.1 resource-kind:cert_authority services/watcher.go:213
Aug 25 10:35:49 proxy-hostname teleport[392446]: 2022-08-25T10:35:49+03:00 DEBU [PROXY]     Starting watch. pid:392446.1 resource-kind:cert_authority services/watcher.go:177
Aug 25 10:35:49 proxy-hostname teleport[392446]: 2022-08-25T10:35:49+03:00 DEBU [PROXY:1:C] Initiating new watch after waiting 4.41276821s. cache/cache.go:790
Aug 25 10:35:49 proxy-hostname teleport[392446]: 2022-08-25T10:35:49+03:00 DEBU [NODE:1:CA] Initiating new watch after waiting 5.108639005s. cache/cache.go:790
Aug 25 10:35:50 proxy-hostname teleport[392446]: 2022-08-25T10:35:50+03:00 DEBU [PROXY]     Attempting to restart watch after waiting 5.336694846s. pid:392446.1 resource-kind:lock services/watcher.go:206
Aug 25 10:35:50 proxy-hostname teleport[392446]: 2022-08-25T10:35:50+03:00 WARN [PROXY]     Restart watch on error: watcher is closed: error reading from server: EOF. pid:392446.1 resource-kind:lock services/watcher.go:213
Aug 25 10:35:50 proxy-hostname teleport[392446]: 2022-08-25T10:35:50+03:00 DEBU [PROXY]     Starting watch. pid:392446.1 resource-kind:lock services/watcher.go:177
Aug 25 10:38:13 proxy-hostname teleport[392446]: 2022-08-25T10:38:13+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:55906. pid:392446.1 sshutils/server.go:470
Aug 25 10:38:17 proxy-hostname teleport[392446]: 2022-08-25T10:38:17+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:55912. pid:392446.1 sshutils/server.go:470
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:55926. pid:392446.1 sshutils/server.go:470
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:55920. pid:392446.1 sshutils/server.go:470
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [PROXY:SER] Cluster agent disconnected. trace.fields:map[cluster:NPE] reversetunnel/localsite.go:519
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [DISCOVERY] Disconnecting connection to NPE 172.23.7.92:55920: agent disconnected. reversetunnel/conn.go:163
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [PROXY:SER] Cluster connection closed. trace.fields:map[cluster:NPE] reversetunnel/localsite.go:496
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [PROXY:SER] Cluster agent disconnected. trace.fields:map[cluster:NPE] reversetunnel/localsite.go:519
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [DISCOVERY] Disconnecting connection to NPE 172.23.7.92:55926: agent disconnected. reversetunnel/conn.go:163
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [PROXY:SER] Cluster connection closed. trace.fields:map[cluster:NPE] reversetunnel/localsite.go:496
Aug 25 10:38:18 proxy-hostname teleport[392446]: 2022-08-25T10:38:18+03:00 DEBU [SSH:PROXY] Closed connection 172.23.7.92:55918. pid:392446.1 sshutils/server.go:470
Aug 25 10:39:26 proxy-hostname teleport[392446]: 2022-08-25T10:39:26+03:00 DEBU             'teleport-service' group not found, not deleting users srv/usermgmt.go:292
Aug 25 10:41:31 proxy-hostname teleport[392446]: 2022-08-25T10:41:31+03:00 INFO [PROC:1]    Teleport component "node" is recovering from a degraded state. pid:392446.1 service/state.go:119
Aug 25 10:44:26 proxy-hostname teleport[392446]: 2022-08-25T10:44:26+03:00 DEBU             'teleport-service' group not found, not deleting users srv/usermgmt.go:292
Aug 25 10:47:24 proxy-hostname teleport[392446]: 2022-08-25T10:47:24+03:00 INFO [PROC:1]    Teleport component "node" has recovered from a degraded state. pid:392446.1 service/state.go:123

Coonfigs:

espadolini commented 2 years ago

Partial dupe of #13122 (same underlying cause)