Closed gdubicki closed 1 month ago
Some Teleport logs from that node (I was able to ssh to it differently):
Sep 05 11:46:52 node teleport[2462368]: 2024-09-05T11:46:52Z INFO Successfully synced "unit" upgrader maintenance window value. upgradewindow/upgradewindow.go:296
Sep 05 11:47:58 node teleport[2462368]: 2024-09-05T11:47:58Z WARN [FILE] Skipping upload 35a45f6a-9f29-4c28-8198-10463a91bfa4, missing subdirectory. filesessions/filestream.go:313
Sep 05 11:52:55 node teleport[2462368]: 2024-09-05T11:52:55Z WARN [FILE] Skipping upload 35a45f6a-9f29-4c28-8198-10463a91bfa4, missing subdirectory. filesessions/filestream.go:313
Sep 05 11:53:15 node teleport[2462368]: 2024-09-05T11:53:15Z INFO handling new resumable SSH connection resumption/server_exchange.go:92
Sep 05 11:53:15 node teleport[2462368]: 2024-09-05T11:53:15Z INFO handing resumable connection to the SSH server resumption/server_exchange.go:136
Sep 05 11:53:16 node teleport[2462368]: 2024-09-05T11:53:16Z WARN [SSH:NODE] "Dropping inbound ssh connection due to error: creating teleport system groups\n\tuser: lookup groupname teleport-keep: connection refused" sshutils/server.go:580
Sep 05 11:53:16 node teleport[2462368]: 2024-09-05T11:53:16Z INFO resumable connection completed resumption/server_exchange.go:138
Sep 05 11:55:21 node teleport[2462368]: 2024-09-05T11:55:21Z INFO handling new resumable SSH connection resumption/server_exchange.go:92
Sep 05 11:55:21 node teleport[2462368]: 2024-09-05T11:55:21Z INFO handing resumable connection to the SSH server resumption/server_exchange.go:136
Sep 05 11:55:22 node teleport[2462368]: 2024-09-05T11:55:22Z WARN [SSH:NODE] "Dropping inbound ssh connection due to error: creating teleport system groups\n\tuser: lookup groupname teleport-keep: connection refused" sshutils/server.go:580
Sep 05 11:55:23 node teleport[2462368]: 2024-09-05T11:55:23Z INFO resumable connection completed resumption/server_exchange.go:138
Sep 05 11:57:52 node teleport[2462368]: 2024-09-05T11:57:52Z WARN [FILE] Skipping upload 35a45f6a-9f29-4c28-8198-10463a91bfa4, missing subdirectory. filesessions/filestream.go:313
Sep 05 12:02:21 node teleport[2462368]: 2024-09-05T12:02:21Z WARN [FILE] Skipping upload 35a45f6a-9f29-4c28-8198-10463a91bfa4, missing subdirectory. filesessions/filestream.go:313
Sep 05 12:03:35 node teleport[2462368]: 2024-09-05T12:03:35Z WARN Access denied to instance labels, does the instance have compute.instances.get permission? gcp/imds.go:201
Sep 05 12:03:36 node teleport[2462368]: 2024-09-05T12:03:36Z WARN Access denied to resource management tags, does the instance have compute.instances.listEffectiveTags permission? gcp/imds.go:210
Sep 05 12:03:36 node teleport[2462368]: 2024-09-05T12:03:36Z WARN Access denied to instance labels, does the instance have compute.instances.get permission? gcp/imds.go:201
Sep 05 12:03:36 node teleport[2462368]: 2024-09-05T12:03:36Z WARN Access denied to resource management tags, does the instance have compute.instances.listEffectiveTags permission? gcp/imds.go:210
A restart of the Teleport service on the node did not help.
The Teleport config (/etc/teleport.yaml) on the affected node:
version: v3
teleport:
  nodename: <node-name>
  data_dir: /var/lib/teleport
  join_params:
    token_name: /var/lib/teleport/tokens/join_token.file
    method: token
  proxy_server: <company>.teleport.sh:443
  log:
    output: stderr
    severity: INFO
    format:
      output: text
  ca_pin: ""
  diag_addr: ""
auth_service:
  enabled: "no"
ssh_service:
  enabled: "yes"
  labels:
    cloud: gcp
  pam:
    enabled: "yes"
proxy_service:
  enabled: "no"
  https_keypairs: []
  https_keypairs_reload_interval: 0s
  acme: {}
Hi @gdubicki! :wave: Could you share some more details about your environment? The error message suggests there may be an external group database in the mix, is that true?
Sure, @eriktate! The node is Ubuntu 22.04 LTS but it doesn't have any external group database. Our nodes are managed with Chef.
This problem didn't arise on different nodes which have the same OS and a very similar setup.
As @programmerq has suggested (thanks!) in the support ticket that I created in parallel, after enabling DEBUG logging we can see this:
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z DEBU [NODE] Checking permissions for (gdubicki,gdubicki) to login to node with RBAC checks. srv/authhandlers.go:621
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z DEBU [SSH:NODE] Incoming connection <...>:60227 -> 10.138.0.13:59932 version: SSH-2.0-Go, certtype: "user" sshutils/server.go:553
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z DEBU "/usr/sbin/groupadd output: groupadd: group 'teleport-system' already exists\n" host/hostusers.go:55
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z DEBU "Error creating user gdubicki: creating teleport system groups\n\tuser: lookup groupname teleport-keep: connection refused" srv/sess.go:293
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z WARN [SSH:NODE] "Dropping inbound ssh connection due to error: creating teleport system groups\n\tuser: lookup groupname teleport-keep: connection refused" sshutils/server.go:580
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z INFO resumable connection completed resumption/server_exchange.go:138
Sep 09 15:15:18 node teleport[682248]: 2024-09-09T15:15:18Z DEBU handling new resumable connection error:[
Sep 09 15:15:18 node teleport[682248]: ERROR REPORT:
Sep 09 15:15:18 node teleport[682248]: Original Error: poll.errNetClosing use of closed network connection
Sep 09 15:15:18 node teleport[682248]: Stack Trace:
Sep 09 15:15:18 node teleport[682248]: github.com/gravitational/teleport/lib/resumption/resumable.go:395 github.com/gravitational/teleport/lib/resumption.runResumeV1Write
Sep 09 15:15:18 node teleport[682248]: github.com/gravitational/teleport/lib/resumption/resumable.go:169 github.com/gravitational/teleport/lib/resumption.runResumeV1Unlocking.func5
Sep 09 15:15:18 node teleport[682248]: golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 golang.org/x/sync/errgroup.(*Group).Go.func1
Sep 09 15:15:18 node teleport[682248]: runtime/asm_amd64.s:1695 runtime.goexit
Sep 09 15:15:18 node teleport[682248]: User Message: write loop
Sep 09 15:15:18 node teleport[682248]: connection closed
Sep 09 15:15:18 node teleport[682248]: use of closed network connection] resumption/server_exchange.go:144
When I check if the group exists, it in fact does:
root@node:~# getent group teleport-system
teleport-system:x:1043:
root@node:~# echo $?
0
Interesting 🤔 The group and user lookups during host user creation defer to getgrnam_r and getpwnam_r, which should be the exact same functions supporting getent group teleport-system. @gdubicki would you be able to share this host's /etc/nsswitch.conf? The network connection errors are still confusing me, as I would expect a different error if there was an issue reading from your local group and/or passwd database.
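Note that the failing lookup in the logs above names teleport-keep, while the getent check was run for teleport-system. A minimal sketch for checking both names through the same libc NSS path, assuming Python 3 is available on the node: the helper name check_groups is hypothetical, and as far as I know CPython's grp module reports any failed lookup as KeyError, so this cannot tell a missing group apart from a backend error such as "connection refused".

```python
import grp


def check_groups(names, lookup=grp.getgrnam):
    """Map each group name to its GID, or None when the lookup fails.

    `lookup` defaults to grp.getgrnam, which resolves names through the
    libc NSS configuration (the same databases getent consults).
    """
    results = {}
    for name in names:
        try:
            results[name] = lookup(name).gr_gid
        except KeyError:
            # Raised when the name cannot be resolved by any NSS source.
            results[name] = None
    return results


# Group names taken from the log lines in this thread.
for name, gid in check_groups(["teleport-system", "teleport-keep"]).items():
    print(name, "->", "missing or lookup failed" if gid is None else gid)
```

Running getent group teleport-keep by hand covers the same ground; the script only makes it easy to check several names at once.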
Hey @eriktate! I am for now working on this with your support team, but I will share the results when we are done.
This also affects v15.4.18. We recently added a new host and ran into a similar problem. Our error message is:
ERROR REPORT:
Original Error: *ssh.OpenChannelError ssh: rejected: administratively prohibited (user: unknown user srelf)
Stack Trace:
github.com/gravitational/teleport/api@v0.0.0/observability/tracing/ssh/client.go:236 github.com/gravitational/teleport/api/observability/tracing/ssh.(*clientWrapper).NewSession
github.com/gravitational/teleport/api@v0.0.0/observability/tracing/ssh/client.go:200 github.com/gravitational/teleport/api/observability/tracing/ssh.(*Client).NewSession
github.com/gravitational/teleport/lib/client/session.go:219 github.com/gravitational/teleport/lib/client.(*NodeSession).createServerSession
github.com/gravitational/teleport/lib/client/session.go:301 github.com/gravitational/teleport/lib/client.(*NodeSession).interactiveSession
github.com/gravitational/teleport/lib/client/session.go:518 github.com/gravitational/teleport/lib/client.(*NodeSession).runShell
github.com/gravitational/teleport/lib/client/client.go:1592 github.com/gravitational/teleport/lib/client.(*NodeClient).RunInteractiveShell
github.com/gravitational/teleport/lib/client/api.go:1919 github.com/gravitational/teleport/lib/client.(*TeleportClient).runShellOrCommandOnSingleNode
github.com/gravitational/teleport/lib/client/api.go:1642 github.com/gravitational/teleport/lib/client.(*TeleportClient).SSH
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3481 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1.1
github.com/gravitational/teleport/lib/client/api.go:595 github.com/gravitational/teleport/lib/client.RetryWithRelogin
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3480 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3318 github.com/gravitational/teleport/tool/tsh/common.retryWithAccessRequest
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3479 github.com/gravitational/teleport/tool/tsh/common.onSSH
github.com/gravitational/teleport/tool/tsh/common/tsh.go:1325 github.com/gravitational/teleport/tool/tsh/common.Run
github.com/gravitational/teleport/tool/tsh/common/tsh.go:593 github.com/gravitational/teleport/tool/tsh/common.Main
github.com/gravitational/teleport/tool/tsh/main.go:26 main.main
runtime/proc.go:267 runtime.main
runtime/asm_arm64.s:1197 runtime.goexit
User Message: ssh: rejected: administratively prohibited (user: unknown user srelf)
Our main Teleport server, which runs auth etc., is running Teleport v15.4.9.
Downgrading to v15.4.14 fixes the problem, so it's an issue introduced between v15.4.14 and v15.4.18.
Rgds Steve.
Observed with settings:
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: compat systemd
group: compat systemd
shadow: compat
gshadow: files
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
Quoting a solution from Teleport Engineer we got via Support ticket:
- If you don't need or don't want the NIS integration, you can change compat to files in your /etc/nsswitch.conf for user and group.
As we are not using NIS, we did that and it resolved our problem.
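To find other nodes with the same configuration, the check can be scripted. This is a rough sketch, not an official Teleport tool: the function name compat_databases and the SAMPLE text are illustrative, with the sample lines copied from the nsswitch.conf shown above. It only reports which of the passwd and group databases list compat as a source.

```python
def compat_databases(nsswitch_text, databases=("passwd", "group")):
    """Return the listed databases whose source list includes `compat`."""
    flagged = []
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        database = database.strip()
        if database in databases and "compat" in sources.split():
            flagged.append(database)
    return flagged


# Lines copied from the nsswitch.conf quoted in this thread.
SAMPLE = """\
passwd: compat systemd
group: compat systemd
shadow: compat
hosts: files dns
"""

print(compat_databases(SAMPLE))  # prints ['passwd', 'group']
```

On a real node, pass the contents of /etc/nsswitch.conf instead of SAMPLE. An empty result means neither database uses compat.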
Expected behavior:
tsh ssh <node> should work.
Current behavior:
On some of our nodes it fails with:
Bug details:
Client and proxy:
On the node:
See above.
Please note that we have not made any recent changes on this node, but it is running Chef, which applies some automated updates. Teleport itself is also automatically updated there.