gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0
17.35k stars 1.74k forks source link

Yubikey race condition using tsh #46372

Open milos-teleport opened 1 week ago

milos-teleport commented 1 week ago

Expected behavior: As a user who uses YubiKey solely for Teleport, and does not use multiple tsh commands at the same time, tsh should not give me errors relating to Yubikey being used by another application.

Current behavior: When using tsh ls or tsh ssh I sometimes get this error:

tsh ls
Enter your YubiKey PIV PIN:
ERROR: connection error: desc = "transport: authentication handshake failed: tls: failed to sign handshake: could not connect to YubiKey as another application is using it. Please try again once the program that uses the YubiKey, such as yubikey-agent is closed"

Bug details:

Workaround: Retry failed command

Quote from @nklaassen:

I'm pretty sure the issue is tsh trying to use the same yubikey concurrently. When we dial to the auth server for something like tsh ls we race like 5 different dial methods https://github.com/gravitational/teleport/blob/d8292350ed7bca30929c5f1c326547746599ee3b/api/client/client.go#L320-L321. One of those lucky connection methods is going to open the yubikey connection first and prompt for PIN/touch, meanwhile all the other connection methods are going to time out. If that lucky first connection method wasn't the right one, all connection methods fail and the request fails. It seems to usually work if you retry because IIRC we cache the tap for ~15s.

gzdunek commented 1 week ago

I'm currently working on adding support for hardware keys to Teleport Connect, and part of this task is refactoring yubikey.go to allow it handling concurrent requests (coming from Connect). An excerpt from my last status update, where I described the solution:

I'm refactoring the stateless code in yubikey.go so that there is now a “global" stateful YubiKey manager in tshd that coordinates access to the physical key. This better fits to the Connect model, where we often issue requests concurrently (as opposed to tsh, where things tend to happen sequentially). Thanks to that, if the user has to take the action (like touching the key), all other requests will wait for it to complete.

tsh will benefit from this change too - all calls to YubiKey within a single command (like tsh ls) will access it sequentially. This should fix the issue🤞.

nklaassen commented 1 week ago

by the way, i no longer think tsh is affected by the connection method race at the line i linked in that quote. It's probably the gRPC library or maybe something to do with how we tunnel TSL over TLS when dialing through a TLS-terminating load-balancer