strideynet closed this issue 1 year ago
Some thoughts I had whilst writing this:

Users will need a way to configure that they want certificates/config generated for a leaf cluster. My current suggestion is that we add a `Cluster` field to the `DestinationConfig` which allows the user to specify that they want generated configurations for a specific cluster; where this field is not specified, we should fall back to the cluster that `tbot` is directly connected to. My concern with this is that it may be quite explicit for users who want to configure access to a large number of clusters. We may need to identify customers with the need for Machine ID leaf cluster support and see if this will serve their needs.
If we are going to generate configuration for leaf clusters, we will need to monitor Host CA rotations in leaf clusters and ensure that we keep these up to date in the configured destinations. Scalability will be a concern here: some customers have thousands of leaf clusters, and we want to ensure that we do not renew unnecessarily when something occurs in a single leaf cluster.
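One way to avoid unnecessary renewals could be to track each leaf cluster's host CA fingerprint and only rewrite a destination when that fingerprint actually changes. A minimal Go sketch of the idea — the `caTracker` type and fingerprint strings are illustrative, not Teleport's actual API:

```go
package main

import "fmt"

// caTracker records the last-seen host CA fingerprint per leaf cluster so
// that a rotation event only triggers a renewal when the CA actually changed.
// Illustrative sketch only; not Teleport's real rotation handling.
type caTracker struct {
	seen map[string]string // cluster name -> CA fingerprint
}

func newCATracker() *caTracker {
	return &caTracker{seen: map[string]string{}}
}

// observe returns true when the cluster's host CA fingerprint changed and a
// renewal of that cluster's destination is therefore required.
func (t *caTracker) observe(cluster, fingerprint string) bool {
	if t.seen[cluster] == fingerprint {
		return false // unchanged: skip renewal
	}
	t.seen[cluster] = fingerprint
	return true
}

func main() {
	t := newCATracker()
	fmt.Println(t.observe("leaf-a", "sha256:aaa")) // first sighting: renew
	fmt.Println(t.observe("leaf-a", "sha256:aaa")) // unchanged: skip
	fmt.Println(t.observe("leaf-a", "sha256:bbb")) // rotated: renew
}
```

With thousands of leaf clusters, keeping a per-cluster fingerprint like this means a rotation in one leaf only touches destinations configured for that leaf.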
@strideynet, we have a root cluster at a region level (US/EU) and all the data centers within that region are leaf clusters. The users log into the root proxy and generate credentials. They then use those credentials to access leaf VMs via the leaf proxy. This allows the users to log in to only 2 clusters and get access to all datacenters (<25).
To use Machine ID, we need a similar flow, and it needs to work with OpenSSH since it's going to be used by various CI/CD tools and it's not possible to use tsh for everything or log into several clusters.
Hey @anurag-work, we've had a quick look at this and we believe it should be possible to directly connect to openssh servers set up against leaf clusters using credentials from a Machine ID instance configured against the root cluster. Please let us know if you experience any issues with this, and we can take a look.
@strideynet I am unable to connect to a leaf node using the certs generated by Machine ID which is configured against the root cluster. I've tried a few combinations of the tbot proxy command to match what happens when I connect using certs generated by tsh. Even though the tsh proxy command matches, the session using the tbot cert gets a permission denied.
tbot proxy commands that I tried:

```
ProxyCommand "/usr/local/bin/tbot" -d proxy --destination-dir=/Users/anurag/tbot-user --proxy=ctf01-teleport-proxy.company.com --cluster=leaf-cluster ssh %r@%h:%p
ProxyCommand "/usr/local/bin/tbot" -d proxy --destination-dir=/Users/anurag/tbot-user --proxy=ctf01-teleport-proxy.company.com ssh --cluster=leaf-cluster %r@%h:%p
```
tsh ssh proxy command:

```
ProxyCommand "/Volumes/Data/usr/local/bin/tsh" -d proxy ssh --cluster=leaf-cluster --proxy=ctf01-teleport-proxy.company.com %r@%h:%p
```
tsh proxy ssh doesn't work with identity files when TLS routing is disabled: https://github.com/gravitational/teleport/issues/15190
The cluster may be set from the Issuer of the certificate, and the key lookup searches by `--cluster`, which is also reused by the SSH proxy subsystem to indicate the target cluster:
```go
// GetKey returns the user's key including the specified certs.
func (s *MemLocalKeyStore) GetKey(idx KeyIndex, opts ...CertOption) (*Key, error) {
	var key *Key
	if idx.ClusterName == "" {
		// If clusterName is not specified then the cluster-dependent fields
		// are not considered relevant and we may simply return any key
		// associated with any cluster name whatsoever.
		for _, found := range s.inMem[idx.ProxyHost][idx.Username] {
			key = found
			break
		}
	} else {
		key = s.inMem[idx.ProxyHost][idx.Username][idx.ClusterName]
	}
```
```go
sshUserHost := fmt.Sprintf("%s:%s", sp.targetHost, sp.targetPort)
if err = sess.RequestSubsystem(ctx, proxySubsystemName(sshUserHost, sp.clusterName)); err != nil {
	return trace.Wrap(err)
}
```
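For illustration, the subsystem name requested here takes the form `proxy:%h:%p@<cluster>`, matching the `-s proxy:...` argument seen in the hand-written ProxyCommand elsewhere in this thread. A hedged sketch of how such a string might be composed; `proxySubsystemName` below is a simplified stand-in, not Teleport's actual implementation:

```go
package main

import "fmt"

// proxySubsystemName sketches how the SSH proxy subsystem request string
// ("proxy:host:port@cluster") could be composed. When no cluster is given,
// the proxy falls back to the cluster it belongs to. Illustrative only.
func proxySubsystemName(userHost, cluster string) string {
	if cluster == "" {
		return "proxy:" + userHost
	}
	return fmt.Sprintf("proxy:%s@%s", userHost, cluster)
}

func main() {
	// Target a node in a leaf cluster via the root proxy's subsystem.
	fmt.Println(proxySubsystemName("leaf.leaf.tele.ottr.sh:3022", "leaf.tele.ottr.sh"))
}
```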
Hence, an identity file which is generated in a root cluster cannot be used by `tsh proxy ssh` to connect to a leaf cluster.
```go
func makeClientForProxy(cf *CLIConf, proxy string, useProfileLogin bool) (*client.TeleportClient, error) {
```
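The failure reduces to the nested-map lookup in the in-memory keystore: the identity file is loaded under the root cluster's name only, so a lookup keyed by the leaf cluster's name misses. A minimal self-contained sketch, with types and data deliberately simplified (this is not Teleport's actual code):

```go
package main

import "fmt"

// KeyIndex mirrors the lookup key used by tsh's in-memory keystore
// (field names kept, everything else simplified for illustration).
type KeyIndex struct {
	ProxyHost, Username, ClusterName string
}

// A toy store indexed proxy -> user -> cluster, like MemLocalKeyStore.inMem.
// The identity is stored under the root cluster's name only.
var inMem = map[string]map[string]map[string]string{
	"root.tele.ottr.sh": {
		"bot-ssh-test": {
			"root.tele.ottr.sh": "identity-key",
		},
	},
}

// getKey performs the cluster-keyed lookup; reading through nil maps is
// safe in Go and simply yields the zero value.
func getKey(idx KeyIndex) (string, bool) {
	key, ok := inMem[idx.ProxyHost][idx.Username][idx.ClusterName]
	return key, ok
}

func main() {
	// A lookup keyed by the root cluster succeeds...
	_, ok := getKey(KeyIndex{"root.tele.ottr.sh", "bot-ssh-test", "root.tele.ottr.sh"})
	fmt.Println(ok)
	// ...but --cluster=leaf.tele.ottr.sh searches under the leaf's name and misses,
	// producing the "key for {...} not found" error seen below.
	_, ok = getKey(KeyIndex{"root.tele.ottr.sh", "bot-ssh-test", "leaf.tele.ottr.sh"})
	fmt.Println(ok)
}
```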
```
# Begin generated Teleport configuration for root.tele.ottr.sh by tbot

# Common flags for all root.tele.ottr.sh hosts
Host *.root.tele.ottr.sh root.tele.ottr.sh
    UserKnownHostsFile "/Users/noahstride/tbot-openssh-test/known_hosts"
    IdentityFile "/Users/noahstride/tbot-openssh-test/key"
    CertificateFile "/Users/noahstride/tbot-openssh-test/key-cert.pub"
    HostKeyAlgorithms ssh-rsa-cert-v01@openssh.com
    PubkeyAcceptedAlgorithms +ssh-rsa-cert-v01@openssh.com

Host *.leaf.tele.ottr.sh
    UserKnownHostsFile "/Users/noahstride/tbot-openssh-test/known_hosts"
    IdentityFile "/Users/noahstride/tbot-openssh-test/key"
    CertificateFile "/Users/noahstride/tbot-openssh-test/key-cert.pub"
    HostKeyAlgorithms ssh-rsa-cert-v01@openssh.com
    PubkeyAcceptedAlgorithms +ssh-rsa-cert-v01@openssh.com
    Port 3022
    ProxyCommand ssh -F ~/path/back/to/this/config/file -p 3023 %r@root.tele.ottr.sh -s proxy:%h:%p@leaf.tele.ottr.sh
```
Essentially:
Limitation: the trust-on-first-use (TOFU) prompt still shows. TOFU would not show if we provided leaf cluster host certs.
```
Host *.leaf.tele.ottr.sh !leaf.tele.ottr.sh
    Port 3022
    ProxyCommand "/Users/noahstride/code/gravitational/teleport/build/tsh" proxy -d --identity=/Users/noahstride/code/gravitational/teleport/tbot-user/identity --proxy=root.tele.ottr.sh ssh --cluster=leaf.tele.ottr.sh %r@%h:%p
```
```
ERROR REPORT:
Original Error: *trace.NotFoundError key for {ProxyHost:root.tele.ottr.sh Username:bot-ssh-test ClusterName:leaf.tele.ottr.sh} not found
Stack Trace:
	github.com/gravitational/teleport/lib/client/keystore.go:974 github.com/gravitational/teleport/lib/client.(*MemLocalKeyStore).GetKey
	github.com/gravitational/teleport/lib/client/keyagent.go:337 github.com/gravitational/teleport/lib/client.(*LocalKeyAgent).GetKey
	github.com/gravitational/teleport/lib/client/keyagent.go:677 github.com/gravitational/teleport/lib/client.(*LocalKeyAgent).ClientCertPool
	github.com/gravitational/teleport/tool/tsh/proxy.go:261 main.dialSSHProxy
	github.com/gravitational/teleport/tool/tsh/proxy.go:184 main.sshProxy
	github.com/gravitational/teleport/tool/tsh/proxy.go:79 main.onProxyCommandSSH.func1
	github.com/gravitational/teleport/lib/client/api.go:731 github.com/gravitational/teleport/lib/client.RetryWithRelogin
	github.com/gravitational/teleport/tool/tsh/proxy.go:67 main.onProxyCommandSSH
	github.com/gravitational/teleport/tool/tsh/tsh.go:1036 main.Run
	github.com/gravitational/teleport/tool/tsh/tsh.go:448 main.main
	runtime/proc.go:250 runtime.main
	runtime/asm_arm64.s:1172 runtime.goexit
User Message: key for {ProxyHost:root.tele.ottr.sh Username:bot-ssh-test ClusterName:leaf.tele.ottr.sh} not found
```
That sounds good to me, as long as we can specify the list of leaf clusters as a regex pattern or similar wildcard! In my use case, leaf clusters don't necessarily exist yet, but will in the future... Their naming convention ensures they'll be clearly identified, and I don't expect to have to touch the Machine ID config each time a new cluster is created ;)
It has occurred to me that the behaviour of `tsh config` is to output all leaf clusters. I think it might be nice to match this behaviour, as it meets the needs of most users and reduces the amount of configuration overhead. If this proves to be problematic from a performance point of view, we can consider introducing the ability to filter which of them should be included in the configuration, and users with a larger number of leaf clusters can opt into this filtering.
@Joerger's recent modifications to the `tsh` store seem to have fixed the following error:
```
ERROR: key for {ProxyHost:teleport.local.ottr.sh Username:bot-robot1 ClusterName:leaf.local.ottr.sh} not found
ERROR: unable to execute tsh
executing `tsh proxy`
exit status 1
```
I can confirm this fix is present from v12.0.0 onwards - older `tsh` and `tbot` clients will not be supported.
This means that the remaining work in #23368 resolves this issue entirely.
As it stands, `tbot` only supports generating configuration and fetching certificates for the cluster it is directly connected to. Some customers will want to allow Machine ID to access resources in leaf clusters of that cluster. We should seek to identify customers who have a need for leaf cluster access to validate some questions before embarking on this work.
Questions we need to answer:

- `tsh` works with `tbot`; are there customers who have needs that are not served by this? e.g. do they want standard OpenSSH and a generated `ssh_config` for a leaf cluster?
- It is entirely possible that using a `tbot`-provided identity with `tsh` gives all the functionality needed, but we should validate this and document it as the supported method for accessing leaf clusters.

Current state of affairs
I've tested Machine ID with SSH and a Teleport node in a leaf cluster with two variations:

- `tsh ssh` with the identity file output in the destination directory of `tbot`
- openssh with configuration based on `tbot`'s output

In my examples:

- `teleport.local.ottr.sh` is my root cluster, and the location of the proxy that `tbot` is configured against.
- `leaf.local.ottr.sh` is my leaf cluster.
- `leaf.leaf.local.ottr.sh` is the node within the leaf cluster I'm trying to access.

Using `tsh ssh`
This works out of the box. Assuming your bot's destination directory is `/dest` and your bot is configured to output an identity file, you can point `tsh ssh` at that identity file directly.

I also noticed that the role in your root cluster that your `tbot` is assuming must include, within `logins`, the principal you are trying to use to log into the node in the leaf cluster. You must also have that principal within the leaf cluster role, or `{{internal.logins}}`.
Using openssh

This is a little more difficult, as the `ssh_config` generated by `tbot` doesn't include rules that will match hosts in leaf clusters. I was able to manually craft an `ssh_config` that did work, though (shown earlier in this thread).

The `known_hosts` includes the host CA for the root cluster, but not for leaf clusters. This means that the user will be prompted to trust the host key on first connection.

You may notice that the `--cluster` directive is omitted from the `ProxyCommand`. When configuring this to the leaf cluster, `tsh` emits the `key for {...} not found` error shown earlier. It seems odd to me that this is emitted by `tsh proxy ssh` but not `tsh ssh`. I need to look into what's causing this; however, as long as the cluster directive is dropped, the rest of this works as expected.

Summary
This essentially gives the following tasks to improve tbot for use with openssh and leaf clusters:

- Leaf cluster host CAs included in `known_hosts`
- `Host` ssh_config blocks for each configured leaf cluster
- Resolving the `key for {xyz} not found` error