Closed r0mant closed 2 years ago
Looks like we "regressed" and increased the GLIBC dependency again.
Edit: this appears to be related to the Rust version. Reverting to 1.58.1 seems to fix it.
I will downgrade for now: https://github.com/gravitational/teleport/pull/13544
A few preliminary findings:
1. tctl and teleport always print the warning below on macOS, which I think could be downgraded:
$ tctl -c ./teleport.yaml users ls
> 2022-06-15T17:29:04-03:00 WARN Disabling host user creation as this feature is only available on Linux config/configuration.go:998
$ teleport start -c ./teleport.yaml
> 2022-06-15T17:28:58-03:00 WARN Disabling host user creation as this feature is only available on Linux config/configuration.go:998
2. tctl still mentions the (removed) "admin" role:
$ tctl -c ./teleport.yaml users add --help
(...)
Examples:
> tctl users add --roles=admin,dba joe
This creates a Teleport account 'joe' who will assume the roles 'admin' and 'dba'
To see the permissions of 'admin' role, execute 'tctl get role/admin'
3. tsh Touch ID authn isn't respecting users and is picking the "oldest" credential. Repro by adding >1 credential and then >1 users. 😢
I'll focus on (3); (1) and (2) are easy pickings if someone wants to fix them.
@lxea Could you take a look at "1" and "2" from Alan's comment above?
I noticed in the audit log when I do anything on my database (mysql) the log entries always show [undefined], even if I select a database explicitly during my session with "use
User [remote-alice-cluster1] has executed query [show tables] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has executed query [show databases] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has changed default database to [foodb] on [testmysql]
edit: found an issue for this #5903
It appears the behavior is to always show the database name used on login.
So if I do $ tsh db login --db-name=foodb testmysql or tsh db connect --db-name=foodb testmysql, then all audit logs in that session will show [foodb] as the database. If I switch databases in mysql with use otherdb, then the audit log continues to show actions as if they were done in [foodb]. If I don't specify any --db-name with login/connect then it's always [undefined].
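The behavior described above can be pictured with a small Go sketch. The session type, its fields, and the observe helper are hypothetical names made up for illustration, not Teleport code; the sketch only contrasts the database name given at login (what the audit log reportedly shows) with the database selected later via use.

```go
package main

import (
	"fmt"
	"strings"
)

// session is a hypothetical model of a database session's audit context.
type session struct {
	loginDB   string // value of --db-name at login/connect; empty when omitted
	currentDB string // updated when the client switches databases
}

func newSession(dbName string) *session {
	return &session{loginDB: dbName, currentDB: dbName}
}

// observe updates currentDB when the query is a MySQL "use <db>" statement.
func (s *session) observe(query string) {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	fields := strings.Fields(q)
	if len(fields) == 2 && strings.EqualFold(fields[0], "use") {
		s.currentDB = fields[1]
	}
}

// label renders a database name the way the audit log lines above do.
func label(db string) string {
	if db == "" {
		return "[undefined]"
	}
	return "[" + db + "]"
}

func main() {
	s := newSession("foodb")
	s.observe("use otherdb")
	fmt.Println("login database:", label(s.loginDB))
	fmt.Println("current database:", label(s.currentDB))
}
```

In this model, logging loginDB reproduces the reported output, while logging currentDB would follow the use statement.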
I found a tsh ssh -J regression related to TLS routing - https://github.com/gravitational/teleport/issues/13554
tsh play <chunk-id> can fetch and print a session chunk archive.
I'm not concerned that this is a blocker; it may just be the test plan being incorrect. This command fails with offset 0 not found for session. This is because by default tsh play attempts to play a session back to the PTY, which is not compatible with application access session recordings. Running the command with --format json succeeds. Looking at the blame of the code, it doesn't look like this is a recent regression, and it may have always been the case.
Do we want to update the test plan with the correct command? I imagine eventually it would be nice if users didn't have to provide this flag for the command to work, but given how we currently switch in the implementation between two modes, it will probably involve rewriting onPlay to support that.
Discovered a regression with using the configuration output by teleport configure: https://github.com/gravitational/teleport/issues/13558
I'll write a fix for this today and we should be able to get it merged down asap.
This fix has been merged down to branch/v10 and I can confirm the regression appears to be fixed.
Discovered some backwards incompatibility with SSO login: https://github.com/gravitational/teleport/issues/13575
Edit (Joerger): Fixed in https://github.com/gravitational/teleport/pull/13589
Found a regression in tsh join, I'll try fixing it.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x17b3fc0]
goroutine 1 [running]:
github.com/gravitational/teleport/api/types.(*SessionTrackerV1).GetAddress(0x390c700?)
/home/bjoerger/gravitational/teleport/api/types/session_tracker.go:274
github.com/gravitational/teleport/lib/client.(*TeleportClient).Join(0xc00025e700, {0x3931f90, 0xc0000541f8}, {0x341aee2, 0x4}, {0x3426c0b?, 0x7}, {0x7ffd260b70f7, 0x24}, {0x0, ...})
/home/bjoerger/gravitational/teleport/lib/client/api.go:1976 +0x6f2
main.onJoin.func1()
/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2584 +0x65
github.com/gravitational/teleport/lib/client.RetryWithRelogin({0x3932000, 0xc000a4c4b0}, 0xc00025e700, 0xc000b3e550)
/home/bjoerger/gravitational/teleport/lib/client/api.go:719 +0x4e
main.onJoin(0xc0006ac000)
/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2583 +0x1b5
main.Run({0x39330d8, 0xc0002ae780}, {0xc00004e090, 0x3, 0x3}, {0x0, 0x0, 0xc0000021a0?})
/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:859 +0x12445
main.main()
/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:396 +0x318
Edit: fixed in https://github.com/gravitational/teleport/pull/13596
Possible regression: I can't join/view my own sessions despite having permissions to do so. Am I missing something in https://goteleport.com/docs/ver/10.0/access-controls/reference/?
Some issues I ran into while testing kube access locally:
tsh kube exec --tty --stdin shell-demo /bin/sh leads to panic:
> tsh kube exec --tty --stdin shell-demo /bin/sh
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x106905790]
goroutine 1 [running]:
main.(*StreamOptions).SetupTTY(0x14000abe410)
/Users/gavin/work/teleport/tool/tsh/kube.go:281 +0x180
main.(*ExecOptions).Run(0x14000abe410)
/Users/gavin/work/teleport/tool/tsh/kube.go:356 +0x280
main.(*kubeExecCommand).run(0x14000674600, 0x0?)
/Users/gavin/work/teleport/tool/tsh/kube.go:467 +0x388
main.Run({0x1075eac90, 0x140006f1540}, {0x140001b6010, 0x6, 0x6}, {0x0, 0x0, 0x300000002?})
/Users/gavin/work/teleport/tool/tsh/tsh.go:896 +0x12e98
main.main()
/Users/gavin/work/teleport/tool/tsh/tsh.go:396 +0x2c0
[19:16:57] gavin@mac ~ [SIGINT]
> kubectl exec -it shell-demo -- /bin/sh
# whoami
root
tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER
rm -rf ~/.tsh && kubectl get pods prompts me for my password and then prints an error message, but if I just run kubectl get pods again, it works.
```markdown
[19:08:20] gavin@mac ~ [1]
> rm -rf ~/.tsh
[19:09:07] gavin@mac ~
> tenv show
TELEPORT_CLUSTER=cluster2
TELEPORT_DEV_OUT=/tmp/out2.log
TELEPORT_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_USER=alice
TELEPORT_DEV_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_PROXY=proxy2.local.gd:4080
[19:09:11] gavin@mac ~
> bat ~/.kube/config | rg "exec" -A 10
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - kube
      - credentials
      - --kube-cluster=minikube
      - --teleport-cluster=cluster1
      - --proxy=proxy1.local.gd:3080
      - --insecure
      command: /Users/gavin/work/teleport/build/tsh
      env: null
[19:09:41] gavin@mac ~
> kubectl get pods
Enter password for Teleport user alice:
WARNING: You are using insecure connection to SSH proxy https://proxy1.local.gd:3080
ERROR: SSH cert not available
Unable to connect to the server: getting credentials: exec: executable /Users/gavin/work/teleport/build/tsh failed with exit code 1
[19:09:57] gavin@mac ~ [1]
> kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
shell-demo   1/1     Running   0          75m
```
@GavinFrazar
> tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER
I'm not sure if this would fix the outlined issue, but I noticed recently that a couple of --cluster and --teleport-cluster flags should really use .Envar(clusterEnvVar) in their mix. At the time I didn't realise this may cause issues like the one you outlined, but perhaps the fix is as simple as adding that call to the mix as appropriate. For example:
1)
c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().StringVar(&c.teleportCluster)
becomes
c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().Envar(clusterEnvVar).StringVar(&c.teleportCluster)
2)
ssh.Flag("cluster", clusterHelp).Short('c').StringVar(&cf.SiteName)
becomes
ssh.Flag("cluster", clusterHelp).Envar(clusterEnvVar).Short('c').StringVar(&cf.SiteName)
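To illustrate the intended precedence (explicit flag first, then the environment variable), here is a minimal sketch using only the standard library flag package rather than the kingpin builder shown above. The resolveCluster helper and the injected getenv function are assumptions made for this example; only the flag name and TELEPORT_CLUSTER come from the thread.

```go
package main

import (
	"flag"
	"fmt"
)

// clusterEnvVar mirrors the TELEPORT_CLUSTER variable from the report above.
const clusterEnvVar = "TELEPORT_CLUSTER"

// resolveCluster returns the cluster name, preferring an explicit
// --teleport-cluster flag and falling back to the environment value.
// getenv is injected so the lookup can be exercised without touching
// the real process environment.
func resolveCluster(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("credentials", flag.ContinueOnError)
	cluster := fs.String("teleport-cluster", getenv(clusterEnvVar),
		"Name of the teleport cluster to get credentials for.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *cluster == "" {
		return "", fmt.Errorf("--teleport-cluster or %s is required", clusterEnvVar)
	}
	return *cluster, nil
}

func main() {
	env := func(string) string { return "cluster2" } // stand-in for os.Getenv

	fromFlag, _ := resolveCluster([]string{"--teleport-cluster=cluster1"}, env)
	fromEnv, _ := resolveCluster(nil, env)
	fmt.Println(fromFlag, fromEnv) // prints "cluster1 cluster2"
}
```

With this ordering, a kubeconfig that pins --teleport-cluster=cluster1 keeps working even when TELEPORT_CLUSTER points at another cluster, and commands without the flag pick up the environment value.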
@atburke
Regression due to https://github.com/gravitational/teleport/pull/12934:
Basically the logic between onListDatabases and listDatabasesAllClusters is out of sync. The former contains the correct code to fetch roles:
The latter does not (profile.Roles):
The result is that we try to get the definition of a role that we do not have in the leaf cluster, and we may not have permission to do so.
For example, given clusters boson.tener.io and quark.tener.io and the trusted cluster role mapping giving only the access role:
kind: trusted_cluster
metadata:
  id: 1655472056507184000
  name: boson.tener.io
spec:
  enabled: true
  role_map:
  - local:
    - access
    remote: access
  token: foo
  tunnel_addr: boson.tener.io:3080
  web_proxy_addr: boson.tener.io:3080
version: v2
We will get errors when tsh tries to read the editor and auditor roles from quark.tener.io. This is an error because the mapping only gives the access role. The code in onListDatabases correctly handles that case.
$ tsh clusters
Cluster Name Status Cluster Type Labels Selected
-------------- ------ ------------ ------ --------
boson.tener.io online root *
quark.tener.io online leaf
$ tsh db ls
Name Description Allowed Users Labels Connect
---- ----------- ------------- ------ -------
$ tsh db --cluster=quark.tener.io ls
Name Description Allowed Users Labels Connect
------------------------------- ------------------- ----------------- ------- ------------------------------------------------------------------------
> qmongo (user: alice) [alice bob tener] tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo
> qmongo-insecure (user: alice) [alice bob tener] tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo-insecure
redisquark Quark Redis example [alice bob tener] env=dev
$ tsh db --cluster=quark.tener.io ls --all
ERROR: access denied to perform action "read" on "role"
I'm unlikely to have the time to fix it before my PTO.
I found two issues related to host user creation: https://github.com/gravitational/teleport/issues/13663 and https://github.com/gravitational/teleport/issues/13662
found an issue with the "Instance" role and the EC2 join method https://github.com/gravitational/teleport/issues/13677
I found an issue with LDAP attribute labeling - it does not work correctly: #13680
Regexp-based host labeling applies across all desktops, regardless of origin.
I don't know if this is an issue or not, but I had a hard time figuring out why it does not work the way I would expect it to work. There is an inconsistency between how we treat LDAP discovered hosts vs static hosts.
Scenario 1: LDAP hosts
windows_desktop_service:
  ...
  discovery:
    base_dn: "*"
  host_labels:
  - match: '^.*\.example\.com$'
    labels:
      environment: dev
Using this configuration, if the discovered host has its DNS host name set to EXAMPLE-82K6DLP.example.com, we'll get a regexp match and that host will have an extra label environment: dev.
Scenario 2: Static hosts
windows_desktop_service:
  ...
  hosts:
  - EXAMPLE-82K6DLP.example.com
  host_labels:
  - match: '^.*\.example\.com$'
    labels:
      environment: dev
Using this configuration, with the same regexp and the same DNS host name for a static host, we won't get a regexp match and this host won't have an extra label.
The reason is that for static hosts we match the regexp against hostname:port. In our example we would compare our regexp with EXAMPLE-82K6DLP.example.com:3389, which fails to match because of the $ at the end of our regexp.
Since I don't know whether this was intended or whether we should change the behavior to use just the host without the port, it would be great if @zmb3 could take a look at my comment, as I think he is the author of this functionality.
@LKozlowski I don't think we ever noticed this before, but technically regex-based labeling is working as intended, we're just not clear in the docs or examples that the port is included.
Feels like the simplest thing would be to remove the $ from the examples and mention in the docs that the port is included in the match for static hosts.
That will end up matching anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).
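The three pattern variants discussed above can be compared with a short Go sketch. The matchesHost helper and the host.example.com.evil.test address are made up for illustration; the patterns and the EXAMPLE-82K6DLP host name come from the thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesHost reports whether pattern matches the given address string.
func matchesHost(pattern, addr string) bool {
	return regexp.MustCompile(pattern).MatchString(addr)
}

func main() {
	const (
		strict     = `^.*\.example\.com$`        // pattern from the config examples
		unanchored = `^.*\.example\.com`         // same pattern with the $ removed
		withPort   = `^.*\.example\.com(:\d+)?$` // optional port before the $
	)
	addrs := []string{
		"EXAMPLE-82K6DLP.example.com",      // LDAP-discovered host: DNS name only
		"EXAMPLE-82K6DLP.example.com:3389", // static host: hostname:port
		"host.example.com.evil.test",       // hypothetical host that should not match
	}
	for _, addr := range addrs {
		fmt.Printf("%-34s strict=%-5v unanchored=%-5v withPort=%v\n",
			addr,
			matchesHost(strict, addr),
			matchesHost(unanchored, addr),
			matchesHost(withPort, addr))
	}
}
```

Only the variant with the optional port accepts both the bare host name and hostname:port while still rejecting the unrelated suffix.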
I found an issue with desktop access scroll behavior: https://github.com/gravitational/teleport/issues/13690
> That will end up matching anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).
Sure, that works. Or I'm also fine matching against just the host and not the port.
I don't see this as a major issue since it has always been this way, and few people use static hosts.
~It seems unfortunate to have these error logs by default, I thought I saw a PR to remove them but now I can't find it, are we removing these @lxea @atburke? I haven't intentionally enabled either of these features, and my log just fills with these errors over time.~
2022-06-21T23:22:55Z ERRO [EC2LABELS] Error fetching EC2 tags: object not found ec2/ec2.go:144
2022-06-21T23:22:55Z ERRO Error during temporary user cleanup: group: unknown group teleport-system srv/usermgmt.go:341
Edit: my bad, these are already fixed, sorry for the noise
@nklaassen #13529 should fix the EC2 labels error.
> > That will end up matching anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).
>
> Sure, that works. Or I'm also fine matching against just the host and not the port.
> I don't see this as a major issue since it has always been this way, and few people use static hosts.
I just wanted to bring it up as it wasn't clear for me when I was testing it, but I agree that it is working fine. As you said, we just need to either update docs or slightly update the code. Anyway, I'll mark it in the test plan as working and we'll just improve it later so it doesn't block the v10 release.
Found a compatibility issue between v9 leafs and v10 roots related to the new database CA:
Is tsh status supposed to report -teleport-internal-join as one of the SSH logins? I can see it in the logins list for v10 clusters but not for the ones running older versions of Teleport.
> Is tsh status supposed to report -teleport-internal-join as one of the SSH logins?
We should probably filter out that one and the -teleport-nologin-<uuid> ones.
ssh -J <teleport-proxy> doesn't work with tls routing (since v8.0.0) - https://github.com/gravitational/teleport/issues/13833
tsh does not work on Debian 9 due to glibc 2.25 dependency - #13894
I'm seeing a "session data" event that I'm not used to seeing which renders with a missing session ID in the audit log.
It's not just a UI thing, the JSON for the event has "sid": "".
Direct Dial Nodes unreachable because they are reporting an address of [::]:3022
https://github.com/gravitational/teleport/issues/13898
Reverse Tunnel Nodes getting stuck initializing and not connecting: https://github.com/gravitational/teleport/issues/13911
Something minor I just noticed: my (idle) local teleport was spamming a session recording warning (shutdown logs included):
2022-06-27T17:58:47-03:00 [UPLOAD] WARN Skipped session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar. error:[session file could be corrupted or is using unsupported format: session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f is either corrupted or is using unsupported format, remove the file /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar to correct the problem, remove the /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.error file to retry the upload] filesessions/fileasync.go:253
^C2022-06-27T17:58:51-03:00 [PROC:1] INFO Got signal "interrupt", exiting immediately. pid:27917.1 service/signals.go:83
2022-06-27T17:58:51-03:00 [PROC:1] WARN Sync rotation state cycle failed. Retrying in ~10s pid:27917.1 service/connect.go:682
2022-06-27T17:58:51-03:00 [AUDIT:1] INFO File uploader is shutting down. pid:27917.1 service/service.go:2480
2022-06-27T17:58:51-03:00 [AUDIT:1] INFO File uploader has shut down. pid:27917.1 service/service.go:2482
I didn't do anything special with the cluster today, other than a few login attempts. Posting here in case it rings a bell for someone.
This happened to me as well, and adding auth_service.session_recording = off to the config failed to stop the warning, if that provides any further context.
> my (idle) local teleport was spamming a session recording warning
Should be fixed by https://github.com/gravitational/teleport/pull/13826. Fixing the warning in a running cluster involves manually deleting the file in the recordings, I think.
Can't get passwordless scenario to work as described in the test plan:
- tsh mfa add ✅
- tsh mfa ls and tsh touchid ls (the latter also brings up the Touch ID prompt) ✅
- tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless doesn't work, asking to tap a security key (I didn't register one separately) ❌
➜ e git:(afa3414) ✗ tsh login --proxy=root.gravitational.io:3080 --auth=passwordless
Tap your security key
^CERROR: context canceled
Logs:
➜ e git:(afa3414) ✗ tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless
DEBU [CLIENT] open /Users/r0mant/.tsh/root.gravitational.io.yaml: no such file or directory client/api.go:1052
INFO [CLIENT] No teleport login given. defaulting to r0mant client/api.go:1394
INFO [CLIENT] no host login given. defaulting to r0mant client/api.go:1404
INFO [CLIENT] [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.0G1kn68Tdf/Listeners" client/api.go:3934
DEBU [CLIENT] attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT] reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT] could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU Attempting GET root.gravitational.io:3080/webapi/ping/passwordless webclient/webclient.go:115
DEBU [CLIENT] attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT] reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT] could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU [CLIENT] HTTPS client init(proxyAddr=root.gravitational.io:3080, insecure=false) client/weblogin.go:233
DEBU Attempting platform login webauthncli/api.go:97
DEBU Platform login failed, falling back to cross-platform error:[credential not found] webauthncli/api.go:103
DEBU FIDO2: Using libfido2 for assertion webauthncli/api.go:113
DEBU FIDO2: Info for device ioreg://4294970624: &libfido2.DeviceInfo{Versions:[]string{"U2F_V2", "FIDO_2_0", "FIDO_2_1_PRE"}, Extensions:[]string{"credProtect", "hmac-secret"}, AAGUID:[]uint8{0xee, 0x88, 0x28, 0x79, 0x72, 0x1c, 0x49, 0x13, 0x97, 0x75, 0x3d, 0xfc, 0xce, 0x97, 0x7, 0x2a}, Options:[]libfido2.Option{libfido2.Option{Name:"rk", Value:"true"}, libfido2.Option{Name:"up", Value:"true"}, libfido2.Option{Name:"plat", Value:"false"}, libfido2.Option{Name:"clientPin", Value:"false"}, libfido2.Option{Name:"credentialMgmtPreview", Value:"true"}}, Protocols:[]uint8{0x1}} webauthncli/fido2.go:658
DEBU FIDO2: Device ioreg://4294970624: filtered due to lack of UV webauthncli/fido2.go:137
Tap your security key
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
^C
cc @codingllama
> Can't get passwordless scenario to work as described in the test plan:
@r0mant could you double-check that you are using tsh from the signed/notarized/etc tsh.app bundle? I downloaded the tsh-v10.0.0-alpha.2.pkg installer and cleared the testplan without problems using it. Hit me up on Slack if you still have issues.
@codingllama @r0mant all clear on the passwordless test plan for me on macOS.
kubectl logs -n loadtest-tross soaktest-pvnlr-6gv5f -f
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth -l root ls -f names
node-65c8f5c9db-5zzfd
iot-node-5b4f7757f8-f2966
----Direct Dial Node Test----
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-65c8f5c9db-5zzfd ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 157 ms
50 162 ms
75 168 ms
90 174 ms
95 178 ms
99 193 ms
100 474 ms
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-65c8f5c9db-5zzfd ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 159 ms
50 164 ms
75 170 ms
90 175 ms
95 180 ms
99 195 ms
100 5179 ms
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-5b4f7757f8-f2966 ls
----Reverse Tunnel Node Test----
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 155 ms
50 160 ms
75 166 ms
90 172 ms
95 178 ms
99 193 ms
100 418 ms
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-5b4f7757f8-f2966 ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 154 ms
50 159 ms
75 165 ms
90 170 ms
95 175 ms
99 192 ms
100 5171 ms
https://teleportcoreteam.grafana.net/goto/vJFIH33nk?orgId=1
Aggregate last 3 releases.
| Backend | Cluster Size | Mode | PTY | 8.0 | 9.0 | 10.0 |
|---|---|---|---|---|---|---|
| etcd | 10k | Regular | No | 3335 ms | 700 ms | 474 ms |
| etcd | 10k | Regular | Yes | 4647 ms | 393 ms | 5179 ms (99%: 195 ms) |
| etcd | 10k | Tunnel | No | 4259 ms | 143 ms | 418 ms |
| etcd | 10k | Tunnel | Yes | 3143 ms | 799 ms | 5171 ms (99%: 192 ms) |
| DynamoDB | 10k | Regular | No | 5147 ms | | |
| DynamoDB | 10k | Regular | Yes | 222 ms | | |
| DynamoDB | 10k | Tunnel | No | 235 ms | | |
| DynamoDB | 10k | Tunnel | Yes | 198 ms | | |
| DynamoDB | 1 | Regular | No | 1824 ms | | |
| DynamoDB | 1 | Regular | Yes | 1483 ms | | |
| DynamoDB | 1 | Tunnel | No | 2125 ms | | |
| DynamoDB | 1 | Tunnel | Yes | 2002 ms | | |
note: Initial dynamo 10k tests are not complete yet due to issues with the test automation, but I've gotten up to a 6k dynamo cluster without any issues on teleport's end of things. Working on re-running with different automation.
edit: See https://github.com/gravitational/teleport/issues/13340#issuecomment-1180681544 for updated bench numbers.
tsh bench --duration=30m root@node-848df68b94-zzxjg ls
* Requests originated: 17934
* Requests failed: 109
* Last error: EOF
Histogram
Percentile Response Duration
---------- -----------------
25 5939 ms
50 9655 ms
75 13911 ms
90 16655 ms
95 17519 ms
99 18351 ms
100 55071 ms
tsh bench --duration=30m --interactive root@node-848df68b94-zzw65 ps aux
* Requests originated: 17903
* Requests failed: 22
* Last error: failed connecting to node node-848df68b94-zzw65.
Histogram
Percentile Response Duration
---------- -----------------
25 6115 ms
50 9879 ms
75 14103 ms
90 16751 ms
95 17583 ms
99 18431 ms
100 45471 ms
Note: benches run concurrently with scaling and against nodes in a different region/cloud, which I think explains the differences in response duration. Looking into it.
tsh bench --duration=30m root@172.31.4.81 ls
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 185 ms
50 197 ms
75 211 ms
90 232 ms
95 251 ms
99 358 ms
100 2161 ms
tsh bench --duration=30m --interactive root@172.31.9.206 ps aux
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 193 ms
50 206 ms
75 221 ms
90 240 ms
95 260 ms
99 418 ms
100 4579 ms
Note: these benches were run against individual bare-metal nodes within a 2-node cluster with tsh located within the same vpc as the auth, proxy, and nodes.
(previously posted dynamodb bench numbers were from a 10k cluster with sub-optimal network conditions, and therefore not particularly useful for comparison)
tsh bench --duration=30m root@ip-172-31-4-81-us-west-2-compute-internal ls
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 198 ms
50 210 ms
75 222 ms
90 238 ms
95 255 ms
99 372 ms
100 3495 ms
tsh bench --duration=30m --interactive root@ip-172-31-9-206-us-west-2-compute-internal ps aux
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 221 ms
50 231 ms
75 244 ms
90 262 ms
95 280 ms
99 466 ms
100 2003 ms
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh install of the version to be released as well as an upgrade of the previous version of Teleport.
[x] Adding nodes to a cluster @avatus
[x] Labels @avatus
[x] Trusted Clusters @EdwardDowling @hugoShaka
[x] RBAC @alistanis
Make sure that invalid and valid attempts are reflected in audit log.
[x] Verify that custom PAM environment variables are available as expected. @xacrimon
[x] Users @codingllama
With every user combination, try to login and signup with invalid second factor, invalid password to see how the system reacts.
WebAuthn in the release tsh binary is implemented using libfido2. Ask for a statically built pre-release binary for realistic tests. (tsh fido2 diag should work in our binary.)
Touch ID requires a signed tsh, ask for a signed pre-release binary so you may run the tests.
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
second_factor: on in auth_service, should fail
second_factor: optional in auth_service, should succeed
tsh mfa add
U2F devices must be registered in a previous version of Teleport.
Using Teleport v9, set auth_service.authentication.second_factor = u2f, restart the server and then register a U2F device (tsh mfa add). Upgrade the install to the current Teleport version (one major at a time) and try to login using the U2F device as your second factor - it should work.
[x] Backends
[x] Session Recording @gabrielcorado
[x] Audit Log @gabrielcorado
Node/Proxy ID may be found at /var/lib/teleport/host_uuid in the corresponding machine.
Node IDs may also be queried via tctl nodes ls.
scp commands are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.
Assuming the proxy is proxy.example.com:3023 and node1 is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
[x] Interact with a cluster using tsh @alistanis @hugoShaka
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.
[x] Interact with a cluster using ssh @Joerger
Make sure to test both recording and regular proxy modes.
[x] Verify proxy jump functionality @Joerger
Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.
[x] Interact with a cluster using the Web UI @Joerger
User accounting @xacrimon
/var/run/utmp on Linux.
/var/log/wtmp on Linux.
Combinations @capnspacehook
For some manual testing, many combinations need to be tested. For example, for interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @tigrato
Teleport with multiple Kubernetes clusters @tigrato
Note: you can use GKE or EKS or minikube to run Kubernetes clusters. Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has both clusters
tsh kube login
kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
tsh login, check that tsh kube ls has all clusters
name and labels
Step 2 login value matching the rows name column
name or labels in the search bar works
name column
Teleport with FIPS mode @alistanis @r0mant
ACME @rudream
Migrations @hugoShaka
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers @ptgott @Tener
tctl sso family of commands @Tener
tctl sso configure helps to construct a valid connector definition:
tctl sso configure github ... creates valid connector definitions
tctl sso configure oidc ... creates valid connector definitions
tctl sso configure saml ... creates valid connector definitions
tctl sso test tests a provided connector definition, which can be loaded from file or piped in with tctl sso configure or tctl get --with-secrets. Valid connectors are accepted, invalid are rejected with sensible error messages.
tctl sso test.
Teleport Plugins @marcoandredinis
AWS Node Joining @nklaassen
Docs
ec2:DescribeInstances permissions for local account:
TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
Passwordless @r0mant @espadolini
Passwordless requires tsh compiled with libfido2 for most operations (apart from Touch ID). Ask for a statically-built tsh binary for realistic tests.
Touch ID requires a properly built and signed tsh binary. Ask for a pre-release binary so you may run the tests.
This section complements "Users -> Managing MFA devices". Ideally both macOS and Linux tsh binaries are tested for FIDO2 items.
[x] Diagnostics
Both commands should pass all tests.
tsh fido2 diag
tsh touchid diag
[ ] Registration
(tsh mfa add, choose WEBAUTHN and passwordless)
(tsh mfa add, choose TOUCHID)
[ ] Login
(tsh login --auth=passwordless)
(tsh login --auth=passwordless)
tsh login --auth=passwordless --mfa-mode=cross-platform uses FIDO2
tsh login --auth=passwordless --mfa-mode=platform uses Touch ID
tsh login --auth=passwordless --mfa-mode=auto prefers Touch ID
(auth_service.authentication.passwordless = false)
(auth_service.authentication.connector_name = passwordless)
(tsh login --auth=local)
[ ] Touch ID support commands
tsh touchid ls works
tsh touchid rm works (careful, may lock you out!)
WEB UI @kimlisa @rudream @hatched
Main
For main, test with a role that has access to all resources.
Top Nav
Side Nav
>, and expand has icon v
Servers aka Nodes
Add Server button renders dialogue set to Automatically view
Regenerate Script regenerates token value in the bash command
Manually tab renders manual steps
Automatically tab renders bash command
Applications
Add Application button renders dialogue
Generate Script, bash command is rendered
Regenerate button regenerates token value in bash command
Databases
Add Database button renders dialogue for manual instructions:
Step 4 changes
Step 5 commands
Active Sessions
Audit log
Session Ended event icon, takes user to session player
details button
Users
Auth Connectors
Roles
Managed Clusters
Help & Support
Access Requests
Access Requests are an Enterprise feature and are not available for OSS.
Creating Access Requests (Role Based)
Create a role with limited permissions allow-roles-and-nodes. This role allows you to see the Role screen and ssh into all nodes.
Create another role with limited permissions allow-users-with-short-ttl. This role session expires in 4 minutes, allows you to see Users screen, and denies access to all nodes.
Create a user that has no access to anything but allows you to request roles:
allow-roles-and-nodes and allow-users-with-short-ttl are listed
Creating Access Requests (Search Based)
Create a role with access to searchable resources (apps, db, kubes, nodes, desktops). The template searcheable-resources is below.
Create a user that has no access to resources, but allows you to search them:
searcheable-resources rules
Viewing & Approving/Denying Requests
Create a user with the role reviewer that allows you to review all requests, and delete them.
Assuming Approved Requests (Role Based)
allow-roles-and-nodes allows you to see roles screen and ssh into nodes
allow-roles-and-nodes, verify that assuming allow-users-short-ttl allows you to see users screen, and denies access to nodes
switching back goes back to your default static role
allow-users-short-ttl role, the user is automatically logged out after the expiry is met (4 minutes)
Assuming Approved Requests (Search Based)
Assuming Approved Requests (Both)
Access Request Waiting Room
Strategy Reason
Create the following role:
request_prompt setting
send request, pending dialogue renders
Strategy Always
With the previous role you created from Strategy Reason, change request_access to always:
Logout and clicking goes back to the login screen
Strategy Optional
With the previous role you created from Strategy Reason, change request_access to optional:
Terminal
Node List Tab
Session Tab
$ sudo apt-get install mc
$ mc
Session Player
Invite and Reset Form
Login Form and Change Password
Multi-factor Authentication (mfa)
Create/modify teleport.yaml and set the following authentication settings under auth_service
MFA invite, login, password reset, change password
second_factor type to on and verify that mfa is required (no option none in dropdown)
MFA require auth
Go to Account Settings > Two-Factor Devices and register a new device
Using the same user as above:
MFA Management
second_factor set to off disables adding devices
Passwordless
Cloud
From your cloud staging account, change the field teleportVersion to the test version.
Recovery Code Management
Invite/Reset
Recovery Flow: Add new mfa device
Recovery Flow: Change password
Recovery Email
RBAC
Create a role, with no allow.rules defined:
Add Server, Application, Databases, Kubernetes button in each respective view
Servers, Apps, Databases, and Kubernetes are listed under options button in Manage Clusters
Note: User has read/create access_request access to their own requests, despite resource settings
Add the following under spec.allow.rules to enable read access to the audit log:
Audit Log and Session Recordings are accessible
Add the following to enable read access to recorded sessions
Add the following to enable read access to the roles
Add the following to enable read access to the auth connectors
Add the following to enable read access to users
Add the following to enable read access to trusted clusters
Performance/Soak Test @rosstimothy @espadolini
Using tsh bench tool, perform the soak tests and benchmark tests on the following configurations:
Cluster with 10K nodes in normal (non-IOT) node mode with ETCD
Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB
Cluster with 1K IOT nodes with ETCD
Cluster with 1K IOT nodes with DynamoDB
Cluster with 500 trusted clusters with ETCD
Cluster with 500 trusted clusters with DynamoDB
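The six configurations above are the cross product of three cluster topologies and two backends. A minimal Python sketch for generating that matrix as a checklist follows; the function name and the checklist output format are hypothetical and not part of the test plan, and the wording of the first topology is normalized slightly.

```python
from itertools import product

# Topologies and backends copied from the configuration list above.
TOPOLOGIES = [
    "10K nodes in normal (non-IOT) mode",
    "1K IOT nodes",
    "500 trusted clusters",
]
BACKENDS = ["ETCD", "DynamoDB"]


def bench_configurations():
    """Return one description per (topology, backend) pair to benchmark."""
    return [
        f"Cluster with {topology} with {backend}"
        for topology, backend in product(TOPOLOGIES, BACKENDS)
    ]


if __name__ == "__main__":
    # Print the matrix as an unchecked checklist, one line per configuration.
    for config in bench_configurations():
        print(f"[ ] {config}")
```

Keeping the two axes as separate lists makes it harder to skip a backend when a new topology is added to the plan.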
Soak Tests
Run a 4-hour soak test with a mix of interactive/non-interactive sessions:
Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks
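One rough way to turn "make sure there are no leaks" into a repeatable check is to compare an early window of a gauge against a late window of the same run. The sketch below assumes the samples were already exported from Prometheus at a fixed scrape interval (for example the standard Go client gauges go_goroutines or process_open_fds); the helper name, the 10% tolerance, and the windowing are illustrative assumptions, not something the test plan prescribes.

```python
from statistics import mean


def looks_like_leak(samples, tolerance=0.10):
    """Return True if the late-run average exceeds the early-run average.

    samples: values of one gauge scraped at a fixed interval over the soak test.
    tolerance: allowed relative growth (hypothetical default of 10%).
    """
    if len(samples) < 8:
        raise ValueError("need at least 8 samples to compare windows")
    quarter = len(samples) // 4
    # Skip the first quarter as warm-up, then compare the second quarter
    # against the final quarter of the run.
    early = mean(samples[quarter:2 * quarter])
    late = mean(samples[-quarter:])
    if early == 0:
        return late > 0
    return (late - early) / early > tolerance


if __name__ == "__main__":
    steady = [100, 102, 99, 101] * 4
    growing = list(range(100, 260, 10))
    print("steady:", looks_like_leak(steady))
    print("growing:", looks_like_leak(growing))
```

A window comparison like this only flags sustained growth; a flagged metric still needs a manual look at the graph to rule out load changes during the run.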
Breaking load tests
Load the system with tsh bench to capacity and publish the maximum number of concurrent sessions for interactive and non-interactive tsh bench loads.
Teleport with Cloud Providers
AWS @lxea
GCP @EdwardDowling
IBM @r0mant
Application Access @strideynet
debug_app: true works.
name.rootProxyPublicAddr as well as publicAddr.
name.rootProxyPublicAddr.
app.session.start and app.session.chunk events are created in the Audit Log.
app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
tsh play <chunk-id> can fetch and print a session chunk archive.
tsh app login.
tsh aws commands.
tctl create.
tctl create -f.
tctl rm.
Add Application dialogue works (refresh app screen to see it registered)
Database Access @smallinsky
db.session.start is emitted when you connect.
db.session.end is emitted when you disconnect.
db.session.query is emitted when you execute a SQL query.
tsh db ls shows only databases matching role's db_labels.
db_users.
db_names.
db.session.start is emitted when connection attempt is denied.
db_names.
db.session.query is emitted when command fails due to permissions.
tsh db connect.
tctl create.
tctl create -f.
tctl rm.
name, description, type, and labels
Step 2 login value matching the row's name column
labels
TLS Routing @smallinsky
v2 configuration starts only a single listener.
multiplex mode
auth_service.proxy_listener_mode: "multiplex"
web_proxy_addr == tunnel_addr
tsh db connect works through proxy running in multiplex mode
tsh db proxy with a GUI client.
multiplex mode
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" user@host.example.com
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" user@node.foo.com
tsh ssh access through proxy running in multiplex mode
multiplex mode
Desktop Access
Basic Sessions (@LKozlowski)
(listen_addr):
hosts section.
hosts section.
windows_desktop_services to the same Teleport cluster, verify that connections to desktops on different AD domains work. (Attempt to connect several times to verify that you are routed to the correct windows_desktop_service)
User Input (@ibeckermayer)
Verify user input
Locking and access (@ibeckermayer)
client_idle_timeout to a small value and verify that idle sessions are terminated (the session should end and an audit event will confirm it was due to idle connection)
Labeling (@LKozlowski)
teleport.dev/origin label.
teleport.dev labels for OS, OS Version, DNS hostname, and OU.
RBAC (@zmb3)
Clipboard Support (@zmb3)
Per-Session MFA (try webauthn on each of Chrome, Safari, and Firefox) @zmb3
Session Recording (@LKozlowski)
(mode: node-sync or mode: proxy-sync)
(mode: node or mode: proxy)
Audit Events (check these after performing the above tests) (@ibeckermayer)
windows.desktop.session.start (TDP00I) emitted on start
windows.desktop.session.start (TDP00W) emitted when session fails to start (due to RBAC, for example)
windows.desktop.session.end (TDP01I) emitted on end
desktop.clipboard.send (TDP02I) emitted for local copy -> remote paste
desktop.clipboard.receive (TDP03I) emitted for remote copy -> local paste
Binaries compatibility @fheinecke
Machine ID @timothyb89
SSH
With a default Teleport instance configured with an SSH node:
tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
ssh_config in the destination directory
SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated
Ensure the above tests are completed for both:
DB Access
With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:
tbot db while tbot start is running
Teleport Connect @ravicious @gzdunek @avatus
(auth_service.authentication in the cluster config):
type: local, second_factor: "off"
type: local, second_factor: "otp"
type: local, second_factor: "webauthn"
type: local, second_factor: "optional", log in without MFA
type: local, second_factor: "optional", log in with OTP
type: local, second_factor: "optional", log in with hardware key
type: local, second_factor: "on", log in with OTP
type: local, second_factor: "on", log in with hardware key
TELEPORT_PROXY and TELEPORT_CLUSTER should pin the session to the correct cluster.
TELEPORT_HOME should point to ~/Library/Application Support/Teleport Connect/tsh.
PATH should include /Applications/Teleport Connect.app/Contents/Resources/bin.
$ sudo apt-get install mc
$ mc
$ exit command.
~/Library/Application Support/Teleport Connect/tsh doesn't crash the app.
~/Library/Application Support/Teleport Connect/app_state.json but not the tsh dir doesn't crash the app.
spec.allow.logins and spec.allow.db_users.
tsh proxy db with the same port, start the app. Verify that the app doesn't crash and the db connection tab shows you the error (address in use) and offers a way to retry creating the connection.
Cmd+[1...9].
1m (spec.options.max_session_ttl).
select now();, the client should be able to automatically reinstantiate the connection.
~/Library/Application\ Support/Teleport\ Connect/logs. @ravicious
Host users creation @jakule
Host users creation docs Host users creation RFD
teleport-system group
disable_create_host_user: true stops user creation from occurring
CA rotations @espadolini
(tctl get cert_authority)
standby phase: only active_keys, no additional_trusted_keys
init phase: active_keys and additional_trusted_keys
update_clients and update_servers phases: the certs from the init phase are swapped
standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
rollback phase (second pass, after completing a regular rotation): same content as in the init phase
standby phase after rollback: same content as in the previous standby phase
tsh app login
kubectl get po after tsh kube login
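The per-phase expectations for active_keys and additional_trusted_keys listed above can be written down as a small state walk, which is handy when comparing against tctl get cert_authority output by hand. This is a sketch of the checklist only, not Teleport's implementation: the key labels ca-1 and ca-2 are placeholders, and it assumes that in the init phase the existing key stays in active_keys while the new key sits in additional_trusted_keys.

```python
def expected_keys(phases, current="ca-1", incoming="ca-2"):
    """Yield (phase, active_keys, additional_trusted_keys) for each phase.

    current/incoming are placeholder labels for the existing and new CA keys.
    """
    active, additional = [current], []
    for phase in phases:
        if phase in ("init", "rollback"):
            # Assumption: the existing key stays active, the new key is only trusted.
            # The checklist says rollback has the same content as init.
            active, additional = [current], [incoming]
        elif phase in ("update_clients", "update_servers"):
            # The certs from the init phase are swapped.
            active, additional = [incoming], [current]
        elif phase == "standby":
            # Only the active key remains; nothing in additional_trusted_keys.
            additional = []
        else:
            raise ValueError(f"unknown phase: {phase}")
        yield phase, list(active), list(additional)


if __name__ == "__main__":
    rotation = ["standby", "init", "update_clients", "update_servers", "standby"]
    for phase, active, additional in expected_keys(rotation):
        print(f"{phase:15} active_keys={active} additional_trusted_keys={additional}")
```

Passing a sequence that ends in rollback and then standby returns the walk to the original key, which matches the "same content as in the previous standby phase" item.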
IP-based validation
SSH @probakowski
pin_source_ip: true option can be added in role definition
tsh ssh works when invoked from the same machine/IP that was used for logging in
tsh ssh prompts for relogin when invoked from different machine (copy certs after login)
sshd server works as above in both cases
ssh works as above in both cases
tsh status -d shows pinned IP
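The expected outcomes for the IP pinning checks above reduce to one comparison between the IP recorded at login and the IP of the machine invoking tsh ssh. The Python sketch below models only that expectation for use when writing down test results; the function name and arguments are hypothetical, and it does not reflect how Teleport performs the check internally.

```python
import ipaddress


def needs_relogin(pinned_ip, client_ip, pin_source_ip=True):
    """Return True when the checklist expects tsh ssh to prompt for relogin.

    pinned_ip: IP recorded at login (what tsh status -d should show).
    client_ip: IP of the machine invoking tsh ssh.
    pin_source_ip: value of the role option; False disables the check.
    """
    if not pin_source_ip or pinned_ip is None:
        return False
    # Parse both sides so equivalent spellings of an address compare equal.
    return ipaddress.ip_address(pinned_ip) != ipaddress.ip_address(client_ip)


if __name__ == "__main__":
    print(needs_relogin("203.0.113.7", "203.0.113.7"))   # same machine/IP
    print(needs_relogin("203.0.113.7", "198.51.100.2"))  # certs copied elsewhere
```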