Closed r0mant closed 2 years ago
I ran into some issues with the new config v3 changes: https://github.com/gravitational/teleport/issues/17118
So far I am unable to ssh to an OpenSSH node using tsh or the Web UI.
On the ssh node, I see userauth_pubkey: unsupported public key algorithm: rsa-sha2-256-cert-v01@openssh.com [preauth]
tsh and the Web UI show ERROR: access denied to ec2-user connecting to <ip> on cluster <my cluster>.
The connection works using the OpenSSH client connecting through the teleport proxy
The SSH node is an ec2 instance running the latest amazon linux 2, sshd version is OpenSSH_7.4p1
edit: I get the same error running Teleport v10.0.0
edit 2: with a newer sshd, tsh begins to work but the openssh client stops working. Filed an issue with details: https://github.com/gravitational/teleport/issues/17197
tsh ssh host command spams an auditd error for regular or remote nodes running in docker: https://github.com/gravitational/teleport/issues/17185
tsh play seems to have a default API domain of teleport.cluster.local when attempting to play a remote recording: https://github.com/gravitational/teleport/issues/17192
teleport app start outputs the wrong flags during a misconfiguration: https://github.com/gravitational/teleport/issues/17264
teleport configure for app_servers produces invalid/deprecated YAML: https://github.com/gravitational/teleport/issues/17268
tctl create with no arguments blocks forever: https://github.com/gravitational/teleport/issues/17271
tsh proxy ssh -J <leaf-proxy> doesn't work with root shut down - https://github.com/gravitational/teleport/issues/17184
Desktop Access clipboard sharing is broken -- https://github.com/gravitational/teleport/issues/17195
Enhanced recording, aka BPF, seems to be broken on v11.
v10 leaf clusters are mostly unusable from v11 roots: #17211
https://teleportcoreteam.grafana.net/goto/c6BFvMI4z?orgId=1
https://teleportcoreteam.grafana.net/goto/SX6JDGI4z?orgId=1
https://teleportcoreteam.grafana.net/goto/tuTUDGIVz?orgId=1
----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-d8mlt ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 173 ms
90 181 ms
95 189 ms
99 211 ms
100 484 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-d8mlt ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 163 ms
50 168 ms
75 174 ms
90 181 ms
95 189 ms
99 208 ms
100 434 ms
----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-999nx ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 164 ms
50 169 ms
75 174 ms
90 181 ms
95 186 ms
99 203 ms
100 404 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-999nx ps aux
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 164 ms
50 170 ms
75 175 ms
90 181 ms
95 187 ms
99 208 ms
100 456 ms
https://teleportcoreteam.grafana.net/goto/XXiMOGIVk?orgId=1
https://teleportcoreteam.grafana.net/goto/CKcndGI4z?orgId=1
https://teleportcoreteam.grafana.net/goto/34V4OGSVk?orgId=1
----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-vtkdv ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 157 ms
50 162 ms
75 167 ms
90 173 ms
95 178 ms
99 200 ms
100 427 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-vtkdv ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 158 ms
50 162 ms
75 167 ms
90 172 ms
95 176 ms
99 198 ms
100 425 ms
----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-tgdc8 ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 173 ms
90 179 ms
95 185 ms
99 204 ms
100 438 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-tgdc8 ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 174 ms
90 181 ms
95 188 ms
99 208 ms
100 336 ms
$ tsh bench --duration=30m <user>@<host> ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 171 ms
50 179 ms
75 188 ms
90 197 ms
95 205 ms
99 259 ms
100 1845 ms
$ tsh bench --duration=30m --interactive <user>@<host> ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 177 ms
50 186 ms
75 194 ms
90 205 ms
95 215 ms
99 306 ms
100 2251 ms
$ tsh bench --duration=30m <user>@<host> ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 155 ms
50 161 ms
75 179 ms
90 184 ms
95 188 ms
99 214 ms
100 1186 ms
$ tsh bench --duration=30m --interactive <user>@<host> ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 155 ms
50 161 ms
75 179 ms
90 184 ms
95 188 ms
99 211 ms
100 469 ms
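The bench summaries above all share the same layout, so they can be compared mechanically rather than by eye. The following is a minimal sketch, not part of Teleport: `parse_bench` is a hypothetical helper that assumes the plain-text layout shown in this thread, and the request rate assumes the full 30 minute `--duration` elapsed.

```python
import re

def parse_bench(text):
    """Parse one pasted tsh bench summary into a dict (hypothetical helper)."""
    originated = int(re.search(r"Requests originated:\s*(\d+)", text).group(1))
    failed = int(re.search(r"Requests failed:\s*(\d+)", text).group(1))
    # Histogram rows look like "99 211 ms": percentile, then duration in ms.
    rows = re.findall(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]*ms[ \t]*$", text, re.MULTILINE)
    percentiles = {int(p): int(ms) for p, ms in rows}
    return {"originated": originated, "failed": failed, "percentiles": percentiles}

# First direct dial run from this thread, copied verbatim.
SAMPLE = """\
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 173 ms
90 181 ms
95 189 ms
99 211 ms
100 484 ms
"""

result = parse_bench(SAMPLE)
duration_s = 30 * 60  # assumption: the run lasted the full --duration=30m
rate = result["originated"] / duration_s
print(f"{rate:.1f} req/s, p99 = {result['percentiles'][99]} ms")
```

Running the same helper over each pasted summary makes it easier to spot outliers, such as the 100th percentile values above one second in the later runs.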
In addition to normal scaling tests, I did a step-by-step upgrade of a 10K node dynamo cluster in order to assess the DynamoDB usage differences between v10.2 and v11.0.0-alpha.2. This was done in order to assess the effects of https://github.com/gravitational/teleport/pull/16911 on DynamoDB read capacity.
Below are two DynamoDB stat page images. The first shows a v10.2 cluster being restarted, and the second shows the same restart procedure being used to apply an upgrade from v10.2 to v11.0.0-alpha.2 (we use a non-upgrading restart as the comparison point since it helps us control for the load created by cache resets and disruption of heartbeats):
Note the difference in the "read usage" sections between the restart and upgrade cases. Both have a similar large spike immediately after restart due to cache resets, with the upgrade case stabilizing at a much higher average read usage (~29 vs ~1.5). In theory, a read usage of 29 for a 10k cluster is practically nothing, but the proportional difference between the resting rate before and after https://github.com/gravitational/teleport/pull/16911 does make me nervous. Such a jump might negatively impact users with very high numbers of peak concurrent sessions if they have fine-tuned their dynamo read capacity to just barely accommodate their existing load. We don't recommend doing things like that, and we generally encourage people to use on-demand, but it still gives me pause. Haven't made up my mind yet, but I think I might revert the compare-and-swap semantics introduced in https://github.com/gravitational/teleport/pull/16911 in favor of an approach that has a lower impact.
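To put the proportional difference in numbers, here is a small worked sketch using only the two approximate resting values quoted above. The provisioned capacity is a hypothetical figure chosen for illustration, not a measurement or a recommendation.

```python
# Approximate resting "read usage" values quoted in the comment above.
restart_read_usage = 1.5   # v10.2 restarted, no upgrade
upgrade_read_usage = 29.0  # upgraded from v10.2 to v11.0.0-alpha.2

ratio = upgrade_read_usage / restart_read_usage
print(f"resting read usage grew about {ratio:.1f}x")

def has_headroom(provisioned, usage):
    """Return True if the usage fits within the provisioned read capacity."""
    return usage <= provisioned

# HYPOTHETICAL: a table provisioned at 4x the old resting rate.
tight_provisioned = 4 * restart_read_usage
print(has_headroom(tight_provisioned, restart_read_usage))
print(has_headroom(tight_provisioned, upgrade_read_usage))
```

Under that assumption, a table that comfortably covered the old resting rate would no longer cover the new one, which is the scenario the comment is worried about.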
Small issue with Snowflake DB Access: tctl auth sign call on leaf cluster in case of multi trusted clusters setup: https://github.com/gravitational/teleport/issues/17262
PR with a fix https://github.com/gravitational/teleport/pull/17263
Opted to revert compare-and-swap node heartbeats based on dynamo stats in https://github.com/gravitational/teleport/issues/16951#issuecomment-1273939513.
PR with fix: https://github.com/gravitational/teleport/pull/17308
Can we please add X11 tests as a non-root user to this (and future) test plans? Thanks!
Desktop Access clipboard sharing is broken -- #17195
Webapps PR with the fix is here https://github.com/gravitational/webapps/pull/1250 ~Ideally https://github.com/gravitational/webapps/pull/1251 gets merged and backported as well~
Update: resolved
Hardware key support broke between v11.0.0-alpha.2 and v11.0.0-beta.1 - https://github.com/gravitational/teleport/issues/17415
Edit: False alarm, it only fails in proxy recording mode, as expected... I've added the Hardware Key Support tests to the test plan to double check everything with v11.0.0-beta.1.
/var/log/wtmp is not being updated correctly https://github.com/gravitational/teleport/pull/17416
Teleport Kube Agent Chart hook is failing due to a wrong find & replace #17437
Onelogin SSO integration guide still works but a couple of screenshots and concepts would need an update: https://github.com/gravitational/teleport/issues/17485
tsh / Windows: tsh mfa add for OTPs doesn't show me the QR code. (Typing the key still works.)
FYI @tobiaszheller
Raised https://github.com/gravitational/teleport/issues/17563 and https://github.com/gravitational/teleport/issues/17564, neither is blocking for the release.
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh install of the version to be released as well as an upgrade of the previous version of Teleport.
[x] Adding nodes to a cluster @EdwardDowling
[x] Labels @EdwardDowling
[x] Trusted Clusters @lxea
[x] RBAC @atburke
Make sure that invalid and valid attempts are reflected in audit log.
[x] Verify that custom PAM environment variables are available as expected. @jakule
[x] Users @codingllama
With every user combination, try to login and signup with invalid second factor, invalid password to see how the system reacts.
WebAuthn in the release tsh binary is implemented using libfido2 for linux/macOS. Ask for a statically built pre-release binary for realistic tests. (tsh fido2 diag should work in our binary.) Webauthn in Windows build is implemented using webauthn.dll. (tsh webauthn diag with security key selected in dialog should work.)
Touch ID requires a signed tsh, ask for a signed pre-release binary so you may run the tests.
Windows Webauthn requires Windows 10 19H1 and device capable of Windows Hello.
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
second_factor: on in auth_service, should fail
second_factor: optional in auth_service, should succeed
tsh mfa add
U2F devices must be registered in a previous version of Teleport.
Using Teleport v9, set auth_service.authentication.second_factor = u2f, restart the server and then register a U2F device (tsh mfa add). Upgrade the install to the current Teleport version (one major at a time) and try to login using the U2F device as your second factor - it should work.
[x] SSO @camscale
[x] Backends @Joerger
[x] Session Recording @strideynet
[x] Audit Log @capnspacehook
Node/Proxy ID may be found at /var/lib/teleport/host_uuid in the corresponding machine.
Node IDs may also be queried via tctl nodes ls.
scp commands are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.
Assuming the proxy is proxy.example.com:3023 and node1 is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
[x] Interact with a cluster using tsh @mdwn
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.
[x] Interact with a cluster using ssh @tobiaszheller Make sure to test both recording and regular proxy modes.
[x] Verify proxy jump functionality @Joerger Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.
[x] Interact with a cluster using the Web UI @capnspacehook
User accounting @jakule
/var/run/utmp on Linux.
/var/log/wtmp on Linux.
Combinations @nklaassen
For some manual testing, many combinations need to be tested. For example, for interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @tigrato
Teleport with multiple Kubernetes clusters @AntonAM
Note: you can use GKE or EKS or minikube to run Kubernetes clusters. Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has both clusters
tsh kube login
kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
tsh login, check that tsh kube ls has all clusters
name and labels
Step 2
login value matching the rows name column
name or labels in the search bar works
name column
Kubernetes auto-discovery @tigrato
Kubernetes Secret Storage @tigrato
Statefulset
Statefulset resource and if it contains the new ENV variables
Deployment was correctly converted into a Statefulset and if the old Deployment object was removed after a successful upgrade
Teleport with FIPS mode @alistanis
ACME @alistanis
Migrations @jakule
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers
tctl sso family of commands @Tener
For help with setting up sso connectors, check out the Quick GitHub/SAML/OIDC Setup Tips
tctl sso configure helps to construct a valid connector definition:
tctl sso configure github ... creates valid connector definitions
tctl sso configure oidc ... creates valid connector definitions
tctl sso configure saml ... creates valid connector definitions
tctl sso test tests a provided connector definition, which can be loaded from file or piped in with tctl sso configure or tctl get --with-secrets. Valid connectors are accepted, invalid are rejected with sensible error messages.
tctl sso test.
Teleport Plugins @hugoShaka
AWS Node Joining @nklaassen
Docs
ec2:DescribeInstances permissions for local account:
TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
Passwordless @codingllama
Passwordless requires tsh compiled with libfido2 for most operations (apart from Touch ID). Ask for a statically-built tsh binary for realistic tests.
Touch ID requires a properly built and signed tsh binary. Ask for a pre-release binary, so you may run the tests.
This section complements "Users -> Managing MFA devices".
tsh binaries for each operating system (Linux, macOS and Windows) must be tested separately for FIDO2 items.
[x] Diagnostics
Commands should pass all tests.
tsh fido2 diag (macOS/Linux)
tsh touchid diag (macOS only)
tsh webauthnwin diag (Windows only)
[x] Registration
tsh mfa add, choose WEBAUTHN and passwordless)
tsh mfa add, choose TOUCHID)
tsh mfa add, choose WEBAUTHN and passwordless)
[x] Login
tsh login --auth=passwordless)
tsh login --auth=passwordless)
tsh login --auth=passwordless --mfa-mode=cross-platform uses FIDO2
tsh login --auth=passwordless --mfa-mode=platform uses platform authenticator
tsh login --auth=passwordless --mfa-mode=auto prefers platform authenticator
auth_service.authentication.passwordless = false)
auth_service.authentication.connector_name = passwordless)
tsh login --auth=local)
[x] Touch ID support commands
tsh touchid ls works
tsh touchid rm works (careful, may lock you out!)
Hardware Key Support @Joerger
Hardware Key Support is an Enterprise feature and is not available for OSS.
You will need a YubiKey 4.3+ to test this feature.
This feature has additional build requirements, so it should be tested with a pre-release build from Drone (e.g. https://get.gravitational.com/teleport-ent-v11.0.0-alpha.2-linux-amd64-bin.tar.gz).
These tests should be carried out sequentially. tsh tests should be carried out on Linux, MacOS, and Windows.
tsh login as user with Webauthn login and no hardware key requirement.
role.role_options.require_session_mfa: hardware_key - tsh login --request-roles=hardware_key_required
tsh ssh
role.role_options.require_session_mfa: hardware_key_touch - tsh login --request-roles=hardware_key_touch_required
tsh ssh
tsh logout and tsh login as the user with no hardware key requirement.
auth_service.authentication.require_session_mfa: hardware_key
tsh ls) should force automatic re-login with yubikey
tsh ssh
auth_service.authentication.require_session_mfa: hardware_key_touch
tsh ls) should force automatic re-login with yubikey
tsh ssh
Performance @rosstimothy @fspmarshall
Perform all tests on the following configurations:
[x] With default networking configuration
[x] With Proxy Peering Enabled
[x] With TLS Routing Enabled
Cluster with 10K direct dial nodes:
Cluster with 10K reverse tunnel nodes:
Cluster with 500 trusted clusters:
[x] etcd
[x] DynamoDB
[ ] Firestore
Soak Test
Run 30 minute soak test with a mix of interactive/non-interactive sessions for both direct and reverse tunnel nodes:
Observe prometheus metrics for goroutines, open files, RAM, CPU, Timers and make sure there are no leaks
Concurrent Session Test
Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:
Teleport with Cloud Providers
AWS @hugoShaka
GCP @AntonAM
IBM @atburke
Application Access @mdwn
debug_app: true works.
name.rootProxyPublicAddr as well as publicAddr.
name.rootProxyPublicAddr.
app.session.start and app.session.chunk events are created in the Audit Log.
app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
tsh play <chunk-id> can fetch and print a session chunk archive.
tsh app login.
tsh aws commands.
tctl create.
tctl create -f.
tctl rm.
Add Application dialogue works (refresh app screen to see it registered)
Database Access @smallinsky + db access team
db.session.start is emitted when you connect.
db.session.end is emitted when you disconnect.
db.session.query is emitted when you execute a SQL query.
tsh db ls shows only databases matching role's db_labels. @gabrielcorado
db_users. @gabrielcorado
db_names. @gabrielcorado
db.session.start is emitted when connection attempt is denied.
db_names. @gabrielcorado
db.session.query is emitted when command fails due to permissions.
tsh db connect.
tctl create.
tctl create -f.
tctl rm.
name, description, type, and labels
Step 2
login value matching the rows name column
labels
TLS Routing @smallinsky
v2 configuration starts only a single listener. @smallinsky
multiplex mode
auth_service.proxy_listener_mode: "multiplex" @smallinsky
web_proxy_addr == tunnel_addr
tsh db connect works through proxy running in multiplex mode
tsh proxy db with a GUI client. @smallinsky @GavinFrazar @greedy52 @Tener @gabrielcorado
multiplex mode
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" user@host.example.com
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" user@node.foo.com
tsh ssh access through proxy running in multiplex mode
multiplex mode
Desktop Access @ibeckermayer @probakowski @LKozlowski
listen_addr):
hosts section.
hosts section.
windows_desktop_services to the same Teleport cluster, verify that connections to desktops on different AD domains work. (Attempt to connect several times to verify that you are routed to the correct windows_desktop_service)
client_idle_timeout to a small value and verify that idle sessions are terminated (the session should end and an audit event will confirm it was due to idle connection)
teleport.dev/origin label.
teleport.dev labels for OS, OS Version, DNS hostname.
desktop_directory_sharing: false) and confirm that the option to share a directory doesn't appear in the menu
mode: node-sync or mode: proxy-sync)
mode: node or mode: proxy)
windows.desktop.session.start (TDP00I) emitted on start
windows.desktop.session.start (TDP00W) emitted when session fails to start (due to RBAC, for example)
windows.desktop.session.end (TDP01I) emitted on end
desktop.clipboard.send (TDP02I) emitted for local copy -> remote paste
desktop.clipboard.receive (TDP03I) emitted for remote copy -> local paste
Binaries compatibility @fheinecke
Machine ID @timothyb89
SSH
With a default Teleport instance configured with an SSH node:
tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
ssh_config in the destination directory
SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated
ssh_config provided by tbot after each phase of a manual CA rotation.
Ensure the above tests are completed for both:
DB Access
With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:
tbot db while tbot start is running
Host users creation @lxea
Host users creation docs Host users creation RFD
teleport-system group
disable_create_host_user: true stops user creation from occurring
CA rotations @espadolini
tctl get cert_authority)
standby phase: only active_keys, no additional_trusted_keys
init phase: active_keys and additional_trusted_keys
update_clients and update_servers phases: the certs from the init phase are swapped
standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
rollback phase (second pass, after completing a regular rotation): same content as in the init phase
standby phase after rollback: same content as in the previous standby phase
tsh app login
kubectl get po after tsh kube login
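The expected contents of active_keys and additional_trusted_keys per rotation phase, as described in the checks above, can be written down as a small lookup. This is only a sketch of the expectations stated in this test plan: `expected_keys` is a hypothetical helper, the labels ca-A, ca-B and ca-C are placeholders for whole certs, the rollback is assumed to happen right after init, and the real `tctl get cert_authority` output is much richer than this.

```python
def expected_keys(phase, current, incoming=None):
    """Return which cert is expected in each key set for a rotation phase.

    Hypothetical model of the checklist: `current` is the cert in use when
    the rotation pass started, `incoming` is the cert being rotated in.
    """
    if phase == "standby":
        return {"active_keys": [current], "additional_trusted_keys": []}
    if phase in ("init", "rollback"):
        return {"active_keys": [current], "additional_trusted_keys": [incoming]}
    if phase in ("update_clients", "update_servers"):
        # The certs from the init phase are swapped.
        return {"active_keys": [incoming], "additional_trusted_keys": [current]}
    raise ValueError(f"unknown phase: {phase}")

# First pass: a regular rotation from ca-A to ca-B.
first_pass = [
    ("standby", "ca-A", None),
    ("init", "ca-A", "ca-B"),
    ("update_clients", "ca-A", "ca-B"),
    ("update_servers", "ca-A", "ca-B"),
    ("standby", "ca-B", None),
]
# Second pass: a rotation towards ca-C that is rolled back.
second_pass = [
    ("init", "ca-B", "ca-C"),
    ("rollback", "ca-B", "ca-C"),
    ("standby", "ca-B", None),
]

for phase, current, incoming in first_pass + second_pass:
    print(phase, expected_keys(phase, current, incoming))
```

Comparing each phase's actual key sets against a table like this keeps the manual check consistent across both passes.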
EC2 Discovery @lxea
EC2 Discovery docs
Resources
Quick GitHub/SAML/OIDC Setup Tips