gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Teleport 13 Test Plan #24576

Closed by r0mant 1 year ago

r0mant commented 1 year ago

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh installation of the version to be released as well as an upgrade of the previous version of Teleport.

User accounting @atburke

Combinations @strideynet

For some manual testing, many combinations need to be tested. For example, for interactive sessions the 12 combinations are below.

Teleport with EKS/GKE @tigrato

Teleport with multiple Kubernetes clusters @AntonAM

Note: you can use GKE or EKS or minikube to run Kubernetes clusters. Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.

Kubernetes auto-discovery @tigrato

Kubernetes Secret Storage @AntonAM

Kubernetes Pod RBAC @AntonAM

Kubernetes credentials forwarding @tigrato

Teleport with FIPS mode @atburke

ACME @marcoandredinis

Migrations @r0mant @zmb3

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy, to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com
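
The ProxyCommand in the templates above changes only when a remote cluster name is appended to the subsystem request. As a minimal sketch, the string can be generated rather than retyped. The function name `teleport_proxy_command` is hypothetical and is not part of Teleport or OpenSSH; port 3023 and the `proxy:%h:%p` form are taken from the templates above.

```shell
# Hypothetical helper (not part of Teleport or OpenSSH): print the ProxyCommand
# string used in the templates above.
#   $1 = proxy host, $2 = optional remote cluster name
teleport_proxy_command() (
  proxy_host=$1
  cluster=${2:-}
  subsystem='proxy:%h:%p'
  if [ -n "$cluster" ]; then
    subsystem="${subsystem}@${cluster}"
  fi
  # %%r emits a literal %r, which ssh expands to the remote user name.
  printf "ssh -o 'ForwardAgent yes' -p 3023 %%r@%s -s %s\n" "$proxy_host" "$subsystem"
)

teleport_proxy_command proxy.example.com
teleport_proxy_command proxy.example.com foo.com
```

With this helper, `ssh -o "ProxyCommand $(teleport_proxy_command proxy.example.com foo.com)" node.foo.com` should be equivalent to the remote-cluster template.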

Teleport

# when connecting to an OpenSSH node, remember `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com

Teleport with SSO Providers

GitHub External SSO @Tener

tctl sso family of commands @Tener

For help with setting up SSO connectors, check out the Quick GitHub/SAML/OIDC Setup Tips.

tctl sso configure helps to construct a valid connector definition:

tctl sso test tests a provided connector definition, which can be loaded from a file or piped in from tctl sso configure or tctl get --with-secrets. Valid connectors are accepted; invalid ones are rejected with sensible error messages.

Teleport Plugins @EdwardDowling

AWS Node Joining @nklaassen

Docs

Kubernetes Node Joining @hugoShaka

Azure Node Joining @atburke

Docs

Cloud Labels @atburke

Passwordless @codingllama

This feature has additional build requirements, so it should be tested with a pre-release build from Drone (eg: https://get.gravitational.com/teleport-v10.0.0-alpha.2-linux-amd64-bin.tar.gz).

This section complements "Users -> Managing MFA devices". tsh binaries for each operating system (Linux, macOS and Windows) must be tested separately for FIDO2 items.

Device Trust @sshahcodes

Device Trust requires Teleport Enterprise.

This feature has additional build requirements, so it should be tested with a pre-release build from Drone (eg: https://get.gravitational.com/teleport-v10.0.0-alpha.2-linux-amd64-bin.tar.gz).

Client-side enrollment requires a signed tsh for macOS; make sure to use the tsh binary from tsh.app.

A simple formula for testing device authorization is:

# Before enrollment.
# Replace with other kinds of access, as appropriate (db, kube, etc)
tsh ssh node-that-requires-device-trust
> ERROR: ssh: rejected: administratively prohibited (unauthorized device)

# Register the device.
# Get the serial number from "Apple -> About This Mac".
tctl devices add --os=macos --asset-tag=<SERIAL_NUMBER> --enroll

# Enroll the device.
tsh device enroll --token=<TOKEN_FROM_COMMAND_ABOVE>
tsh logout; tsh login

# After enrollment
tsh ssh node-that-requires-device-trust
> $

Hardware Key Support @Joerger

Hardware Key Support is an Enterprise feature and is not available for OSS.

You will need a YubiKey 4.3+ to test this feature.

This feature has additional build requirements, so it should be tested with a pre-release build from Drone (eg: https://get.gravitational.com/teleport-ent-v11.0.0-alpha.2-linux-amd64-bin.tar.gz).

Server Access

These tests should be carried out sequentially. tsh tests should be carried out on Linux, macOS, and Windows.

  1. [x] tsh login as user with Webauthn login and no hardware key requirement.
  2. [x] Request a role with role.role_options.require_session_mfa: hardware_key - tsh login --request-roles=hardware_key_required
    • [x] Assuming the role should force automatic re-login with yubikey
    • [x] tsh ssh
    • [x] Requires yubikey to be connected for re-login
    • [x] Prompts for per-session MFA
  3. [x] Request a role with role.role_options.require_session_mfa: hardware_key_touch - tsh login --request-roles=hardware_key_touch_required
    • [x] Assuming the role should force automatic re-login with yubikey
    • [x] Prompts for touch if not cached (last touch within 15 seconds)
    • [x] tsh ssh
    • [x] Requires yubikey to be connected for re-login
    • [x] Prompts for touch if not cached
  4. [x] tsh logout and tsh login as the user with no hardware key requirement.
  5. [x] Upgrade auth settings to auth_service.authentication.require_session_mfa: hardware_key
    • [x] Using the existing login session (tsh ls) should force automatic re-login with yubikey
    • [x] tsh ssh
    • [x] Requires yubikey to be connected for re-login
    • [x] Prompts for per-session MFA
  6. [x] Upgrade auth settings to auth_service.authentication.require_session_mfa: hardware_key_touch
    • [x] Using the existing login session (tsh ls) should force automatic re-login with yubikey
    • [x] Prompts for touch if not cached
    • [x] tsh ssh
    • [x] Requires yubikey to be connected for re-login
    • [x] Prompts for touch if not cached
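
Steps 2 and 3 above assume that roles named hardware_key_required and hardware_key_touch_required already exist. A minimal sketch of generating such role definitions follows. The helper name `write_hardware_key_role` is hypothetical, and the YAML layout (role version v6, `spec.options.require_session_mfa`) is an assumption that should be checked against the role reference for the version under test.

```shell
# Hypothetical helper: write a minimal role definition to a file.
#   $1 = role name, $2 = require_session_mfa value, $3 = output path
# Assumption: role version v6 and the spec.options layout; verify against the
# role reference for the release being tested.
write_hardware_key_role() {
  cat > "$3" <<EOF
kind: role
version: v6
metadata:
  name: $1
spec:
  options:
    require_session_mfa: $2
EOF
}

role_dir=$(mktemp -d)
write_hardware_key_role hardware_key_required hardware_key "$role_dir/hardware_key_required.yaml"
write_hardware_key_role hardware_key_touch_required hardware_key_touch "$role_dir/hardware_key_touch_required.yaml"
cat "$role_dir/hardware_key_required.yaml"
# On a test cluster, each file could then be loaded with: tctl create -f <file>
```

The user under test would also need a role that allows requesting these roles, which is not shown here.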

Other

Set auth_service.authentication.require_session_mfa: hardware_key_touch in your cluster auth settings.

HSM Support @nklaassen

Docs

Moderated session @marcoandredinis

Using tsh join an SSH session as two moderators (two separate terminals, role requires one moderator).


Performance @rosstimothy @fspmarshall @espadolini

Scaling Test

Scale up the number of nodes/clusters a few times for each configuration below.

  1. Verify that there are no memory/goroutine/file descriptor leaks
  2. Compare the baseline metrics with the previous release to determine if resource usage has increased
  3. Restart all Auth instances and verify that all nodes/clusters reconnect
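
For the leak check, one approach is to snapshot process metrics before and after each scale-up and compare them. The sketch below assumes the snapshots are in Prometheus text format (for example, saved from the /metrics endpoint of the Teleport diagnostics address, if it is enabled) and that the standard Go client metrics go_goroutines and process_open_fds are present. The helper names are hypothetical and the sample values are made up.

```shell
# Hypothetical helpers for comparing two saved Prometheus-format snapshots.

# Print the value of an unlabelled metric ($1) from a snapshot file ($2).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }' "$2"
}

# Print after - before for metric $1, given snapshot files $2 (before) and $3 (after).
metric_delta() (
  before=$(metric_value "$1" "$2")
  after=$(metric_value "$1" "$3")
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%d\n", a - b }'
)

# Example with made-up snapshot contents (not measured data).
snap_dir=$(mktemp -d)
printf 'go_goroutines 120\nprocess_open_fds 40\n' > "$snap_dir/before.prom"
printf 'go_goroutines 150\nprocess_open_fds 40\n' > "$snap_dir/after.prom"
for m in go_goroutines process_open_fds; do
  echo "$m delta: $(metric_delta "$m" "$snap_dir/before.prom" "$snap_dir/after.prom")"
done
```

A delta that keeps growing across repeated scale-up and scale-down cycles is the signal worth investigating; a single increase after scaling up is expected.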

Perform reverse tunnel node scaling tests for all backend configurations:

Soak Test

Run a 30-minute soak test against direct-dial and reverse tunnel nodes, both by addressing nodes directly and via label-based matching. Tests should be run against a Cloud tenant.

tsh bench ssh --duration=30m user@direct-dial-node ls
tsh bench ssh --duration=30m user@reverse-tunnel-node ls
tsh bench ssh --duration=30m user@foo=bar ls
tsh bench ssh --duration=30m --random user@foo ls
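
A soak run is only useful if failures are noticed. As a sketch, the failure count can be pulled out of a saved benchmark summary. This assumes the summary contains a `* Requests failed: N` line, as in the results posted later in this thread; the helper names are hypothetical.

```shell
# Hypothetical helpers; they read a saved tsh bench summary on stdin.

# Print the number after "Requests failed:".
bench_failed_count() {
  awk -F': ' '/Requests failed/ { print $2; exit }'
}

# Succeed only if a failure count was found and it is zero.
bench_passed() (
  failed=$(bench_failed_count)
  [ -n "$failed" ] && [ "$failed" -eq 0 ]
)

# Example input copied from a summary posted in this thread.
summary='* Requests originated: 17998
* Requests failed: 0'
if printf '%s\n' "$summary" | bench_passed; then
  echo "soak run OK"
else
  echo "soak run had failures"
fi
```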

Concurrent Session Test

Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:

tsh bench web sessions --max=5000 user ls
tsh bench web sessions --max=5000 --web user ls

Robustness

Teleport with Cloud Providers

AWS @tcsc

GCP @tcsc

IBM @hugoShaka

Application Access @mdwn

Database Access @smallinsky

TLS Routing @smallinsky

Binaries compatibility @fheinecke

Machine ID

SSH @strideynet

With a default Teleport instance configured with a SSH node:

Ensure the above tests are completed for both:

DB Access @timothyb89

With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:

Host users creation @lxea

Host users creation docs Host users creation RFD

CA rotations @espadolini

EC2 Discovery @lxea

EC2 Discovery docs

IP Pinning

Add a role with pin_source_ip: true (requires Enterprise) to test IP pinning. Testing will require changing your IP (that Teleport Proxy sees). Docs: IP Pinning
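
As a sketch of the role mentioned above, the snippet below writes a minimal role definition with IP pinning enabled. The helper name `write_ip_pinned_role` and the role name are hypothetical, and the YAML layout (role version v6, `spec.options.pin_source_ip`) is an assumption to verify against the IP Pinning docs for the version under test.

```shell
# Hypothetical helper: write a minimal role with IP pinning enabled.
#   $1 = role name, $2 = output path
# Assumption: role version v6 and the spec.options layout.
write_ip_pinned_role() {
  cat > "$2" <<EOF
kind: role
version: v6
metadata:
  name: $1
spec:
  options:
    pin_source_ip: true
EOF
}

role_file=$(mktemp)
write_ip_pinned_role ip-pinned "$role_file"
cat "$role_file"
# On an Enterprise test cluster, the file could then be loaded with: tctl create -f <file>
```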

Documentation @ptgott @alexfornuto

Checks should be performed on the version of the documentation corresponding to the major release we're testing. For example, for the Teleport 12 release, use the branch/v12 branch and make sure to select "Version 12.0" in the documentation version switcher.

Resources

Quick GitHub/SAML/OIDC Setup Tips

hugoShaka commented 1 year ago

Creating a Kubernetes join token returns no message: https://github.com/gravitational/teleport/issues/24733

nklaassen commented 1 year ago

~Web UI has no favicon: https://github.com/gravitational/teleport/issues/24773~

Fixed

nklaassen commented 1 year ago

Agentless/OpenSSH guide doesn't work: https://github.com/gravitational/teleport/issues/24778

Tener commented 1 year ago

~public_addr no longer accepts https://: https://github.com/gravitational/teleport/issues/24796~

Fixed

Joerger commented 1 year ago

~Hardware Key support bug: https://github.com/gravitational/teleport/issues/24866~

And the fix: https://github.com/gravitational/teleport/pull/24867

smallinsky commented 1 year ago

Minor UX log entry issue for tsh db connect https://github.com/gravitational/teleport/issues/24879 cc: @GavinFrazar

nklaassen commented 1 year ago

~Can't SSH to agentless nodes from Web UI: https://github.com/gravitational/teleport/issues/24922~

Fixed

Joerger commented 1 year ago

Enhanced session recording does not capture disk events - looks like a known issue so I'm checking it as complete.

nklaassen commented 1 year ago

tsh proxy ssh tries to prompt for password on invalid login when stdin is not a terminal: https://github.com/gravitational/teleport/issues/24925

capnspacehook commented 1 year ago

~Forwarding SSH agent with OpenSSH to agentless node hangs on exit: https://github.com/gravitational/teleport/issues/24936~

Fixed

rosstimothy commented 1 year ago

~Proxy is unable to join the cluster when using the default Kube join mechanism in the Helm chart: https://github.com/gravitational/teleport/issues/24941~

Fixed

nklaassen commented 1 year ago

~tsh attempts relogin for "ambiguous host" errors: https://github.com/gravitational/teleport/issues/24943~

Fixed

nklaassen commented 1 year ago

scp to agentless nodes allowed in spite of RBAC denial: #24949

Fixed

nklaassen commented 1 year ago

YubiHSM2 SDK version 2023.01 not supported: #25017

GavinFrazar commented 1 year ago

~Trusted cluster OpenSSH tsh config incorrect config generation: #25018~

This is working as expected.

Tener commented 1 year ago

DynamoDB db access requires additional configuration which isn't mentioned in docs or handled by tsh: https://github.com/gravitational/teleport/issues/25063

strideynet commented 1 year ago

Can't openssh or Web SSH from root cluster to Agentless in leaf cluster (this request can be only executed by a proxy): https://github.com/gravitational/teleport/issues/25068

Fixed

strideynet commented 1 year ago

~Web SSH connections to an Agentless node do not show the node name in the session recordings list: https://github.com/gravitational/teleport/issues/25072~

Fixed

strideynet commented 1 year ago

Role impersonated certificates do not work with Agentless SSH proxy re-issuing https://github.com/gravitational/teleport/issues/25083

Fixed

GavinFrazar commented 1 year ago

agentless OpenSSH guide does not explain required permissions to create node resources: https://github.com/gravitational/teleport/issues/25129

AntonAM commented 1 year ago

~Proxy can't connect to the Auth when installing Teleport with helm chart #25149~

Fixed

ibeckermayer commented 1 year ago

Several UI bugs in Discover for Desktop Access

Joerger commented 1 year ago

~tsh ssh -J leaf.proxy.example.com leaf-node only works when root auth/proxy is shut down - https://github.com/gravitational/teleport/issues/25178~

Fixed

ptgott commented 1 year ago

Differences in docs pages between master and v13, including the git commits on master that aren't present in v13 for each page (I can't just use git log here because backport refs aren't identical to their source refs):

$ git diff --name-only origin/master origin/branch/v13 -- docs/pages | xargs -I{} bash -c '
git log --oneline origin/branch/v13..origin/master -- {}
'
59ebccb538 docs: Login Rule k8s operator docs (#23888)
59ebccb538 docs: Login Rule k8s operator docs (#23888)
bb1f9899c1 Alphabetize the GUI Client page (#25013)
3d17be5a1d docs: add information on viewing status and logs for systemd service (#25139)
59ebccb538 docs: Login Rule k8s operator docs (#23888)

Looks like these all have outstanding backports, so I think all is good.

ptgott commented 1 year ago

@alexfornuto @avatus What do you think is the best way right now to ensure that the "Upcoming Releases" page only exists for the default docs version?

We could add a redirect from /preview/upcoming-releases/ to https://goteleport.com/docs/preview/upcoming-releases in the non-default branches, but that will quickly become difficult to maintain (unless we change the test plan to be really specific about what we need to change with each release).

Another option is deleting this page for non-default versions and not adding a redirect, but that might lead to 404s.

rosstimothy commented 1 year ago

etcd 10k Test

etcd Metrics

[Screenshot, 2023-04-25 6:32:03 PM]

Teleport Metrics

[Screenshot, 2023-04-25 6:32:35 PM]

Network Metrics

[Screenshot, 2023-04-25 6:33:50 PM]

Soak Test

> tsh bench --duration=30m ssh root@node-agents-6dcccfd8df-22rfr-01 ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         195 ms
50         227 ms
75         284 ms
90         354 ms
95         403 ms
99         548 ms
100        1353 ms

> tsh bench --duration=30m ssh --random root@all ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         229 ms
50         258 ms
75         301 ms
90         354 ms
95         397 ms
99         533 ms
100        2475 ms

> tsh bench --duration=30m ssh root@foo=bar ls

* Requests originated: 17982
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         719 ms
50         1294 ms
75         4591 ms
90         18735 ms
95         24047 ms
99         28415 ms
100        35935 ms
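
As a rough cross-check of the results above, 17998 requests over a 30-minute (1800 s) run works out to about 10 requests per second. The sketch below shows that arithmetic and how a single percentile could be extracted from a saved histogram for comparison between runs. The helper names are hypothetical; the sample rows are copied from the first histogram above.

```shell
# Hypothetical helpers for reading the histogram output shown above.

# Print the response duration in ms for percentile $1, reading the histogram on stdin.
bench_percentile() {
  awk -v p="$1" '$3 == "ms" && $1 == p { print $2; exit }'
}

# Print requests per second for $1 requests over $2 seconds.
bench_rate() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

histogram='Percentile Response Duration
---------- -----------------
95         403 ms
99         548 ms
100        1353 ms'
printf '%s\n' "$histogram" | bench_percentile 99   # prints 548
bench_rate 17998 1800                              # prints 10.0
```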

Firestore 10k Test

Teleport Metrics

[Screenshot, 2023-04-25 6:34:54 PM]

Network Metrics

[Screenshot, 2023-04-25 6:35:22 PM]

Issues

ptgott commented 1 year ago

I've suggested some edits to the teleport-cluster Helm guide in the course of testing it: https://github.com/gravitational/teleport/pull/25287

ptgott commented 1 year ago

In the Teleport Enterprise Cloud Getting Started guide, v13 has some UI differences from v12 for adding servers (including light mode), but I'm not going to update the screenshots this week since I have some higher-priority items to take care of. The overall server registration flow shown in the guide still works as intended.

hugoShaka commented 1 year ago

Outdated OneLogin screenshot: https://github.com/gravitational/teleport/pull/25290

hugoShaka commented 1 year ago

Bucket ACL issues with terraform: https://github.com/gravitational/teleport/pull/25113

flyinghermit commented 1 year ago

~tsh ssh returns ambiguous EOF error when devices are locked : https://github.com/gravitational/teleport.e/issues/1240~

Fixed

rosstimothy commented 1 year ago

~tsh ssh and tsh ls not working when cluster is upgraded to alpha.2: https://github.com/gravitational/teleport/issues/25365~

Fixed

capnspacehook commented 1 year ago

Using OpenSSH ssh to connect to leaf agentless nodes results in hostkey warning: https://github.com/gravitational/teleport/issues/25511

capnspacehook commented 1 year ago

joining agentless moderated sessions doesn't work: https://github.com/gravitational/teleport/issues/25522

Working as expected

capnspacehook commented 1 year ago

creating moderated sessions for a leaf node is not enforced: https://github.com/gravitational/teleport/issues/25557

fspmarshall commented 1 year ago

Dynamo Loadtesting

10k Tunnel Node Scaling

[Image: 10k-tunnel-scaling-04-27]

10k Direct Dial Node Scaling

[Image: 10k-direct-scaling-04-27]

500 Trusted Cluster Scaling

[Image: 500tc-scaling-04-27]

Note: Elevated CPU is presumed to be due to a cache bug that was causing frequent resets of the "remote proxy" cache, and will be fixed in the final v13 release.

Benchmarks (1k Nodes)

Tunnel bench:

tsh bench ssh --duration=30m root@node-agents-77ff5cb7c7-zspkf-19 ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         118 ms
50         127 ms
75         136 ms
90         140 ms
95         143 ms
99         155 ms
100        6839 ms

Tunnel Random:

tsh bench ssh --duration=30m --random root@all ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         123 ms
50         130 ms
75         140 ms
90         150 ms
95         159 ms
99         187 ms
100        6823 ms

Label-Based:

tsh bench ssh --duration=30m root@fullname=node-agents-77ff5cb7c7-zxxg4-19 ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         164 ms
50         172 ms
75         179 ms
90         187 ms
95         192 ms
99         209 ms
100        6859 ms

Direct Dial:

tsh bench ssh --duration=30m root@node-agents-77ff5cb7c7-zsw4p-19 ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         108 ms
50         113 ms
75         116 ms
90         126 ms
95         133 ms
99         145 ms
100        6803 ms