r0mant commented 2 years ago

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport. These tests should be run both on a fresh install of the version to be released and on an upgrade from the previous version of Teleport.

[x] Adding nodes to a cluster @codingllama
- [x] Adding Nodes via Valid Static Token
- [x] Adding Nodes via Valid Short-lived Tokens
- [x] Adding Nodes via Invalid Token Fails
- [x] Revoking Node Invitation
[x] Labels @nklaassen
- [x] Static Labels
- [x] Dynamic Labels
[x] Trusted Clusters @espadolini
- [x] Adding Trusted Cluster Valid Static Token
- [x] Adding Trusted Cluster Valid Short-lived Token
- [x] Adding Trusted Cluster Invalid Token
- [x] Removing Trusted Cluster
[ ] RBAC @timothyb89

Make sure that invalid and valid attempts are reflected in the audit log.
- [x] Successfully connect to node with correct role
- [x] Unsuccessfully connect to a node in a role restricting access by label
- [x] Unsuccessfully connect to a node in a role restricting access by invalid SSH login
- [ ] Allow/deny role option: SSH agent forwarding
- [x] Allow/deny role option: Port forwarding
[x] Verify that custom PAM environment variables are available as expected. @xacrimon
[x] Users @codingllama
With every user combination, try to log in and sign up with an invalid second factor and an invalid password to see how the system reacts.
- [x] Adding Users Password Only
- [x] Adding Users OTP
- [x] Adding Users U2F
- [x] Adding Users WebAuthn
- [x] Managing MFA devices
  - [x] Add an OTP device with tsh mfa add
  - [x] Add a U2F device with tsh mfa add
  - [x] Verify that the U2F device works under WebAuthn
  - [x] Add a WebAuthn device with tsh mfa add
  - [x] List MFA devices with tsh mfa ls
  - [x] Remove an OTP device with tsh mfa rm
  - [x] Remove a U2F device with tsh mfa rm
  - [x] Remove a WebAuthn device with tsh mfa rm
  - [x] Attempt removing the last MFA device on the user
    - [x] with second_factor: on in auth_service, should fail
    - [x] with second_factor: optional in auth_service, should succeed
- [x] Login Password Only
- [x] Login with MFA
  - [x] Add 2 OTP and 2 WebAuthn devices with tsh mfa add
  - [x] Login via OTP
  - [x] Login via WebAuthn
- [x] Login OIDC
- [x] Login SAML
- [x] Login GitHub
- [x] Deleting Users
[x] Backends @smallinsky
- [x] Teleport runs with etcd
- [x] Teleport runs with dynamodb
- [x] Teleport runs with SQLite
- [x] Teleport runs with Firestore
[x] Session Recording @Tener
- [x] Session recording can be disabled
- [x] Sessions can be recorded at the node
- [x] Sessions in remote clusters are recorded in remote clusters
- [x] Sessions can be recorded at the proxy
- [x] Sessions on remote clusters are recorded in the local cluster
- [x] Enable/disable host key checking.
[x] Audit Log @greedy52
- [x] Failed login attempts are recorded
- [x] Interactive sessions have the correct Server ID
  - [x] Server ID is the ID of the node in "session_recording: node" mode
  - [x] Server ID is the ID of the proxy in "session_recording: proxy" mode

Node/Proxy ID may be found at /var/lib/teleport/host_uuid on the corresponding machine.

Node IDs may also be queried via tctl nodes ls.
- [x] Exec commands are recorded
- [x] scp commands are recorded
- [x] Subsystem results are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.

Assuming the proxy is proxy.example.com:3023 and node1 is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
```
sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1
```
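
To compare a session's Server ID against the machine it ran on, the host UUID can be read with a small helper. This is a sketch: the function name `host_uuid` is hypothetical, and it assumes the default data directory from the note above (pass another path as the first argument if your data_dir differs).

```shell
# Hypothetical helper: print this machine's Teleport host UUID.
# $1: optional path; defaults to the location given in the Audit Log note.
host_uuid() {
  file=${1:-/var/lib/teleport/host_uuid}
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # Drop surrounding whitespace/newlines so the value compares cleanly.
  tr -d '[:space:]' < "$file"
  echo
}
```

Run it on the node for "session_recording: node" or on the proxy for "session_recording: proxy", and compare the output with the Server ID shown in the session's audit events.
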
[x] Interact with a cluster using tsh @hatched

These commands should ideally be tested in both recording and non-recording modes, as they are implemented in different ways.
- [x] tsh ssh \<regular-node>
- [x] tsh ssh \<node-remote-cluster>
- [x] tsh ssh -A \<regular-node>
- [x] tsh ssh -A \<node-remote-cluster>
- [x] tsh ssh \<regular-node> ls
- [x] tsh ssh \<node-remote-cluster> ls
- [x] tsh join \<regular-node>
- [x] tsh join \<node-remote-cluster>
- [x] tsh play \<regular-node>
- [x] tsh play \<node-remote-cluster>
- [x] tsh scp \<regular-node>
- [x] tsh scp \<node-remote-cluster>
- [x] tsh ssh -L \<regular-node>
- [x] tsh ssh -L \<node-remote-cluster>
- [x] tsh ls
- [x] tsh clusters
[x] Interact with a cluster using ssh @fheinecke

Make sure to test both recording and regular proxy modes.
- [x] ssh \<regular-node>
- [x] ssh \<node-remote-cluster>
- [x] ssh -A \<regular-node>
- [x] ssh -A \<node-remote-cluster>
- [x] ssh \<regular-node> ls
- [x] ssh \<node-remote-cluster> ls
- [x] scp \<regular-node>
- [x] scp \<node-remote-cluster>
- [x] ssh -L \<regular-node>
- [x] ssh -L \<node-remote-cluster>
[x] Interact with a cluster using the Web UI @gzdunek
- [x] Connect to a Teleport node
- [x] Connect to an OpenSSH node
- [x] Check that agent forwarding is correct based on role and proxy mode (tested the same way as here).

User accounting @xacrimon

[x] Verify that active interactive sessions are tracked in /var/run/utmp on Linux.
[x] Verify that interactive sessions are logged in /var/log/wtmp on Linux.

Combinations @atburke

Some features need to be tested across many combinations. For example, interactive sessions require the following 12 combinations.

[x] Connect to an OpenSSH node in a local cluster using OpenSSH.
[x] Connect to an OpenSSH node in a local cluster using Teleport.
[x] Connect to an OpenSSH node in a local cluster using the Web UI.
[x] Connect to a Teleport node in a local cluster using OpenSSH.
[x] Connect to a Teleport node in a local cluster using Teleport.
[x] Connect to a Teleport node in a local cluster using the Web UI.
[x] Connect to an OpenSSH node in a remote cluster using OpenSSH.
[x] Connect to an OpenSSH node in a remote cluster using Teleport.
[x] Connect to an OpenSSH node in a remote cluster using the Web UI.
[x] Connect to a Teleport node in a remote cluster using OpenSSH.
[x] Connect to a Teleport node in a remote cluster using Teleport.
[x] Connect to a Teleport node in a remote cluster using the Web UI.
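
The 12 cases above are the product of 2 node types, 2 cluster locations, and 3 clients. A small sketch that regenerates the list (the function name is hypothetical) makes it easy to add another dimension, such as recording mode, without missing a case:

```shell
# Hypothetical helper: print every interactive-session combination to test.
# 2 node types x 2 cluster locations x 3 clients = 12 lines.
list_combinations() {
  for location in local remote; do
    for node in "an OpenSSH" "a Teleport"; do
      for client in OpenSSH Teleport "the Web UI"; do
        echo "Connect to $node node in a $location cluster using $client."
      done
    done
  done
}
```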

Teleport with EKS/GKE @jimbishopp

[x] Deploy Teleport on a single EKS cluster
[x] Deploy Teleport on two EKS clusters and connect them via trusted cluster feature
[x] Deploy Teleport Proxy outside of GKE cluster fronting connections to it (use this script to generate a kubeconfig)
[x] Deploy Teleport Proxy outside of EKS cluster fronting connections to it (use this script to generate a kubeconfig)

Teleport with multiple Kubernetes clusters @jimbishopp

Note: you can use GKE, EKS, or minikube to run Kubernetes clusters. The only caveat is minikube: it is not publicly reachable, so don't run a proxy there.

[x] Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig
- [x] Login with tsh login, check that tsh kube ls has your cluster
- [x] Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
- [x] Verify that the audit log recorded the above request and session
[x] Deploy combo auth/proxy/kubernetes_service inside of a Kubernetes cluster
- [x] Login with tsh login, check that tsh kube ls has your cluster
- [x] Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
- [x] Verify that the audit log recorded the above request and session
[x] Deploy combo auth/proxy_service outside of the Kubernetes cluster and kubernetes_service inside of a Kubernetes cluster, connected over a reverse tunnel
- [x] Login with tsh login, check that tsh kube ls has your cluster
- [x] Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
- [x] Verify that the audit log recorded the above request and session
[x] Deploy a second kubernetes_service inside of another Kubernetes cluster, connected over a reverse tunnel
- [x] Login with tsh login, check that tsh kube ls has both clusters
- [x] Switch to a second cluster using tsh kube login
- [x] Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
- [x] Verify that the audit log recorded the above request and session
[x] Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig with multiple clusters in it
- [x] Login with tsh login, check that tsh kube ls has all clusters
[x] Test Kubernetes screen in the web UI (tab is located on left side nav on dashboard):
- [x] Verify that all kubes registered are shown with correct name and labels
- [x] Verify that clicking on a row's connect button renders a dialogue with manual instructions, with the Step 2 login value matching the row's name column
- [x] Verify searching for name or labels in the search bar works
- [x] Verify you can sort by name column
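
The tsh kube ls / tsh kube login / kubectl checks repeated in each deployment scenario above can be scripted roughly as follows. This is a sketch, not part of Teleport: the function name is hypothetical, it assumes the cluster name is the first column of `tsh kube ls` output, and `TSH`/`KUBECTL` can be overridden to point at other binaries. `kubectl exec` is left out because it needs a real pod name.

```shell
TSH=${TSH:-tsh}
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: check_kube_cluster NAME
# 1. confirm NAME is listed by `tsh kube ls` (assumes name is the first column)
# 2. switch to it with `tsh kube login NAME`
# 3. run the read-only `kubectl get nodes` check
check_kube_cluster() {
  name=$1
  if ! "$TSH" kube ls | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
    echo "cluster $name is not listed by tsh kube ls" >&2
    return 1
  fi
  "$TSH" kube login "$name" && "$KUBECTL" get nodes
}
```

After running it, the audit log should still be checked by hand for the corresponding request events.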

Teleport with FIPS mode @r0mant

[x] Perform trusted clusters, Web and SSH sanity check with all teleport components deployed in FIPS mode.

ACME @lxea

[x] Teleport can fetch TLS certificate automatically using ACME protocol.

Migrations @r0mant @zmb3

[x] Migrate trusted clusters from 2.4.0 to 2.5.0
- [x] Migrate the auth server on the main cluster, then the rest of the servers on the main cluster. SSH should work for both main and old clusters.
- [x] Migrate the auth server on the remote cluster, then the rest of the remote cluster. SSH should work.

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

```
# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy; to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com
```

Teleport

```
# when connecting to an OpenSSH node, remember that `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com
```
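
The OpenSSH templates differ only in the proxy host and the optional `@cluster` suffix on the subsystem request. A minimal sketch that builds the ProxyCommand value (hypothetical helper name; port 3023 is taken from the templates):

```shell
# Hypothetical helper: proxy_command PROXY_HOST [REMOTE_CLUSTER]
# Prints the ProxyCommand value used in the OpenSSH templates.
proxy_command() {
  proxy_host=$1
  cluster=${2:-}
  subsystem="proxy:%h:%p"
  if [ -n "$cluster" ]; then
    # remote clusters are addressed by appending @<cluster> to the subsystem
    subsystem="${subsystem}@${cluster}"
  fi
  # %%r prints a literal %r for ssh to expand; the subsystem is passed as data
  printf "ssh -o 'ForwardAgent yes' -p 3023 %%r@%s -s %s\n" "$proxy_host" "$subsystem"
}
```

For example, `ssh -o "ProxyCommand $(proxy_command proxy.example.com foo.com)" node.foo.com` reproduces the remote-cluster template.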

Teleport with SSO Providers @benarent @ptgott @xinding33

[ ] G Suite install instructions work
- [ ] G Suite Screenshots are up to date
[ ] Active Directory install instructions work
- [ ] Active Directory Screenshots are up to date
[ ] Okta install instructions work
- [ ] Okta Screenshots are up to date
[ ] OneLogin install instructions work
- [ ] OneLogin Screenshots are up to date
[ ] OIDC install instructions work
- [ ] OIDC Screenshots are up to date

Teleport Plugins @Joerger

[x] Test receiving a message via Teleport Slackbot
[x] Test receiving a new Jira Ticket via Teleport Jira

WEB UI @kimlisa @ravicious @hatched @gzdunek

Main

For main, test with a role that has access to all resources.

Top Nav

[x] Verify that cluster selector displays all (root + leaf) clusters
[x] Verify that user name is displayed
[x] Verify that the user menu shows Logout, Help & Support, and Account Settings (for local users)

Side Nav

[x] Verify that each item has an icon
[x] Verify that Collapse/Expand works and collapsed has icon >, and expand has icon v
[x] Verify that it automatically expands and highlights the item on page refresh

Servers aka Nodes

[x] Verify that "Servers" table shows all joined nodes
[x] Verify that "Hostname", "Address" and "Labels" columns show the current values
[x] Verify that "Search" by hostname, address, labels works
[x] Verify that terminal opens when clicking on one of the available logins
[x] Verify that clicking on the Add Server button renders a dialogue set to the Automatically view
- [x] Verify clicking on Regenerate Script regenerates token value in the bash command
- [x] Verify using the bash command successfully adds the server (refresh server list)
- [x] Verify that clicking on Manually tab renders manual steps
- [x] Verify that clicking back to Automatically tab renders bash command

Applications

[x] Verify that clicking on Add Application button renders dialogue
- [x] Verify input validation (prevent empty values and invalid URLs)
- [x] Verify after input and clicking on Generate Script, bash command is rendered
- [x] Verify clicking on Regenerate button regenerates token value in bash command

Databases

[x] Verify that clicking on Add Database button renders dialogue for manual instructions:
- [x] Verify that selecting different options on Step 4 changes the Step 5 commands

Active Sessions

[x] Verify that "empty" state is handled
[x] Verify that it displays the session when session is active
[x] Verify that "Description", "Session ID", "Users", "Nodes" and "Duration" columns show correct values
[x] Verify that "OPTIONS" button allows to join a session

Audit log

[x] Verify that time range button is shown and works
[x] Verify that clicking on the Session Ended event icon takes the user to the session player
[x] Verify event detail dialogue renders when clicking on events details button
[x] Verify searching by type, description, created works

Users

[x] Verify that users are shown
[x] Verify that creating a new user works
[x] Verify that editing user roles works
[x] Verify that removing a user works
[x] Verify resetting a user's password works
[x] Verify search by username, roles, and type works

Auth Connectors

[x] Verify that the empty state renders when there are no connectors
[x] Verify that creating OIDC/SAML/GITHUB connectors works
[x] Verify that editing OIDC/SAML/GITHUB connectors works
[x] Verify that error is shown when saving an invalid YAML
[x] Verify that correct hint text is shown on the right side
[x] Verify that encrypted SAML assertions work with an identity provider that supports it (Azure).
[x] Verify that created GitHub, SAML, and OIDC cards have their icons

Roles

[x] Verify that roles are shown
[x] Verify that "Create New Role" dialog works
[x] Verify that deleting and editing works
[x] Verify that error is shown when saving an invalid YAML
[x] Verify that correct hint text is shown on the right side

Managed Clusters

[x] Verify that it displays a list of clusters (root + leaf)
[x] Verify that every menu item works: nodes, apps, audit events, session recordings, etc.

Help & Support

[ ] Verify that all URLs work and are correct (no 404s)

Access Requests @Joerger

Creating Access Requests

Create a role with limited permissions (defined below as allow-roles). This role allows you to see the Role screen and ssh into all nodes.
Create another role with limited permissions (defined below as allow-users-short-ttl). Sessions with this role expire after 4 minutes; the role allows you to see the Users screen and denies access to all nodes.
Create another role with no permissions other than the ability to create requests (defined below as default).
Create a user with the default role assigned.

Create a few requests under this user to test pending/approved/denied state.

```
kind: role
metadata:
  name: allow-roles
spec:
  allow:
    logins:
    - root
    node_labels:
      '*': '*'
    rules:
    - resources:
      - role
      verbs:
      - list
      - read
  options:
    max_session_ttl: 8h0m0s
version: v3
```

```
kind: role
metadata:
  name: allow-users-short-ttl
spec:
  allow:
    rules:
    - resources:
      - user
      verbs:
      - list
      - read
  deny:
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 4m0s
version: v3
```

```
kind: role
metadata:
  name: default
spec:
  allow:
    request:
      roles:
      - allow-roles
      - allow-users-short-ttl
      suggested_reviewers:
      - random-user-1
      - random-user-2
  options:
    max_session_ttl: 8h0m0s
version: v3
```
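
Once the three roles are saved to files, loading them and creating the test user can be scripted. This sketch assumes each role is saved as `<role-name>.yaml` in the current directory (hypothetical file names) and that the `--roles` flag of `tctl users add` is available in the version under test; both helper names are hypothetical, and `TCTL` can be overridden for a dry run.

```shell
TCTL=${TCTL:-tctl}

# Hypothetical helper: create or overwrite (-f) the access request test roles.
# Assumes files named after each role, e.g. allow-roles.yaml.
load_access_request_roles() {
  for role in allow-roles allow-users-short-ttl default; do
    "$TCTL" create -f "${role}.yaml" || return 1
  done
}

# Hypothetical helper: add_requester USERNAME
# Creates a user whose only static role is "default".
add_requester() {
  "$TCTL" users add "$1" --roles=default
}
```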

[x] #10642
[x] Verify input validation requires at least one role to be selected
[x] Verify you can select/input/modify reviewers
[x] Verify after creating a request, requests are listed in pending states
[x] Verify you can't review own requests

Viewing & Approving/Denying Requests

Create a user with the reviewer role, which allows you to review all requests and delete them.

```
kind: role
version: v3
metadata:
  name: reviewer
spec:
  allow:
    review_requests:
      roles: ['*']
```

[x] Verify you can view access request from request list
[x] Verify there is a list of reviewers you selected (empty list if none were selected AND suggested_reviewers wasn't defined)
[x] Verify the threshold name is shown (it will be default if thresholds weren't defined in the role, or blank if not named)
[x] Verify you can approve a request with a message, and immediately see the updated state with your review stamp (green checkmark) and message box
[x] Verify you can deny a request, and immediately see the updated state with your review stamp (red cross)
[x] Verify that deleting the denied request removes it from the list

Assuming Approved Requests

[x] Verify that assume buttons are only present for approved requests and for the logged-in user
[x] Verify that assuming allow-roles allows you to see roles screen and ssh into nodes
[x] Verify that after clicking on the assume button, it is disabled in both the list and in viewing
[x] After assuming allow-roles, verify that assuming allow-users-short-ttl allows you to see users screen, and denies access to nodes
- [x] Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
- [x] Verify switching back goes back to your default static role
- [x] Verify after re-assuming allow-users-short-ttl role, the user is automatically logged out after the expiry is met (4 minutes)
[x] Verify that after logging out (or getting logged out automatically) and relogging in, permissions are reset to default, and requests that are not expired and are approved are assumable again

Access Request Waiting Room @kimlisa

Strategy Reason

Create the following role:

```
kind: role
metadata:
  name: waiting-room
spec:
  allow:
    request:
      roles:
      - <some other role to assign user after approval>
  options:
    max_session_ttl: 8h0m0s
    request_access: reason
    request_prompt: <some custom prompt to show in reason dialogue>
version: v3
```

[x] Verify that after login, the reason dialogue is rendered with the prompt set to the request_prompt setting
[x] Verify after clicking send request, pending dialogue renders
[x] Verify after approving a request, dashboard is rendered
[x] Verify the correct role was assigned

Strategy Always

With the previous role you created from Strategy Reason, change request_access to always:

[x] Verify after login, pending dialogue is auto rendered
[x] Verify after approving a request, dashboard is rendered
[x] Verify after denying a request, access denied dialogue is rendered
[x] Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
[x] Verify switchback button says Logout and clicking goes back to the login screen

Strategy Optional

With the previous role you created from Strategy Reason, change request_access to optional:

[x] Verify after login, dashboard is rendered as normal
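
Switching the waiting-room role between the three strategies tested here only changes its request_access option. A sketch, assuming the role is saved as waiting-room.yaml (hypothetical file name) and using a hypothetical helper name; re-apply the file afterwards with `tctl create -f waiting-room.yaml`.

```shell
# Hypothetical helper: set_request_access FILE STRATEGY
# Rewrites the request_access option in a saved role file in place.
set_request_access() {
  file=$1
  strategy=$2
  case $strategy in
    reason|always|optional) ;;
    *) echo "unsupported strategy: $strategy" >&2; return 1 ;;
  esac
  # Keep the original indentation and replace only the value.
  sed "s/^\([[:space:]]*request_access:\).*/\1 $strategy/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```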

Terminal

[x] Verify that top nav has a user menu (Main and Logout)
[x] Verify that switching between tabs works on alt+[1...9]

Node List Tab

[x] Verify that Cluster selector works (URL should change too)
[x] Verify that Quick launcher input works
[x] Verify that Quick launcher input handles input errors
[x] Verify that "Connect" button shows a list of available logins
[x] Verify that "Hostname", "Address" and "Labels" columns show the current values
[ ] Verify that "Search" by hostname, address, labels works
[x] Verify that new tab is created when starting a session

Session Tab

[x] Verify that session and browser tabs both show the title with login and node name
[x] Verify that terminal resize works
  - Install Midnight Commander on the node you ssh into: $ sudo apt-get install mc
  - Run the program: $ mc
  - Resize the terminal to see if the panels resize with it
[x] Verify that session tab shows/updates number of participants when a new user joins the session
[x] Verify that tab automatically closes on "$ exit" command
[x] Verify that SCP Upload works
[x] Verify that SCP Upload handles invalid paths and network errors
[x] Verify that SCP Download works
[x] Verify that SCP Download handles invalid paths and network errors

Session Player

[x] Verify that it can replay a session
[x] Verify that when playing, scroller auto scrolls to bottom most content
[x] Verify when resizing player to a small screen, scroller appears and is working
[x] Verify that error message is displayed (enter an invalid SID in the URL)

Invite and Reset Form

[x] Verify that input validates
[x] Verify that invite works with 2FA disabled
[x] Verify that invite works with OTP enabled
[x] Verify that invite works with U2F enabled (except safari and google)
[x] Verify that invite works with WebAuthn enabled
[x] Verify that error message is shown if an invite is expired/invalid

Login Form and Change Password

[x] Verify that input validates
[x] Verify that login works with 2FA disabled
[x] Verify that changing passwords works for 2FA disabled
[x] Verify that login works with OTP enabled
[x] Verify that changing passwords works for OTP enabled
[x] Verify that login works with U2F enabled
[x] Verify that changing passwords works for U2F enabled
[x] Verify that login works with WebAuthn enabled
[x] Verify that changing passwords works for WebAuthn enabled
[x] Verify that login works for Github/SAML/OIDC
[x] Verify that redirect to original URL works after successful login
[x] Verify that account is locked after several unsuccessful login attempts
[x] Verify that account is locked after several unsuccessful change password attempts

Multi-factor Authentication (mfa)

Create or modify teleport.yaml and set the following authentication settings under auth_service:

```
authentication:
  type: local
  second_factor: optional
  require_session_mfa: yes
  webauthn:
    rp_id: example.com
```

MFA invite, login, password reset, change password

[x] Verify that during invite/reset, the second factor list shows all auth types: none, hardware key, and authenticator app
[x] Verify registration works with all option types
[x] Verify login with all option types
[x] Verify changing password with all option types
[x] Change second_factor to on and verify that MFA is required (no "none" option in the dropdown)

MFA require auth

Go to Account Settings > Two-Factor Devices and register a new device

Using the same user as above:

[x] Verify logging in with registered WebAuthn key works
[x] Verify connecting to an SSH node prompts you to tap your registered WebAuthn key
[ ] Verify in the web terminal, you can scp upload/download files

MFA Management

[x] Verify adding first device works without requiring re-authentication
[x] Verify re-authenticating with a WebAuthn device works
[x] Verify re-authenticating with a U2F device works
[x] Verify re-authenticating with an OTP device works
[x] Verify adding a WebAuthn device works
[x] Verify adding a U2F device works
[x] Verify adding an OTP device works
[x] Verify removing a device works
[x] Verify second_factor set to off disables adding devices

Cloud

From your cloud staging account, change the field teleportVersion to the test version:

```
$ kubectl -n <namespace> edit tenant
```

Recovery Code Management

[x] Verify generating recovery codes for local accounts with email usernames works
[x] Verify local accounts with non-email usernames are not able to generate recovery codes
[x] Verify SSO accounts are not able to generate recovery codes

Invite/Reset

[x] Verify that email usernames render the recovery codes dialog
[x] Verify that non-email usernames do not render the recovery codes dialog

Recovery Flow: Add new mfa device

[ ] Verify recovering (adding) a new hardware key device with password
[x] Verify recovering (adding) a new OTP device with password
[x] Verify viewing and deleting any old device (but not the one just added)
[x] Verify new recovery codes are rendered at the end of flow

Recovery Flow: Change password

[x] Verify recovering password with any MFA device
[x] Verify new recovery codes are rendered at the end of flow

Recovery Email

[x] Verify receiving email for link to start recovery
[x] Verify receiving email for successfully recovering
[x] Verify email link is invalid after successful recovery
[x] Verify receiving email for locked account when max attempts reached

RBAC

Create a role with no allow.rules defined:

```
kind: role
metadata:
  name: rbac
spec:
  allow:
    app_labels:
      '*': '*'
    logins:
    - root
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 8h0m0s
version: v3
```

[x] Verify that a user has access only to: "Servers", "Applications", "Databases", "Kubernetes", "Active Sessions", "Access Requests" and "Manage Clusters"
[x] Verify there is no Add Server, Application, Databases, Kubernetes button in each respective view
[x] Verify only Servers, Apps, Databases, and Kubernetes are listed under options button in Manage Clusters

Note: User has read/create access_request access to their own requests, despite resource settings

Add the following under spec.allow.rules to enable read access to the audit log:

```
- resources:
  - event
  verbs:
  - list
```

[x] Verify that the Audit Log and Session Recordings are accessible
[x] Verify that playing a recorded session is denied

Add the following to enable read access to recorded sessions:

```
- resources:
  - session
  verbs:
  - read
```

[x] Verify that a user can re-play a session (session.end)

Add the following to enable read access to roles:

```
- resources:
  - role
  verbs:
  - list
  - read
```

[x] Verify that a user can see the roles
[x] Verify that a user cannot create/delete/update a role

Add the following to enable read access to auth connectors:

```
- resources:
  - auth_connector
  verbs:
  - list
  - read
```

[x] Verify that a user can see the list of auth connectors.
[x] Verify that a user cannot create/delete/update the connectors

Add the following to enable read access to users:

```
- resources:
  - user
  verbs:
  - list
  - read
```

[x] Verify that a user can access the "Users" screen
[x] Verify that a user cannot reset password and create/delete/update a user

Add the following to enable read access to trusted clusters:

```
- resources:
  - trusted_cluster
  verbs:
  - list
  - read
```

[x] Verify that a user can access the "Trust" screen
[x] Verify that a user cannot create/delete/update a trusted cluster.

Performance/Soak Test @fspmarshall @espadolini @rosstimothy

Using the tsh bench tool, perform soak tests and benchmark tests on the following configurations:

- Cluster with 10K nodes in normal (non-IoT) mode with etcd
- Cluster with 10K nodes in normal (non-IoT) mode with DynamoDB
- Cluster with 1K IoT nodes with etcd
- Cluster with 1K IoT nodes with DynamoDB
- Cluster with 500 trusted clusters with etcd
- Cluster with 500 trusted clusters with DynamoDB

Soak Tests

Run a 4-hour soak test with a mix of interactive and non-interactive sessions:

```
tsh bench --duration=4h user@teleport-monster-6757d7b487-x226b ls
tsh bench -i --duration=4h user@teleport-monster-6757d7b487-x226b ps uax
```

Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks.

[ ] Verify that Prometheus metrics are accurate.
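
One way to get the mix of sessions is to run both soak commands at the same time. This is an assumption about how the mix is produced, not a documented procedure; the function name is hypothetical and `TSH` can be overridden.

```shell
TSH=${TSH:-tsh}

# Hypothetical helper: run_soak TARGET [DURATION]
# Starts the non-interactive and interactive benchmarks in parallel.
run_soak() {
  target=$1            # e.g. user@teleport-monster-6757d7b487-x226b
  duration=${2:-4h}
  "$TSH" bench --duration="$duration" "$target" ls &
  non_interactive=$!
  "$TSH" bench -i --duration="$duration" "$target" ps uax &
  interactive=$!
  # Wait for both runs; fail if either benchmark exits non-zero.
  status=0
  wait "$non_interactive" || status=1
  wait "$interactive" || status=1
  return "$status"
}
```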

Breaking load tests

Load the system to capacity with tsh bench and publish the maximum number of concurrent sessions for interactive and non-interactive tsh bench loads.

Teleport with Cloud Providers

AWS @xacrimon

[x] Deploy Teleport to AWS using DynamoDB & S3
[x] Deploy Teleport Enterprise to AWS using an HA setup: https://gravitational.com/teleport/docs/aws-terraform-guide/

GCP @xacrimon

[x] Deploy Teleport to GCP using Cloud Firestore & Cloud Storage
[x] Deploy Teleport to GKE (Google Kubernetes Engine).
[x] Deploy Teleport Enterprise to GCP.

IBM @r0mant

[x] Deploy Teleport to IBM Cloud using IBM Database for etcd & IBM Object Store
[x] Deploy Teleport to IBM Cloud Kubernetes.
[x] Deploy Teleport Enterprise to IBM Cloud.

Application Access @gabrielcorado @Tener

[x] Run an application within local cluster. @gabrielcorado
- [x] Verify that the debug application (debug_app: true) works.
- [x] Verify an application can be configured with command line flags.
- [x] Verify an application can be configured from a configuration file.
- [x] Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr as well as publicAddr.
[x] Run an application within a trusted cluster. @gabrielcorado
- [x] Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr.
[x] Verify Audit Records. @gabrielcorado
- [x] app.session.start and app.session.chunk events are created in the Audit Log.
- [x] app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
- [x] tsh play <chunk-id> can fetch and print a session chunk archive.
[x] Verify JWT using verify-jwt.go. @Tener
[x] Verify RBAC. @Tener
[x] Verify CLI access with tsh app login. @Tener
[x] Verify AWS console access. @Tener
- [x] Can log into AWS web console through the web UI.
- [x] Can interact with AWS using tsh aws commands.
[x] Verify dynamic registration. @Tener
- [x] Can register a new app using tctl create.
- [x] Can update registered app using tctl create -f.
- [x] Can delete registered app using tctl rm.
[x] Test Applications screen in the web UI (tab is located on left side nav on dashboard): @gabrielcorado
- [x] Verify that all apps registered are shown
- [x] Verify that clicking on the app icon takes you to another tab
- [x] Verify using the bash command produced from Add Application dialogue works (refresh app screen to see it registered)
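
The dynamic registration checks for applications (and the same checks for databases below) follow one tctl sequence: create, update with `create -f`, then remove. A sketch with a hypothetical helper name; it assumes the `kind/name` resource reference accepted by `tctl rm` (`app/<name>` or `db/<name>`) and takes a second, edited file for the update step.

```shell
TCTL=${TCTL:-tctl}

# Hypothetical helper: registration_cycle KIND NAME FILE UPDATED_FILE
#   KIND: app or db (the resource kind in the tctl reference)
#   NAME: metadata.name used in both files
registration_cycle() {
  kind=$1 name=$2 file=$3 updated=$4
  "$TCTL" create "$file" || return 1        # register a new resource
  "$TCTL" create -f "$updated" || return 1  # update it by overwriting
  "$TCTL" rm "$kind/$name"                  # delete it
}
```

Between steps, the web UI or `tsh db ls` can be used to confirm the resource appeared, changed, and disappeared.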

Database Access

[x] Connect to a database within a local cluster.
- [x] Self-hosted Postgres. @gabrielcorado
- [x] Self-hosted MySQL. @jakule
- [x] Self-hosted MariaDB. @jakule
- [x] Self-hosted MongoDB. @Tener
- [x] Self-hosted CockroachDB. @r0mant
- [x] Self-hosted Redis. @jakule
- [x] AWS Aurora Postgres. @greedy52
- [x] AWS Aurora MySQL. @greedy52
- [x] AWS Redshift. @greedy52
- [x] GCP Cloud SQL Postgres. @r0mant
- [x] GCP Cloud SQL MySQL. @r0mant
- [x] MSSQL with AD auth. @smallinsky
[x] Connect to a database within a remote cluster via a trusted cluster.
- [x] Self-hosted Postgres. @gabrielcorado
- [x] Self-hosted MySQL. @jakule
- [x] Self-hosted MariaDB. @jakule
- [x] Self-hosted MongoDB. @Tener
- [x] Self-hosted CockroachDB. @r0mant
- [x] Self-hosted Redis. @jakule
- [x] AWS Aurora Postgres. @greedy52
- [x] AWS Aurora MySQL. @greedy52
- [x] AWS Redshift. @greedy52
- [x] GCP Cloud SQL Postgres. @r0mant
- [x] GCP Cloud SQL MySQL. @r0mant
- [x] MSSQL with AD auth. @smallinsky
[x] Verify audit events. @jakule
- [x] db.session.start is emitted when you connect.
- [x] db.session.end is emitted when you disconnect.
- [x] db.session.query is emitted when you execute a SQL query.
[x] Verify RBAC. @smallinsky
- [x] tsh db ls shows only databases matching role's db_labels.
- [x] Can only connect as users from db_users.
- [x] (Postgres only) Can only connect to databases from db_names.
- [x] db.session.start is emitted when connection attempt is denied.
- [x] (MongoDB only) Can only execute commands in databases from db_names.
- [x] db.session.query is emitted when command fails due to permissions.
- [x] Can configure per-session MFA.
- [x] MFA tap is required on each tsh db connect.
[x] Verify dynamic registration. @Tener
- [x] Can register a new database using tctl create.
- [x] Can update registered database using tctl create -f.
- [x] Can delete registered database using tctl rm.
[x] Verify discovery. @jakule
- [x] Can detect and register RDS instances.
- [x] Can detect and register Aurora clusters, and their reader and custom endpoints.
- [x] Can detect and register Redshift clusters.
[x] Test Databases screen in the web UI (tab is located on left side nav on dashboard): @gabrielcorado
- [x] Verify that all dbs registered are shown with correct name, description, type, and labels
- [x] Verify that clicking on a row's connect button renders a dialog with manual instructions, with the Step 2 login value matching the row's name column
- [x] Verify searching for all columns in the search bar works
- [x] Verify you can sort by all columns except labels

TLS Routing @smallinsky

[x] Verify that teleport proxy v2 configuration starts only a single listener.

```yaml
version: v2
teleport:
proxy_service:
  enabled: "yes"
  public_addr: ['root.example.com']
  web_listen_addr: 0.0.0.0:3080
```

[x] Run Teleport Proxy in multiplex mode (auth_service.proxy_listener_mode: "multiplex")
- [x] Trusted cluster
- [x] Set up trusted clusters using a single-port setup (web_proxy_addr == tunnel_addr)
```yaml
kind: trusted_cluster
spec:
  ...
  web_proxy_addr: root.example.com:443
  tunnel_addr: root.example.com:443
  ...
```
[x] Database Access
- [x] Verify that tsh db connect works through proxy running in multiplex mode
- [x] Postgres
- [x] MySQL
- [x] MariaDB
- [x] MongoDB
- [x] CockroachDB
- [x] Verify connecting to a database with a GUI client through the TLS ALPN SNI local proxy (tsh db proxy).
[x] Application Access
- [x] Verify app access through proxy running in multiplex mode
[x] SSH Access
- [x] Connect to an OpenSSH server through a local SSH proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" user@host.example.com
- [x] Connect to an OpenSSH server on leaf-cluster through a local SSH proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" user@node.foo.com
- [x] Verify tsh ssh access through proxy running in multiplex mode
[x] Kubernetes access:
- [x] Verify kubernetes access through proxy running in multiplex mode

Desktop Access @zmb3 @lxea @ibeckermayer

Direct mode (set listen_addr): @lxea
- [x] Can connect to desktop defined in static hosts section.
- [x] Can connect to desktop discovered via LDAP
IoT mode (reverse tunnel through proxy): @lxea
- [x] Can connect to desktop defined in static hosts section.
- [x] Can connect to desktop discovered via LDAP
[ ] Connect multiple windows_desktop_services to the same Teleport cluster and verify that connections to desktops on different AD domains work. (Attempt to connect several times to verify that you are routed to the correct windows_desktop_service) (@zmb3)
Verify user input
- [x] Download Keyboard Key Info and verify all keys are processed correctly in each supported browser. Known issues: F11 cannot be captured by the browser without special configuration on macOS.
- [x] Left click and right click register as Windows clicks. (Right click on the desktop should show a Windows menu, not a browser context menu)
- [x] Vertical and horizontal scroll work. Horizontal Scroll Test
Locking
- [x] Verify that placing a user lock terminates an active desktop session.
- [x] Verify that placing a desktop lock terminates an active desktop session.
- [x] Verify that placing a role lock terminates an active desktop session.
- [ ] Verify that placing an MFA device lock terminates an active desktop session.
Labeling
- [x] Set client_idle_timeout to a small value and verify that idle sessions are terminated (the session should end and an audit event will confirm it was due to idle connection) @lxea
- [x] All desktops have teleport.dev/origin label.
- [x] Dynamic desktops have additional teleport.dev labels for OS, OS Version, DNS hostname.
- [x] Regexp-based host labeling applies across all desktops, regardless of origin. @lxea
RBAC
- [x] RBAC denies access to a Windows desktop due to labels @lxea
- [x] RBAC denies access to a Windows desktop with the wrong OS-login. @lxea
Clipboard Support
- When a user has a role with clipboard sharing enabled and is using a Chromium-based browser
- [x] Going to a desktop when clipboard permissions are in "Ask" mode (aka "prompt") causes the browser to show a prompt while the UI shows a spinner
- [x] X-ing out of the prompt (causing the clipboard permission to remain in "Ask" mode) causes the prompt to show up again
- [x] Denying clipboard permissions brings up a relevant error alert (with "Clipboard Sharing Disabled" in the top bar)
- [x] Allowing clipboard permissions allows you to see the desktop session, with "Clipboard Sharing Enabled" highlighted in the top bar
- [x] Copy text from local workstation, paste into remote desktop
- [x] Copy text from remote desktop, paste into local workstation
- When a user has a role with clipboard sharing enabled and is not using a Chromium-based browser
- [x] The UI shows a relevant alert and "Clipboard Sharing Disabled" is highlighted in the top bar
- When a user has a role with clipboard sharing disabled and is using a Chromium-based or non-Chromium-based browser (confirm both)
- [x] The live session should show clipboard sharing as disabled in the top bar, and copy/paste should not work between your workstation and the remote desktop.
Per-Session MFA (try WebAuthn on each of Chrome, Safari, and Firefox; U2F only works with Firefox) (N/A to beta.1)
- [x] Attempting to start a session with no keys registered shows an error message
- [x] Attempting to start a session with a U2F key registered shows an error message
- [x] Attempting to start a session with a WebAuthn key registered pops up the "Verify Your Identity" dialog
- [x] Hitting "Cancel" shows an error message
- [x] Hitting "Verify" causes your browser to prompt you for MFA
- [x] Cancelling that browser MFA prompt shows an error
- [x] Successful MFA verification allows you to connect
Session Recording (@zmb3)
- [x] Verify sessions are not recorded if all of a user's roles disable recording
- [x] Verify sync recording (mode: node-sync or mode: proxy-sync)
- [x] Verify async recording (mode: node or mode: proxy)
- [x] Sessions show up in session recordings UI with desktop icon
- [x] Sessions can be played back, including play/pause functionality
- [x] RBAC for sessions: ensure users can only see their own recordings when using the RBAC rule from our docs
Audit Events (check these after performing the above tests)
- [x] windows.desktop.session.start (TDP00I) emitted on start
- [x] windows.desktop.session.start (TDP00W) emitted when session fails to start (due to RBAC, for example)
- [x] windows.desktop.session.end (TDP01I) emitted on end
- [x] desktop.clipboard.send (TDP02I) emitted for local copy -> remote paste
- [x] desktop.clipboard.receive (TDP03I) emitted for remote copy -> local paste

webvictim commented 2 years ago

@r0mant You have me assigned to IBM, should probably pick someone else :D

r0mant commented 2 years ago

@webvictim I can pick someone else, but I'm told you're the only one who knows how to do it 😄 Will you be able to help out the person who'll be doing it? It would be a good opportunity to transfer the knowledge too.

Tener commented 2 years ago

@r0mant are we sharing particular binaries for testing or is everyone meant to build their own once a release branch is made?

russjones commented 2 years ago

@Tener Yes, once we cut off branch/v9 @r0mant or @zmb3 will share a tag with the team to start testing with.

russjones commented 2 years ago

@r0mant Before we can cut off branch/v9 we have to update e/Makefile to add tbot target: https://github.com/gravitational/teleport.e/pull/401.

xacrimon commented 2 years ago

Ran the utmp tests: there are no regressions from v8, and I confirmed the issue fixed by #10460 on RHEL 8.4, as reported in the corresponding issue. I know we haven't cut branch/v9 yet, but I had to do a full manual test run of the feature for the PR anyway.

Tener commented 2 years ago

During my tests, I ran into an issue with tctl users add: https://github.com/gravitational/teleport/issues/10574.

The docs definitely need to be updated, but we have also removed some functionality, and I'm not sure that was the right call. Details in the ticket.

Tener commented 2 years ago

Another issue: https://github.com/gravitational/teleport/issues/10576

This feels like a corner case that has always been there, but we should probably address it sooner or later.

Tener commented 2 years ago

I marked "Sessions can be recorded at the proxy" as complete, because it kind of works, but at the same time it causes some issues. Here are the details: https://github.com/gravitational/teleport/issues/10586

webvictim commented 2 years ago

> @webvictim I can pick someone else but I'm told you're the only one who knows how to do it 😄 Will you be able to help out the person who'll be doing it? Would be good opportunity to transfer the knowledge too.

All I ever did was use the IBM Cloud account we have in 1Password along with the instructions in the docs from https://goteleport.com/docs/setup/deployments/ibm/

I think the best solution would be to assign this to someone else and if they run into issues, I can definitely help them out and provide some guidance based on my experience. If we can't deploy this following our own written guide then there's definitely a bigger issue to solve :)

rosstimothy commented 2 years ago

etcd load testing

Soak Tests

----Non-IoT Node Test ----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-5f88cb8c68-mm4xd ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         115 ms
50         119 ms
75         123 ms
90         129 ms
95         137 ms
99         185 ms
100        700 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-5f88cb8c68-mm4xd ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         119 ms
50         123 ms
75         128 ms
90         135 ms
95         142 ms
99         184 ms
100        393 ms

----IoT Node Test ----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-88c55fcb5-9fr7b ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         116 ms
50         120 ms
75         125 ms
90         132 ms
95         139 ms
99         177 ms
100        413 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-88c55fcb5-9fr7b ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         119 ms
50         125 ms
75         132 ms
90         141 ms
95         149 ms
99         182 ms
100        799 ms

10,000 IoT Node Scaling Test

[Image: 9 0_10k_IoT_etcd]

10,000 Non-IoT Node Scaling Test

[Image: 9 0_10k_non_IoT_etcd]

500 Trusted Clusters Scaling Test

[Image: 9 0_500_TC]

The 500 Trusted Clusters tests revealed a goroutine and memory leak https://github.com/gravitational/teleport/issues/10648

russjones commented 2 years ago

Aggregate of the last 4 releases (100th percentile response duration, i.e. the slowest request, from the soak tests):

| Backend | Cluster Size | Mode | PTY | 6.2 | 7.0 | 8.0 | 9.0 |
|---|---|---|---|---|---|---|---|
| etcd | 10k | Regular | No | ~~49183 ms~~ | ~~56383 ms~~ 4475 ms | 3335 ms | 700 ms |
| etcd | 10k | Regular | Yes | ~~59423 ms~~ | ~~61215 ms~~ 4507 ms | 4647 ms | 393 ms |
| etcd | 10k | Tunnel | No | ~~65439 ms~~ | ~~53759 ms~~ 4451 ms | 4259 ms | 413 ms |
| etcd | 10k | Tunnel | Yes | ~~64924 ms~~ | ~~48223 ms~~ 4435 ms | 3143 ms | 799 ms |
| DynamoDB | 10k | Regular | No | | | | 5147 ms |
| DynamoDB | 10k | Regular | Yes | | | | 222 ms |
| DynamoDB | 10k | Tunnel | No | | | | 235 ms |
| DynamoDB | 10k | Tunnel | Yes | | | | 198 ms |
| DynamoDB | 1 | Regular | No | | 2471 ms | 1824 ms | |
| DynamoDB | 1 | Regular | Yes | | 2081 ms | 1483 ms | |
| DynamoDB | 1 | Tunnel | No | | 826 ms | 2125 ms | |
| DynamoDB | 1 | Tunnel | Yes | | 518 ms | 2002 ms | |

russjones commented 2 years ago

@rosstimothy One odd thing is how large the difference between interactive and non-interactive is, and that it inverts between regular and tunnel connections.
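A quick way to see the inversion is to subtract the non-interactive p100 from the interactive p100 for each connection mode, using the etcd soak results posted above (700/393 ms regular, 413/799 ms tunnel). This is a minimal Go sketch; the `benchResult` type and `interactiveDelta` helper are hypothetical and only encode the numbers reported in this thread.

```go
package main

import "fmt"

// benchResult holds the 100th-percentile (slowest request) durations in ms
// reported by tsh bench for one connection mode in the etcd soak tests.
type benchResult struct {
	mode           string
	nonInteractive int // `ls` run without --interactive
	interactive    int // `ps aux` run with --interactive
}

// interactiveDelta returns interactive minus non-interactive p100 in ms.
// A positive value means the interactive run had the slower worst case.
func interactiveDelta(r benchResult) int {
	return r.interactive - r.nonInteractive
}

func main() {
	// p100 values copied from the etcd soak test output in this thread.
	results := []benchResult{
		{mode: "Regular", nonInteractive: 700, interactive: 393},
		{mode: "Tunnel", nonInteractive: 413, interactive: 799},
	}
	for _, r := range results {
		fmt.Printf("%-7s interactive - non-interactive: %+d ms\n", r.mode, interactiveDelta(r))
	}
}
```

This gives -307 ms for Regular and +386 ms for Tunnel. Note that the p99 values from the same runs are much closer (185 vs 184 ms regular, 177 vs 182 ms tunnel), so the gap seems to live in the extreme tail rather than across the whole distribution. The two runs also execute different commands (ls vs ps aux), which may contribute.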

kimlisa commented 2 years ago

bug with require_session_mfa flag: https://github.com/gravitational/teleport/issues/10659
audit log issues: https://github.com/gravitational/teleport/issues/7946#issuecomment-1053898019

rosstimothy commented 2 years ago

dynamo load testing

Soak Tests

----Non-IoT Node Test----
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m root@node-7f454d4dbd-cxnrk ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         127 ms
50         131 ms
75         135 ms
90         139 ms
95         142 ms
99         160 ms
100        5147 ms

tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-7f454d4dbd-cxnrk ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         128 ms
50         132 ms
75         136 ms
90         140 ms
95         143 ms
99         162 ms
100        222 ms

----IoT Node Test----
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-86666788fc-k45wn ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         130 ms
50         137 ms
75         143 ms
90         147 ms
95         149 ms
99         158 ms
100        235 ms

tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-86666788fc-k45wn ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         131 ms
50         138 ms
75         144 ms
90         148 ms
95         150 ms
99         156 ms
100        198 ms

10,000 IoT Node Scaling Test

[Image: 9 0_10k_IoT_dynamo]

10,000 Non-IoT Node Scaling Test

[Image: 9 0_10k_non_IoT_dynamo]

500 Trusted Clusters Scaling Test

[Image: 9 0_500_TC_dynamo]

russjones commented 2 years ago

@rosstimothy The 100th percentile for DynamoDB, 10k, non-IoT, no PTY looks off.

Percentile Response Duration
---------- -----------------
25         127 ms
50         131 ms
75         135 ms
90         139 ms
95         142 ms
99         160 ms
100        5147 ms
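One way to quantify "looks off" is to compare each run's 100th percentile with its 99th percentile. The Go sketch below does that for the four DynamoDB soak runs posted above; `run` and `tailRatio` are hypothetical names, and the numbers are copied from the histograms in this thread (Regular = non-IoT node, Tunnel = IoT node).

```go
package main

import "fmt"

// run holds the p99 and p100 response durations in ms for one tsh bench run.
type run struct {
	name      string
	p99, p100 float64
}

// tailRatio returns p100/p99: how many times slower the slowest request
// was than the 99th-percentile request.
func tailRatio(r run) float64 {
	return r.p100 / r.p99
}

func main() {
	// DynamoDB soak test values from this thread.
	runs := []run{
		{"Regular, no PTY", 160, 5147},
		{"Regular, PTY", 162, 222},
		{"Tunnel, no PTY", 158, 235},
		{"Tunnel, PTY", 156, 198},
	}
	for _, r := range runs {
		fmt.Printf("%-16s p100/p99 = %.1f\n", r.name, tailRatio(r))
	}
}
```

The first run comes out at roughly 32x, while the other three are all under 1.5x, so the 5147 ms figure reflects the extreme tail (at most the slowest 1% of requests) rather than a generally slower run.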

Tener commented 2 years ago

This feels worth fixing prior to release: https://github.com/gravitational/teleport/issues/10794

gravitational / teleport

Teleport 9.0 Test Plan #10446
