Closed r0mant closed 2 years ago
@r0mant You have me assigned to IBM, should probably pick someone else :D
@webvictim I can pick someone else but I'm told you're the only one who knows how to do it 😄 Will you be able to help out the person who'll be doing it? Would be good opportunity to transfer the knowledge too.
@r0mant are we sharing particular binaries for testing or is everyone meant to build their own once a release branch is made?
@Tener Yes, once we cut off `branch/v9`, @r0mant or @zmb3 will share a tag with the team to start testing with.
@r0mant Before we can cut off `branch/v9` we have to update `e/Makefile` to add the `tbot` target: https://github.com/gravitational/teleport.e/pull/401.
Ran the `utmp` tests and while they didn't suffer regressions from v8 I confirmed the issue fixed by #10460 on RHEL 8.4 as reported in the corresponding issue. I know we didn't cut `branch/v9` yet but had to do a full manual test run of the feature anyway for the PR.
During my tests, I ran into an issue with `tctl users add`: https://github.com/gravitational/teleport/issues/10574.
For sure, the docs need to be updated, but we have also removed some functionality which I'm not sure is a correct/good thing. Details in the ticket.
Another issue: https://github.com/gravitational/teleport/issues/10576
This feels like a corner case that has always been there, but we should probably address it sooner or later.
I marked "Sessions can be recorded at the proxy" as complete, because it kind of works, but at the same time it causes some issues. Here are the details: https://github.com/gravitational/teleport/issues/10586
> @webvictim I can pick someone else but I'm told you're the only one who knows how to do it 😄 Will you be able to help out the person who'll be doing it? Would be good opportunity to transfer the knowledge too.
All I ever did was use the IBM Cloud account we have in 1Password along with the instructions in the docs from https://goteleport.com/docs/setup/deployments/ibm/
I think the best solution would be to assign this to someone else and if they run into issues, I can definitely help them out and provide some guidance based on my experience. If we can't deploy this following our own written guide then there's definitely a bigger issue to solve :)
----Non-IoT Node Test ----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-5f88cb8c68-mm4xd ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 115 ms
50 119 ms
75 123 ms
90 129 ms
95 137 ms
99 185 ms
100 700 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-5f88cb8c68-mm4xd ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 119 ms
50 123 ms
75 128 ms
90 135 ms
95 142 ms
99 184 ms
100 393 ms
----IoT Node Test ----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-88c55fcb5-9fr7b ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 116 ms
50 120 ms
75 125 ms
90 132 ms
95 139 ms
99 177 ms
100 413 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-88c55fcb5-9fr7b ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 119 ms
50 125 ms
75 132 ms
90 141 ms
95 149 ms
99 182 ms
100 799 ms
Aggregate last 4 releases.
Backend | Cluster Size | Mode | PTY | 6.2 | 7.0 | 8.0 | 9.0 |
---|---|---|---|---|---|---|---|
etcd | 10k | Regular | No | ~49183 ms~ | ~56383 ms~ 4475 ms | 3335 ms | 700 ms |
etcd | 10k | Regular | Yes | ~59423 ms~ | ~61215 ms~ 4507 ms | 4647 ms | 393 ms |
etcd | 10k | Tunnel | No | ~65439 ms~ | ~53759 ms~ 4451 ms | 4259 ms | 413 ms |
etcd | 10k | Tunnel | Yes | ~64924 ms~ | ~48223 ms~ 4435 ms | 3143 ms | 799 ms |
DynamoDB | 10k | Regular | No | 5147 ms | |||
DynamoDB | 10k | Regular | Yes | 222 ms | |||
DynamoDB | 10k | Tunnel | No | 235 ms | |||
DynamoDB | 10k | Tunnel | Yes | 198 ms | |||
DynamoDB | 1 | Regular | No | 2471 ms | 1824 ms | ||
DynamoDB | 1 | Regular | Yes | 2081 ms | 1483 ms | ||
DynamoDB | 1 | Tunnel | No | 826 ms | 2125 ms | ||
DynamoDB | 1 | Tunnel | Yes | 518 ms | 2002 ms |
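The 9.0 figures in the table appear to be taken from the 100-percentile row of each histogram. A minimal sketch of pulling that row out of saved `tsh bench` output, assuming the output is stored as plain text in the shape pasted in this thread (the `parse_histogram` helper and the `SAMPLE` variable are hypothetical names, not part of Teleport):

```python
import re

# Matches histogram rows such as "99 185 ms": a percentile, then a duration in ms.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s*ms\s*$")


def parse_histogram(text):
    """Return {percentile: duration_ms} for every histogram row found in text."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result


# Copied from the etcd, non-IoT, non-interactive run earlier in this thread.
SAMPLE = """\
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 115 ms
50 119 ms
75 123 ms
90 129 ms
95 137 ms
99 185 ms
100 700 ms
"""

hist = parse_histogram(SAMPLE)
print(hist[100])  # the value entered in the table for this run
```

Header, separator, and request-count lines are skipped because they do not match the row pattern, so the same helper can be pointed at a whole pasted comment.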
@rosstimothy One odd thing is that the difference between interactive and non-interactive is so large, and that it inverts between regular and tunnel connections.
`require_session_mfa` flag: https://github.com/gravitational/teleport/issues/10659
----Non-IoT Node Test----
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m root@node-7f454d4dbd-cxnrk ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 127 ms
50 131 ms
75 135 ms
90 139 ms
95 142 ms
99 160 ms
100 5147 ms
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-7f454d4dbd-cxnrk ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 128 ms
50 132 ms
75 136 ms
90 140 ms
95 143 ms
99 162 ms
100 222 ms
----IoT Node Test----
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-86666788fc-k45wn ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 130 ms
50 137 ms
75 143 ms
90 147 ms
95 149 ms
99 158 ms
100 235 ms
tsh --insecure --proxy=proxy:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-86666788fc-k45wn ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 131 ms
50 138 ms
75 144 ms
90 148 ms
95 150 ms
99 156 ms
100 198 ms
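Every run above reports 17999 requests originated over `--duration=30m`. As a quick sanity check, the implied request rate can be computed directly; this is only arithmetic on the numbers pasted here, and the `requests_per_second` helper is a hypothetical name:

```python
def requests_per_second(requests, duration_minutes):
    """Average request rate implied by a request count and a run duration."""
    return requests / (duration_minutes * 60)


# 17999 requests over a 30 minute run, as reported by each benchmark above.
rate = requests_per_second(17999, 30)
print(f"{rate:.2f} req/s")  # prints "10.00 req/s"
```

The result is consistent with a steady load of roughly 10 requests per second in all eight runs, so differences between the histograms are not explained by differing request volume.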
@rosstimothy DynamoDB, 10k, non-IoT, no-PTY 100% looks off.
Percentile Response Duration
---------- -----------------
25 127 ms
50 131 ms
75 135 ms
90 139 ms
95 142 ms
99 160 ms
100 5147 ms
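One way to make "looks off" concrete is to compare each run's 100th percentile with its 99th. The sketch below uses the values from the 9.0 histograms in this thread; the factor of 10 is an arbitrary assumption chosen for illustration, and `tail_outliers` is a hypothetical helper:

```python
# (p99_ms, p100_ms) for each 9.0 run, copied from the histograms in this thread.
RUNS = {
    "etcd regular no-PTY": (185, 700),
    "etcd regular PTY": (184, 393),
    "etcd tunnel no-PTY": (177, 413),
    "etcd tunnel PTY": (182, 799),
    "DynamoDB regular no-PTY": (160, 5147),
    "DynamoDB regular PTY": (162, 222),
    "DynamoDB tunnel no-PTY": (158, 235),
    "DynamoDB tunnel PTY": (156, 198),
}


def tail_outliers(runs, max_ratio=10.0):
    """Names of runs whose slowest request exceeds max_ratio times the p99."""
    return [name for name, (p99, p100) in runs.items() if p100 > max_ratio * p99]


print(tail_outliers(RUNS))
```

With that threshold only the DynamoDB regular no-PTY run is flagged (5147 ms against a 99th percentile of 160 ms); the etcd runs have maxima of roughly two to four and a half times their 99th percentile.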
This feels worth fixing prior to release: https://github.com/gravitational/teleport/issues/10794
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh install of the version to be released as well as an upgrade of the previous version of Teleport.
[x] Adding nodes to a cluster @codingllama
[x] Labels @nklaassen
[x] Trusted Clusters @espadolini
[ ] RBAC @timothyb89
Make sure that invalid and valid attempts are reflected in audit log.
[x] Verify that custom PAM environment variables are available as expected. @xacrimon
[x] Users @codingllama
With every user combination, try to login and signup with invalid second factor, invalid password to see how the system reacts.
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
tsh mfa rm
`second_factor: on` in `auth_service`, should fail
`second_factor: optional` in `auth_service`, should succeed
`tsh mfa add`
[x] Backends @smallinsky
[x] Session Recording @Tener
[x] Audit Log @greedy52
Node/Proxy ID may be found at `/var/lib/teleport/host_uuid` on the corresponding machine.
Node IDs may also be queried via `tctl nodes ls`.
`scp` commands are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.
Assuming the proxy is `proxy.example.com:3023` and `node1` is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
[x] Interact with a cluster using `tsh` @hatched
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.
[x] Interact with a cluster using `ssh` @fheinecke
Make sure to test both recording and regular proxy modes.
[x] Interact with a cluster using the Web UI @gzdunek
User accounting @xacrimon
`/var/run/utmp` on Linux.
`/var/log/wtmp` on Linux.
Combinations @atburke
For some manual testing, many combinations need to be tested. For example, for interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @jimbishopp
Teleport with multiple Kubernetes clusters @jimbishopp
Note: you can use GKE or EKS or minikube to run Kubernetes clusters. Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has both clusters
`tsh kube login`
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh` on the new cluster
`tsh login`, check that `tsh kube ls` has all clusters
`name` and `labels`
`Step 2` login value matching the rows `name` column
`name` or `labels` in the search bar works
`name` column
Teleport with FIPS mode @r0mant
ACME @lxea
Migrations @r0mant @zmb3
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers @benarent @ptgott @xinding33
Teleport Plugins @Joerger
WEB UI @kimlisa @ravicious @hatched @gzdunek
Main
For main, test with a role that has access to all resources.
Top Nav
Side Nav
`>`, and expand has icon `v`
Servers aka Nodes
`Add Server` button renders dialogue set to `Automatically` view
`Regenerate Script` regenerates token value in the bash command
`Manually` tab renders manual steps
`Automatically` tab renders bash command
Applications
`Add Application` button renders dialogue
`Generate Script`, bash command is rendered
`Regenerate` button regenerates token value in bash command
Databases
`Add Database` button renders dialogue for manual instructions:
`Step 4` changes
`Step 5` commands
Active Sessions
Audit log
`Session Ended` event icon, takes user to session player
`details` button
Users
Auth Connectors
Roles
Managed Clusters
Help & Support
Access Requests @Joerger
Creating Access Requests
`allow-roles`). This role allows you to see the Role screen and ssh into all nodes.
`allow-users`). This role session expires in 4 minutes, allows you to see Users screen, and denies access to all nodes.
`default`)
`default` assigned
Viewing & Approving/Denying Requests
Create a user with the role `reviewer` that allows you to review all requests, and delete them.
`default` if thresholds weren't defined in role, or blank if not named)
Assuming Approved Requests
`allow-roles` allows you to see roles screen and ssh into nodes
`allow-roles`, verify that assuming `allow-users-short-ttl` allows you to see users screen, and denies access to nodes
`switching back` goes back to your default static role
`allow-users-short-ttl` role, the user is automatically logged out after the expiry is met (4 minutes)
`default`, and requests that are not expired and are approved are assumable again
Access Request Waiting Room @kimlisa
Strategy Reason
Create the following role:
`request_prompt` setting
`send request`, pending dialogue renders
Strategy Always
With the previous role you created from `Strategy Reason`, change `request_access` to `always`:
`Logout` and clicking goes back to the login screen
Strategy Optional
With the previous role you created from `Strategy Reason`, change `request_access` to `optional`:
Terminal
Node List Tab
Session Tab
$ sudo apt-get install mc
$ mc
Session Player
Invite and Reset Form
Login Form and Change Password
Multi-factor Authentication (mfa)
Create/modify `teleport.yaml` and set the following authentication settings under `auth_service`
MFA invite, login, password reset, change password
`second_factor` type to `on` and verify that mfa is required (no option `none` in dropdown)
MFA require auth
Go to `Account Settings` > `Two-Factor Devices` and register a new device
Using the same user as above:
MFA Management
`second_factor` set to `off` disables adding devices
Cloud
From your cloud staging account, change the field `teleportVersion` to the test version.
Recovery Code Management
Invite/Reset
Recovery Flow: Add new mfa device
Recovery Flow: Change password
Recovery Email
RBAC
Create a role, with no `allow.rules` defined:
`Add Server, Application, Databases, Kubernetes` button in each respective view
`Servers`, `Apps`, `Databases`, and `Kubernetes` are listed under `options` button in `Manage Clusters`
Note: User has read/create access_request access to their own requests, despite resource settings
Add the following under `spec.allow.rules` to enable read access to the audit log:
`Audit Log` and `Session Recordings` are accessible
Add the following to enable read access to recorded sessions
Add the following to enable read access to the roles
Add the following to enable read access to the auth connectors
Add the following to enable read access to users
Add the following to enable read access to trusted clusters
Performance/Soak Test @fspmarshall @espadolini @rosstimothy
Using the `tsh bench` tool, perform the soak tests and benchmark tests on the following configurations:
Cluster with 10K nodes in normal (non-IOT) node mode with ETCD
Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB
Cluster with 1K IOT nodes with ETCD
Cluster with 1K IOT nodes with DynamoDB
Cluster with 500 trusted clusters with ETCD
Cluster with 500 trusted clusters with DynamoDB
Soak Tests
Run a 4-hour soak test with a mix of interactive/non-interactive sessions:
Observe prometheus metrics for goroutines, open files, RAM, CPU, Timers and make sure there are no leaks
Breaking load tests
Load system with tsh bench to the capacity and publish maximum numbers of concurrent sessions with interactive and non interactive tsh bench loads.
Teleport with Cloud Providers
AWS @xacrimon
GCP @xacrimon
IBM @r0mant
Application Access @gabrielcorado @Tener
`debug_app: true` works.
`name.rootProxyPublicAddr` as well as `publicAddr`.
`name.rootProxyPublicAddr`.
`app.session.start` and `app.session.chunk` events are created in the Audit Log.
`app.session.chunk` points to a 5 minute session archive with multiple `app.session.request` events inside.
`tsh play <chunk-id>` can fetch and print a session chunk archive.
`tsh app login`. @Tener
`tsh aws` commands.
`tctl create`.
`tctl create -f`.
`tctl rm`.
`Add Application` dialogue works (refresh app screen to see it registered)
Database Access
`db.session.start` is emitted when you connect.
`db.session.end` is emitted when you disconnect.
`db.session.query` is emitted when you execute a SQL query.
`tsh db ls` shows only databases matching role's `db_labels`.
`db_users`.
`db_names`.
`db.session.start` is emitted when connection attempt is denied.
`db_names`.
`db.session.query` is emitted when command fails due to permissions.
`tsh db connect`.
`tctl create`.
`tctl create -f`.
`tctl rm`.
`name`, `description`, `type`, and `labels`
`Step 2` login value matching the rows `name` column
`labels`
TLS Routing @smallinsky
`v2` configuration starts only a single listener.
`multiplex` mode `auth_service.proxy_listener_mode: "multiplex"`
`web_proxy_addr == tunnel_addr`
`tsh db connect` works through proxy running in `multiplex` mode
`tsh db proxy` with a GUI client.
`multiplex` mode
`ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" user@host.example.com`
`ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" user@node.foo.com`
`tsh ssh` access through proxy running in multiplex mode
`multiplex` mode
Desktop Access @zmb3 @lxea @ibeckermayer
`listen_addr`): @lxea
`hosts` section.
`hosts` section.
`windows_desktop_service`s to the same Teleport cluster, verify that connections to desktops on different AD domains work. (Attempt to connect several times to verify that you are routed to the correct `windows_desktop_service`) (@zmb3)
`client_idle_timeout` to a small value and verify that idle sessions are terminated (the session should end and an audit event will confirm it was due to idle connection) @lxea
`teleport.dev/origin` label.
`teleport.dev` labels for OS, OS Version, DNS hostname.
`beta.1`)
`mode: node-sync` or `mode: proxy-sync`)
`mode: node` or `mode: proxy`)
`windows.desktop.session.start` (`TDP00I`) emitted on start
`windows.desktop.session.start` (`TDP00W`) emitted when session fails to start (due to RBAC, for example)
`windows.desktop.session.end` (`TDP01I`) emitted on end
`desktop.clipboard.send` (`TDP02I`) emitted for local copy -> remote paste
`desktop.clipboard.receive` (`TDP03I`) emitted for remote copy -> local paste