Closed by russjones 2 years ago.
Desktop Access: Seeing the arrow keys map incorrectly on macOS client. Same behavior in Chrome, Safari, and Firefox. This is using the built-in MacBook keyboard.
4 6 8 2
This is addressed with https://github.com/gravitational/teleport/pull/8791 which still needs to be backported (https://github.com/gravitational/teleport/pull/8813).
Potential K8s access issue when used via TLS routing. I get:
➜ ~ kubectl get nodes
Unable to connect to the server: x509: certificate is valid for mbp, root.gravitational.io, host.minikube.internal, localhost, remote.kube.proxy.teleport.cluster.local, host.minikube.internal, *.teleport.cluster.local, teleport.cluster.local, *.root.gravitational.io, *.host.minikube.internal, not kube.
I'm using kubernetes_service with kubeconfig_file (if that matters). cc @smallinsky
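Some context on the error itself. Go's hostname verification accepts a name only if it matches a SAN exactly or matches a wildcard SAN, and a wildcard such as *.teleport.cluster.local covers exactly one label in front of teleport.cluster.local. The bare name kube matches neither, which is why the message ends in "not kube". The minimal sketch below reproduces the check with crypto/x509 and a throwaway self-signed certificate. Its SAN list is an assumed, trimmed subset of the one in the error, and the sketch says nothing about which name TLS routing ought to be requesting:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert builds a throwaway self-signed certificate with the given DNS SANs.
func newCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "proxy"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Assumed subset of the SANs listed in the kubectl error above.
	cert, err := newCert([]string{"localhost", "*.teleport.cluster.local", "teleport.cluster.local"})
	if err != nil {
		panic(err)
	}
	for _, host := range []string{"kube", "kube.teleport.cluster.local", "teleport.cluster.local"} {
		// VerifyHostname only checks the name against the SANs, not the chain.
		if err := cert.VerifyHostname(host); err != nil {
			fmt.Printf("%s: %v\n", host, err)
		} else {
			fmt.Printf("%s: ok\n", host)
		}
	}
}
```

The wildcard also does not reach two labels deep, so a name like a.b.teleport.cluster.local would fail in the same way.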
Desktop Access: disconnects due to client idle timeout fail to emit a disconnect event to the audit log (but otherwise disconnect correctly and emit a desktop session end event correctly). This is due to attempting to use the desktop ID as the event's "server ID."
Rejecting audit event client.disconnect("") from "3b581a11-b94a-4274-860b-8266868ca42e":
server "3b581a11-b94a-4274-860b-8266868ca42e" can't emit event with
server ID "foo-example-com".
The server ID must be the Windows Desktop Service's HostUUID; otherwise, the ValidateServerMetadata check will fail.
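The sketch below simplifies the rule described above. It assumes the check reduces to comparing the emitting service's host UUID with the event's server ID. AuditEvent and validateServerMetadata are hypothetical stand-ins, not Teleport's actual types or its ValidateServerMetadata implementation:

```go
package main

import "fmt"

// AuditEvent is a hypothetical, trimmed-down audit event that carries only
// the metadata this check looks at.
type AuditEvent struct {
	Type     string
	ServerID string
}

// validateServerMetadata is a hypothetical stand-in for the real check: a
// service may only emit events whose server ID is its own host UUID.
func validateServerMetadata(emitterHostUUID string, ev AuditEvent) error {
	if ev.ServerID != emitterHostUUID {
		return fmt.Errorf("server %q can't emit event with server ID %q", emitterHostUUID, ev.ServerID)
	}
	return nil
}

func main() {
	hostUUID := "3b581a11-b94a-4274-860b-8266868ca42e"

	// Buggy behaviour: the desktop ID is used as the event's server ID.
	before := AuditEvent{Type: "client.disconnect", ServerID: "foo-example-com"}
	// Expected behaviour: the Windows Desktop Service's own HostUUID is used.
	after := AuditEvent{Type: "client.disconnect", ServerID: hostUUID}

	fmt.Println(validateServerMetadata(hostUUID, before))
	fmt.Println(validateServerMetadata(hostUUID, after))
}
```

The first call mirrors the rejection in the log above. The second passes because the server ID matches the emitting service's host UUID.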
Fixed in #8828, which needs merge + backport.
tsh bench --duration=30m root@loadtest-7fbf6bcbfc-b5l7l ls
* Requests originated: 17988
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1289 ms
50 1301 ms
75 1319 ms
90 1374 ms
95 1600 ms
99 1648 ms
100 2053 ms
tsh bench --interactive --duration=30m root@loadtest-7fbf6bcbfc-b5l7l ps aux
* Requests originated: 17987
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1301 ms
50 1314 ms
75 1330 ms
90 1352 ms
95 1388 ms
99 1724 ms
100 3553 ms
tsh bench --duration=30m root@loadtest-7fbf6bcbfc-ww7h8 ls
* Requests originated: 17987
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1314 ms
50 1326 ms
75 1340 ms
90 1359 ms
95 1381 ms
99 1561 ms
100 3329 ms
tsh bench --interactive --duration=30m root@loadtest-7fbf6bcbfc-ww7h8 ps aux
* Requests originated: 17987
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1302 ms
50 1314 ms
75 1328 ms
90 1344 ms
95 1363 ms
99 1457 ms
100 3479 ms
tsh bench --duration=30m root@loadtest-7fbf6bcbfc-52zdr ls
* Requests originated: 17982
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1481 ms
50 1505 ms
75 1541 ms
90 1623 ms
95 1714 ms
99 1920 ms
100 3335 ms
tsh bench --interactive --duration=30m root@loadtest-7fbf6bcbfc-klws7 ps aux
* Requests originated: 17985
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1496 ms
50 1515 ms
75 1543 ms
90 1624 ms
95 1720 ms
99 1978 ms
100 4647 ms
tsh bench --duration=30m root@loadtest-7fbf6bcbfc-47hh9 ls
* Requests originated: 17986
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1499 ms
50 1530 ms
75 1642 ms
90 1834 ms
95 1990 ms
99 2257 ms
100 4259 ms
tsh bench --interactive --duration=30m root@loadtest-7fbf6bcbfc-6gjnv ps aux
* Requests originated: 17982
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 1527 ms
50 1558 ms
75 1682 ms
90 1897 ms
95 2035 ms
99 2249 ms
100 3143 ms
Tested out several of the AWS deployment examples. Both Terraform examples worked fine with the 8.0.0-beta.2 AMIs (tested the simple example with the OSS AMIs, and the HA example with Enterprise).
The CloudFormation example seems very broken. I'm told it hasn't been maintained in quite a while and am not sure if we care to validate it here. Some of its issues:
With a load balancer rule manually added to expose 3023, tsh login still fails:
Original Error: *trace.ConnectionProblemError Get "https://teleport.cluster.local/v2/authorities/host?load_keys=false": ssh: rejected: connect failed (Cannot open new SSH session on reverse tunnel. Are you connecting to the right port?)
(I gave up debugging here, I'm not sure if we care to get this example working again)
Desktop access: heartbeats for multiple Windows hosts discovered via LDAP all report the same host.
Issue #8846, fixed in #8847.
Desktop Access: Investigate required libraries.
Just filed https://github.com/gravitational/teleport/issues/8860 for my issues with the CloudFormation example, though given https://github.com/gravitational/teleport/issues/8665#issuecomment-961368649, I wonder if I hit a similar issue.
Aggregate of the last 3 releases (worst-case, i.e. 100th-percentile, tsh bench response durations):
Backend | Cluster Size | Mode | PTY | 6.2 | 7.0 | 8.0 |
---|---|---|---|---|---|---|
etcd | 10k | Regular | No | ~49183 ms~ | ~56383 ms~ 4475 ms | 3335 ms |
etcd | 10k | Regular | Yes | ~59423 ms~ | ~61215 ms~ 4507 ms | 4647 ms |
etcd | 10k | Tunnel | No | ~65439 ms~ | ~53759 ms~ 4451 ms | 4259 ms |
etcd | 10k | Tunnel | Yes | ~64924 ms~ | ~48223 ms~ 4435 ms | 3143 ms |
DynamoDB | 10k | Regular | No | | | |
DynamoDB | 10k | Regular | Yes | | | |
DynamoDB | 10k | Tunnel | No | | | |
DynamoDB | 10k | Tunnel | Yes | | | |
DynamoDB | 1 | Regular | No | 2471 ms | 1824 ms | |
DynamoDB | 1 | Regular | Yes | 2081 ms | 1483 ms | |
DynamoDB | 1 | Tunnel | No | 826 ms | 2125 ms | |
DynamoDB | 1 | Tunnel | Yes | 518 ms | 2002 ms | |
Re: Check agent forwarding is correct based on role and proxy mode.
Session recording modes tested: off, proxy, node.
rec mode | fwd allowed by role | fwd not allowed by role |
---|---|---|
off | allowed | disallowed |
proxy | allowed | conn failed |
node | allowed | disallowed |
Caveats:
I'm still not sure what Proxy Mode means in this context, but I've interpreted it to mean the session_recording mode. Even so, I am still not sure what the correct Agent Forwarding behaviour is for the different recording modes, or even how they should be expected to affect the Agent Forwarding modes.
Fully aware that I may have been testing the wrong thing, I considered something that looks like this to be "allowed":
...and something that looks like this was considered "disallowed":
ALPN Proxy + Reverse Tunnel fails when ACME is used:
Issue: https://github.com/gravitational/teleport/issues/8665#issuecomment-961368649, fixed in https://github.com/gravitational/teleport/pull/8869
tsh bench --duration=30m root@ip-172-31-11-250-us-west-2-compute-internal ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 152 ms
50 166 ms
75 181 ms
90 194 ms
95 208 ms
99 283 ms
100 2125 ms
tsh bench --interactive --duration=30m root@ip-172-31-11-250-us-west-2-compute-internal ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 159 ms
50 169 ms
75 179 ms
90 192 ms
95 207 ms
99 284 ms
100 2002 ms
tsh bench --duration=30m root@ip-172-31-11-250-us-west-2-compute-internal ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 145 ms
50 154 ms
75 164 ms
90 176 ms
95 188 ms
99 237 ms
100 1824 ms
tsh bench --interactive --duration=30m root@ip-172-31-11-250-us-west-2-compute-internal ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 150 ms
50 161 ms
75 174 ms
90 186 ms
95 195 ms
99 238 ms
100 1483 ms
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh install of the version to be released as well as an upgrade of the previous version of Teleport.
[x] Adding nodes to a cluster @atburke
[x] Labels @atburke
[x] Trusted Clusters @atburke
[x] RBAC @codingllama
Make sure that invalid and valid attempts are reflected in audit log.
[x] Verify that custom PAM environment variables are available as expected. @xacrimon
[x] Users @nklaassen With every user combination, try to log in and sign up with an invalid second factor and an invalid password to see how the system reacts.
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
tsh mfa rm
second_factor: on in auth_service, should fail
second_factor: optional in auth_service, should succeed
tsh mfa add
[x] Backends @Joerger
[x] Session Recording @codingllama
[ ] Audit Log @quinqu
Node/Proxy ID may be found at /var/lib/teleport/host_uuid on the corresponding machine.
Node IDs may also be queried via tctl nodes ls.
scp commands are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.
Assuming the proxy is proxy.example.com:3023 and node1 is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
[x] Interact with a cluster using tsh @tcsc
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.
[x] Interact with a cluster using ssh @quinqu
Make sure to test both recording and regular proxy modes.
[x] Interact with a cluster using the Web UI @tcsc
User accounting @xacrimon
/var/run/utmp on Linux.
/var/log/wtmp on Linux.
Combinations @tcsc
For some manual testing, many combinations need to be tested. For example, for interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @smallinsky
Teleport with multiple Kubernetes clusters @r0mant
Note: you can use GKE or EKS or minikube to run Kubernetes clusters. Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has your cluster
kubectl get nodes, kubectl exec -it $SOME_POD -- sh
tsh login, check that tsh kube ls has both clusters
tsh kube login
kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
tsh login, check that tsh kube ls has all clusters
name and labels
Step 2
login value matching the row's name column
name or labels in the search bar works
name column
Teleport with FIPS mode @russjones
ACME @Joerger
Migrations @r0mant @russjones
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers @benarent
Teleport Plugins @Joerger
WEB UI @kimlisa @rudream @gzdunek
Main
For main, test with a role that has access to all resources.
Top Nav
Side Nav
>, and expand has icon v
Servers aka Nodes
Add Server button renders dialogue set to Automatically view
Regenerate Script regenerates token value in the bash command
Manually tab renders manual steps
Automatically tab renders bash command
Applications
Add Application button renders dialogue
Generate Script, bash command is rendered
Regenerate button regenerates token value in bash command
Databases
Add Database button renders dialogue for manual instructions:
Step 4 changes
Step 5 commands
Active Sessions
Audit log
Session Ended event icon, takes user to session player
details button
Users
Auth Connectors
Auth Connectors Card Icons
Roles
Managed Clusters
Help & Support
Access Requests
Creating Access Requests
allow-roles. This role allows you to see the Role screen and ssh into all nodes.
allow-users. This role session expires in 4 minutes, allows you to see Users screen, and denies access to all nodes.
default
default assigned
allow-roles and allow-users are listed
Viewing & Approving/Denying Requests
Create a user with the role reviewer that allows you to review all requests, and delete them.
(default if thresholds weren't defined in role, or blank if not named)
Assuming Approved Requests
allow-roles allows you to see roles screen and ssh into nodes
allow-roles, verify that assuming allow-users allows you to see users screen, and denies access to nodes
switching back goes back to your default static role
default, and requests that are not expired and are approved are assumable again
Access Request Waiting Room
Strategy Reason
Create the following role:
request_prompt setting
send request, pending dialogue renders
Strategy Always
With the previous role you created from Strategy Reason, change request_access to always:
Logout and clicking goes back to the login screen
Strategy Optional
With the previous role you created from Strategy Reason, change request_access to optional:
Terminal
Node List Tab
Session Tab
$ sudo apt-get install mc
$ mc
Session Player
Invite Form
Login Form
Multi-factor Authentication (mfa)
Create/modify teleport.yaml and set the following authentication settings under auth_service
MFA invite, login, password reset, change password
second_factor type to on and verify that MFA is required (no option none in dropdown)
MFA require auth
Go to Account Settings > Two-Factor Devices and register a new device
Using the same user as above:
MFA Management
second_factor set to off disables adding devices
Cloud
Invite/Reset
Recovery Code Management
Recovery Flow: Add new mfa device
Recovery Flow: Change password
Recovery Email
RBAC
Create a role, with no allow.rules defined:
Add Server, Application, Databases, Kubernetes button in each respective view
Nodes, Apps, Databases, and Kubernetes are listed under options button in Manage Clusters
Note: User has read/create access_request access to their own requests, despite resource settings
Add the following under spec.allow.rules to enable read access to the audit log:
Audit Log and Session Recordings are accessible
Add the following to enable read access to recorded sessions
Add the following to enable read access to the roles
Add the following to enable read access to the auth connectors
Add the following to enable read access to users
Add the following to enable read access to trusted clusters
Performance/Soak Test @fspmarshall @rosstimothy
Using the tsh bench tool, perform the soak tests and benchmark tests on the following configurations:
Cluster with 10K nodes in normal (non-IOT) node mode with ETCD
Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB
Cluster with 1K IOT nodes with ETCD
Cluster with 1K IOT nodes with DynamoDB
Cluster with 500 trusted clusters with ETCD
Cluster with 500 trusted clusters with DynamoDB
Soak Tests
Run a 4-hour soak test with a mix of interactive/non-interactive sessions:
Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks
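To make "no leaks" concrete, you can fit a trend line to each metric over the soak window and flag anything that keeps climbing. The minimal sketch below assumes the samples have already been scraped at a fixed interval, for example from the go_goroutines gauge that the Prometheus Go client exports. The readings and the growth threshold are hypothetical, and looksLikeLeak is not part of Teleport or tsh:

```go
package main

import "fmt"

// slope returns the least-squares slope of samples taken at equal
// intervals, in metric units per sample.
func slope(samples []float64) float64 {
	n := float64(len(samples))
	if n < 2 {
		return 0
	}
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	return (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
}

// looksLikeLeak flags a metric whose fitted trend grows faster than
// maxGrowthPerSample. The threshold is a hypothetical tuning knob.
func looksLikeLeak(samples []float64, maxGrowthPerSample float64) bool {
	return slope(samples) > maxGrowthPerSample
}

func main() {
	// Hypothetical goroutine counts, one reading per scrape interval.
	steady := []float64{410, 415, 408, 412, 409, 414, 411}
	growing := []float64{410, 430, 455, 470, 498, 520, 545}
	fmt.Println("steady leak?", looksLikeLeak(steady, 1))
	fmt.Println("growing leak?", looksLikeLeak(growing, 1))
}
```

A fitted slope is less sensitive to short spikes than comparing the first and last readings. Over a 4-hour run, though, the threshold still needs tuning for each metric.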
Breaking load tests
Load the system with tsh bench to capacity and publish the maximum number of concurrent sessions with interactive and non-interactive tsh bench loads.
Teleport with Cloud Providers
AWS @timothyb89
GCP @xacrimon
IBM @xacrimon
Application Access @r0mant @smallinsky
debug_app: true works.
name.rootProxyPublicAddr as well as publicAddr.
name.rootProxyPublicAddr.
app.session.start and app.session.chunk events are created in the Audit Log.
app.session.chunk points to a 5-minute session archive with multiple app.session.request events inside.
tsh play <chunk-id> can fetch and print a session chunk archive.
tsh app login.
tsh aws commands.
tctl create.
tctl create -f.
tctl rm.
Add Application dialogue works (refresh app screen to see it registered)
Database Access @r0mant @smallinsky
db.session.start is emitted when you connect.
db.session.end is emitted when you disconnect.
db.session.query is emitted when you execute a SQL query.
tsh db ls shows only databases matching role's db_labels.
db_users.
db_names.
db.session.start is emitted when connection attempt is denied.
db_names.
db.session.query is emitted when command fails due to permissions.
tsh db connect.
tctl create.
tctl create -f.
tctl rm.
name, description, type, and labels
Step 2
login value matching the row's name column
labels
Desktop Access @zmb3
windows_desktop_services to the same Teleport cluster and connect to desktops on different AD domains
abcdefghijklmnopqrstuvwxyz1234567890-=!@#$%^&*()_+[]\{}|;':",./<>?~`, backspace, return and moving a cursor around with arrow keys works as expected with all supported browsers. Update: all other keys should now work. A known bug is that F11 (show desktop shortcut) doesn't work on macOS, as the OS seems to capture it and doesn't forward it to the browser. A useful tool for testing which keys are being registered on a Windows machine is: https://dennisbabkin.com/kbdkeyinfo/
TLS Routing @r0mant
v2 configuration starts only a single listener.
multiplex mode
auth_service.proxy_listener_mode: "multiplex"
web_proxy_addr == tunnel_addr
tsh db connect works through proxy running in multiplex mode
tsh db proxy with a GUI client.
multiplex mode
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh %r@%h:%p" user@host.example.com
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --cluster=leaf-cluster %r@%h:%p" user@node.foo.com
tsh ssh access through proxy running in multiplex mode
multiplex mode