Closed by russjones 3 years ago
When I add an OTP device with `tsh mfa add` and try to enter the code, Teleport says the code must be 6 digits long, and my input definitely is. It still won't be accepted. Terminal output:
Choose device type [TOTP, U2F]: TOTP
Enter device name: tempdevice
Enter an OTP code from a *registered* device: 628304
Open your TOTP app and create a new manual entry with these fields:
URL: <omitted>
Account name: <omitted>
Secret key: <omitted>
Issuer: <omitted>
Algorithm: SHA1
Number of digits: 6
Period: 30s
Once created, enter an OTP code generated by the app: 624072
TOTP code must be exactly 6 digits long, try again
Once created, enter an OTP code generated by the app: 624072
TOTP code must be exactly 6 digits long, try again
Once created, enter an OTP code generated by the app: 910046
TOTP code must be exactly 6 digits long, try again
Once created, enter an OTP code generated by the app: 426970
TOTP code must be exactly 6 digits long, try again
Once created, enter an OTP code generated by the app:
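One plausible cause of this kind of error (not confirmed in this thread) is checking the length of a raw terminal line that still carries its trailing newline. The Go sketch below illustrates that pitfall and the usual fix. `normalizeTOTPCode` is a hypothetical helper, not Teleport's code, and it is not a diagnosis of this bug.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errTOTPLength mirrors the message shown in the terminal output above.
var errTOTPLength = errors.New("TOTP code must be exactly 6 digits long, try again")

// normalizeTOTPCode is a hypothetical helper (not Teleport's implementation).
// It trims surrounding whitespace, including a trailing "\n" or "\r\n" left
// by a line-based terminal reader, before checking that exactly six ASCII
// digits remain.
func normalizeTOTPCode(raw string) (string, error) {
	code := strings.TrimSpace(raw)
	if len(code) != 6 {
		return "", errTOTPLength
	}
	for _, r := range code {
		if r < '0' || r > '9' {
			return "", errors.New("TOTP code must contain only digits")
		}
	}
	return code, nil
}

func main() {
	// Without TrimSpace, "624072\n" is 7 bytes long and would be rejected.
	for _, in := range []string{"624072", "624072\n", "624072\r\n", "62407", "62a072"} {
		code, err := normalizeTOTPCode(in)
		fmt.Printf("%q -> %q, err=%v\n", in, code, err)
	}
}
```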
@quinqu could you please file a bug for this and assign it to me? It's likely I introduced the problem in 6.2.
Updating a user with `tctl create -f user.yaml` breaks the Audit Log and Session Recordings tabs in the Web UI: #6935
@webvictim I've added a test matrix for the `tsh` tests here so we don't stomp on each other (or on ourselves). Feel free to edit as necessary.
| Command | New | New (No Rec) | Upgraded | Upgraded (No Rec) |
|---|---|---|---|---|
| `tsh ssh <regular-node>` | PASS | PASS | PASS | PASS |
| `tsh ssh <node-remote-cluster>` | PASS | PASS | PASS | PASS |
| `tsh ssh -A <regular-node>` | PASS | PASS | PASS | PASS |
| `tsh ssh -A <node-remote-cluster>` | PASS | PASS | PASS | PASS |
| `tsh ssh <regular-node> ls` | PASS | PASS | PASS | PASS |
| `tsh ssh <node-remote-cluster> ls` | PASS | PASS | PASS | PASS |
| `tsh join <regular-node>` | PASS | PASS | PASS | PASS |
| `tsh join <node-remote-cluster>` | PASS | PASS | PASS | PASS |
| `tsh play <regular-node>` | PASS | PASS\* | PASS | PASS\* |
| `tsh play <node-remote-cluster>` | PASS | PASS\* | PASS | PASS\* |
| `tsh scp <regular-node>` | PASS | PASS | PASS | PASS |
| `tsh scp <node-remote-cluster>` | PASS | PASS | PASS | PASS |
| `tsh ssh -L <regular-node>` | PASS | PASS | PASS | PASS |
| `tsh ssh -L <node-remote-cluster>` | PASS | PASS | PASS | PASS |
| `tsh ls` | PASS | PASS | PASS | PASS |
| `tsh clusters` | PASS | PASS | PASS | PASS |
\* `ERROR: 0 not found`, which I assume is the correct behaviour when recording is disabled.
Encountered #6938 while testing: panic when using `tctl` with a remote auth server.
MFA-related bug where `scp` upload/download does not work in the Web UI: https://github.com/gravitational/teleport/issues/6939
@Joerger @xacrimon I'm seeing https://github.com/gravitational/teleport/issues/6935 as well, which Brian reported above.
@xacrimon It looks like this file (`dynamic.go`) was part of your RFD 19 implementation; could this have caused it? We probably just need to add the `user.updated` event to the switch.
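To illustrate why a missing case in such a switch breaks an events view, here is a minimal Go sketch of a type switch that turns generic event fields into typed events. The constant strings, `UserEvent`, and `fromFields` are all hypothetical and do not reproduce `dynamic.go`.

```go
package main

import "fmt"

// Hypothetical event type names for illustration only; the real constants
// in Teleport's events package may use different strings.
const (
	eventUserCreate  = "user.create"
	eventUserUpdated = "user.updated"
	eventUserDelete  = "user.delete"
)

// UserEvent is a hypothetical typed representation of a user lifecycle event.
type UserEvent struct {
	Type string
	User string
}

// fromFields converts a generic field map into a typed event. Any event type
// missing from the switch reaches the default branch and returns an error;
// if the caller aborts on the first error, one unrecognized event can break
// the whole listing.
func fromFields(fields map[string]string) (UserEvent, error) {
	switch t := fields["event"]; t {
	case eventUserCreate, eventUserUpdated, eventUserDelete:
		return UserEvent{Type: t, User: fields["user"]}, nil
	default:
		return UserEvent{}, fmt.Errorf("unknown event type %q", t)
	}
}

func main() {
	ev, err := fromFields(map[string]string{"event": eventUserUpdated, "user": "alice"})
	fmt.Println(ev, err)
	_, err = fromFields(map[string]string{"event": "user.renamed"})
	fmt.Println(err)
}
```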
@r0mant Resolved in #6949 and #6950 (backport to v6).
Changes introduced in #6731 break compatibility with older 6.x instances because they rely on new gRPC methods (e.g. attempting to view audit events from the UI of a 6.2 proxy results in an `unknown method GetEvents for service proto.AuthService` error when talking to a 6.1 auth server).
Teleport should fall back to the old event API if the new one is not available.
cc: @xacrimon @kimlisa
@fspmarshall So this is a bit of an issue. The old events API does not support pagination, but the `IAuditLog` interface expects it. Should we just ignore the new parameters introduced in RFD 19 and pretend pagination doesn't exist on fallback?
UI switchback bug (I am fixing): https://github.com/gravitational/teleport/issues/6960
@xacrimon Related to #6935, unknown event bug: https://github.com/gravitational/teleport/issues/6959
> Should we just ignore the new parameters introduced in RFD 19 and pretend pagination doesn't exist on fallback?
@xacrimon Followed up in the PR. Basically, I think we should pretend it doesn't exist when dealing with the first call (since that means we're getting the "first page", which is what the old API did), but we should return an error if `startKey != ""`, since that means we're loading a subsequent page, which the old API can't do.
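A minimal Go sketch of that fallback rule, assuming a client that exposes both the new paginated call and the old single-page call. The interface, method names, and sentinel error are hypothetical; a real gRPC client would detect the old server via `codes.Unimplemented` rather than a sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a placeholder for an audit event.
type Event struct{ ID string }

// errUnimplemented stands in for the "unknown method GetEvents" error a 6.1
// auth server returns. A real gRPC client would check for codes.Unimplemented
// through the status package instead of a sentinel error.
var errUnimplemented = errors.New("unknown method GetEvents for service proto.AuthService")

// auditClient is hypothetical; method names are illustrative only.
type auditClient interface {
	// GetEvents is the new paginated call; it also returns the next page key.
	GetEvents(limit int, startKey string) ([]Event, string, error)
	// SearchEventsLegacy is the old call, which can only return one page.
	SearchEventsLegacy(limit int) ([]Event, error)
}

// searchEvents tries the new API first and falls back to the old one.
func searchEvents(c auditClient, limit int, startKey string) ([]Event, string, error) {
	events, next, err := c.GetEvents(limit, startKey)
	if !errors.Is(err, errUnimplemented) {
		return events, next, err
	}
	// A non-empty startKey means a subsequent page, which the old API can't do.
	if startKey != "" {
		return nil, "", errors.New("auth server does not support paginated event queries")
	}
	// First page: treat the old API's result as that page. An empty next key
	// tells the caller there is nothing more to load.
	events, err = c.SearchEventsLegacy(limit)
	return events, "", err
}

// oldServer simulates a 6.1 auth server that lacks GetEvents.
type oldServer struct{ events []Event }

func (s oldServer) GetEvents(int, string) ([]Event, string, error) {
	return nil, "", errUnimplemented
}

func (s oldServer) SearchEventsLegacy(limit int) ([]Event, error) {
	if limit < len(s.events) {
		return s.events[:limit], nil
	}
	return s.events, nil
}

func main() {
	srv := oldServer{events: []Event{{"a"}, {"b"}, {"c"}}}
	page, next, err := searchEvents(srv, 2, "")
	fmt.Println(page, next == "", err)
	_, _, err = searchEvents(srv, 2, "b")
	fmt.Println(err)
}
```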
@xacrimon @webvictim @fspmarshall @quinqu let me know if you're overloaded. Some other folks are done with their testing, so I can redistribute the remaining tasks if needed.
@awly I could use some help with the U2F second factor tests, as I do not have a U2F device.
@quinqu will do :+1:
FYI everyone, if you find an issue while testing, please file a bug and put it into the 6.2 milestone. That way I can track all the remaining work and questions.
I had previously assumed the DynamoDB tests were running, but they have not been. I still need to hook them up and run them before I can say everything is correct. I will post another comment; please do not cut a release before I confirm that everything is indeed working, @awly. @russjones I've also merged the API compat PR. #6990 will need to be merged as well; I will ping for reviews when it is ready.
Ran into some weird `tsh logout` behaviour, detailed in https://github.com/gravitational/teleport/issues/6992
Not sure if this is a blocker, but I can't log out of all my clusters for some reason.
Okay. I have pinged for reviews on #6990, and I sign off on everything working once it is merged. I've done some manual testing to make sure it works.
Most Kubernetes tests are finished; just waiting on the #6990 merge/backport (and rc.2 cut?) to verify the audit log entries.
All issues are either resolved or not caused by 6.2. Marking the testplan as done.
From @fspmarshall
tsh bench --duration=30m root@loadtest-665c98bfb5-72w58 ls
* Requests originated: 17920
* Requests failed: 258
* Last error: connection closed
Histogram
Percentile Response Duration
---------- -----------------
25 4867 ms
50 6943 ms
75 9583 ms
90 14951 ms
95 20959 ms
99 40799 ms
100 65439 ms
tsh bench --interactive --duration=30m root@loadtest-665c98bfb5-9wk2b ps aux
* Requests originated: 17905
* Requests failed: 253
* Last error: connection error: desc = "transport: authentication handshake failed: EOF"
Histogram
Percentile Response Duration
---------- -----------------
25 4923 ms
50 7079 ms
75 9727 ms
90 15015 ms
95 20783 ms
99 41951 ms
100 64927 ms
tsh bench --duration=30m root@loadtest-665c98bfb5-qcf82 ls
* Requests originated: 17983
* Requests failed: 23
* Last error: connection error: desc = "transport: authentication handshake failed: EOF"
Histogram
Percentile Response Duration
---------- -----------------
25 4719 ms
50 6567 ms
75 8703 ms
90 11143 ms
95 13439 ms
99 21263 ms
100 49183 ms
tsh bench --interactive --duration=30m root@loadtest-665c98bfb5-zfsrb ps aux
* Requests originated: 17970
* Requests failed: 17
* Last error: connection error: desc = "transport: authentication handshake failed: EOF"
Histogram
Percentile Response Duration
---------- -----------------
25 4655 ms
50 6391 ms
75 8327 ms
90 10703 ms
95 13079 ms
99 21759 ms
100 59423 ms
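The bench output reports failed and originated request counts but no failure rate. A small Go helper (names are illustrative) turns the reported counts into percentages: about 1.44% and 1.41% for the first two runs, and about 0.13% and 0.09% for the last two.

```go
package main

import "fmt"

// benchRun holds the request counts reported by one tsh bench run above.
type benchRun struct {
	command    string
	originated int
	failed     int
}

// failurePercent returns failed requests as a percentage of those originated.
func failurePercent(r benchRun) float64 {
	if r.originated == 0 {
		return 0
	}
	return 100 * float64(r.failed) / float64(r.originated)
}

func main() {
	runs := []benchRun{
		{"ls (72w58)", 17920, 258},
		{"--interactive ps aux (9wk2b)", 17905, 253},
		{"ls (qcf82)", 17983, 23},
		{"--interactive ps aux (zfsrb)", 17970, 17},
	}
	for _, r := range runs {
		fmt.Printf("%-30s %3d / %5d failed = %.2f%%\n", r.command, r.failed, r.originated, failurePercent(r))
	}
}
```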
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport. These tests should be run on both a fresh install of the version to be released as well as an upgrade of the previous version of Teleport.
[x] Adding nodes to a cluster @webvictim @tcsc
[x] Trusted Clusters @nklaassen @awly
[x] RBAC @Joerger @andrejtokarcik
Make sure that invalid and valid attempts are reflected in the audit log.
[x] Users @fspmarshall @quinqu
With every user combination, try to log in and sign up with an invalid second factor and with an invalid password to see how the system reacts.
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
`second_factor: on` in `auth_service`, should fail
`second_factor: optional` in `auth_service`, should succeed
`tsh mfa add`
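The two `second_factor` items can be read as a simple rule. The Go sketch below assumes they refer to removing a user's last MFA device with `tsh mfa rm`, and further assumes that removing a device other than the last is always allowed; `canRemoveMFADevice` is hypothetical and not Teleport's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// canRemoveMFADevice is a hypothetical check, not Teleport's implementation.
// Assumption: the test items describe removing the user's last MFA device.
// With second_factor "on" that removal must fail; with "optional" it succeeds.
func canRemoveMFADevice(secondFactor string, remainingDevices int) error {
	if secondFactor == "on" && remainingDevices <= 1 {
		return errors.New("cannot remove the last MFA device while second_factor is on")
	}
	return nil
}

func main() {
	fmt.Println(canRemoveMFADevice("on", 1))       // last device, should fail
	fmt.Println(canRemoveMFADevice("optional", 1)) // last device, should succeed
	fmt.Println(canRemoveMFADevice("on", 2))       // another device remains
}
```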
[x] Audit Log @r0mant @xacrimon
`scp` commands are recorded
[x] Interact with a cluster using `tsh` @webvictim @tcsc
These commands should ideally be tested in both recording and non-recording modes, as they are implemented in different ways.
[x] Interact with a cluster using `ssh` @nklaassen @awly
Make sure to test both recording and regular proxy modes.
[x] Interact with a cluster using the Web UI @Joerger @andrejtokarcik
Combinations @fspmarshall @quinqu
For some manual testing, many combinations need to be tested. For example, interactive sessions have 12 combinations.
Teleport with multiple Kubernetes clusters @xacrimon @webvictim
Note: you can use GKE, EKS, or minikube to run Kubernetes clusters. The only caveat is minikube: it's not publicly reachable, so don't run a proxy there.
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has both clusters
`tsh kube login`
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh` on the new cluster
`tsh login`, check that `tsh kube ls` has all clusters
`name` and `labels`
Step 2
login value matching the rows `name` column
`name` or `labels` in the search bar works
`name` column
Helm charts
`teleport-cluster` Helm chart to an EKS cluster in HA mode by following the AWS guide
`tctl users add`
`tsh login`
`tsh kube ls`, log in with `tsh kube login`
`kubectl get nodes` and `kubectl -n kube-system get pods`
`teleport-cluster` Helm chart to a GKE cluster in HA mode by following the GKE guide
`tctl users add`
`tsh login`
`tsh kube ls`, log in with `tsh kube login`
`kubectl get nodes` and `kubectl -n kube-system get pods`
`teleport-kube-agent` Helm chart to an EKS cluster following instructions in the README
`tsh kube ls`, log in with `tsh kube login`
`kubectl get nodes` and `kubectl get pods`, verify no errors
`teleport-kube-agent` Helm chart to a GKE cluster following instructions in the README
`tsh kube ls`, log in with `tsh kube login`
`kubectl get nodes` and `kubectl get pods`, verify no errors
Migrations @tcsc @nklaassen
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport Plugins @awly @Joerger
WEB UI @kimlisa @alex-kovoy
Main
For main, test with an admin role that has access to all resources.
Top Nav
Side Nav
`>`, and expand has icon `v`
Servers aka Nodes
`Add Server` button renders dialogue set to `Automatically` view
`Regenerate Script` regenerates token value in the bash command
`Manually` tab renders manual steps
`Automatically` tab renders bash command
Applications
`Add Application` button renders dialogue
`Generate Script`, bash command is rendered
`Regenerate` button regenerates token value in bash command
Databases
`Add Database` button renders dialogue for manual instructions:
Active Sessions
Audit log
`Session Ended` event icon, takes user to session player
`details` button
Users
Auth Connectors
Auth Connectors Card Icons
Roles
Managed Clusters
Help & Support
Access Requests
Creating Access Requests
`allow-roles`. This role allows you to see the Role screen and ssh into all nodes.
`allow-users`. This role's session expires in 4 minutes, allows you to see the Users screen, and denies access to all nodes.
`default`
`default` assigned
`allow-roles` and `allow-users` are listed
Viewing & Approving/Denying Requests
Create a user with the role `reviewer` that allows you to review all requests and delete them.
`default` if thresholds weren't defined in the role, or blank if not named
Assuming Approved Requests
`allow-roles` allows you to see the Roles screen and ssh into nodes
`allow-roles`, verify that assuming `allow-users` allows you to see the Users screen and denies access to nodes
`switching back` goes back to your default static role
`default`, and requests that are not expired and are approved are assumable again
Access Request Waiting Room
Strategy Reason
Create the following role:
`request_prompt` setting
`send request`, pending dialogue renders
Strategy Always
With the previous role you created from Strategy Reason, change `request_access` to `always`:
Strategy Optional
With the previous role you created from Strategy Reason, change `request_access` to `optional`:
`Switch Back` and clicking goes back to the login screen
Account
Terminal
Node List Tab
Session Tab
$ sudo apt-get install mc
$ mc
Session Player
Invite Form
Login Form
Multi-factor Authentication (MFA)
Create/modify `teleport.yaml` and set the following authentication settings under `auth_service`
MFA create, login, password reset
`totp` (TODO: temporary hack, ideally want to allow user to select)
`otp`
`otp`
MFA require auth
Through the CLI, `tsh login` and register a U2F key with `tsh mfa add` (not supported in the UI yet).
Using the same user as above:
RBAC
Create a role with no `allow.rules` defined:
`Add Server` button in Server view
`Add Application` button in Applications view
`Nodes` and `Apps` are listed under `options` button in `Manage Clusters`
Note: User has read/create `access_request` access to their own requests, despite resource settings
Add the following under `spec.allow.rules` to enable read access to the audit log:
`Audit Log` and `Session Recordings` are accessible
Add the following to enable read access to recorded sessions
Add the following to enable read access to the roles
Add the following to enable read access to the auth connectors
Add the following to enable read access to users
Add the following to enable read access to trusted clusters
Performance/Soak Test @xacrimon @fspmarshall
Using the `tsh bench` tool, perform the soak tests and benchmark tests on the following configurations:
Cluster with 10K nodes in normal (non-IOT) node mode with ETCD
Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB
Cluster with 1K IOT nodes with ETCD
Cluster with 1K IOT nodes with DynamoDB
Cluster with 500 trusted clusters with ETCD
Cluster with 500 trusted clusters with DynamoDB
Soak Tests
Run a 4-hour soak test with a mix of interactive/non-interactive sessions:
Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks
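A rough Go sketch of automating part of that check, assuming the standard Go client metrics such as `go_goroutines` and `process_open_fds` are exposed in Prometheus text format and that samples have been collected over the soak run. `metricValue` and `growth` are hypothetical helpers, and the half-versus-half comparison is a crude heuristic, not a substitute for looking at the graphs.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue returns the value of an unlabeled metric from a Prometheus
// text-format scrape. Comment lines are skipped because their first field is "#".
func metricValue(scrape, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(scrape))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == name {
			if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
				return v, true
			}
		}
	}
	return 0, false
}

// growth compares the mean of the second half of the samples with the mean
// of the first half. Under a steady load, a clearly positive value suggests
// the metric keeps climbing, which hints at a leak.
func growth(samples []float64) float64 {
	half := len(samples) / 2
	if half == 0 {
		return 0
	}
	mean := func(xs []float64) float64 {
		sum := 0.0
		for _, x := range xs {
			sum += x
		}
		return sum / float64(len(xs))
	}
	first := mean(samples[:half])
	if first == 0 {
		return 0
	}
	return mean(samples[half:])/first - 1
}

func main() {
	scrape := "# HELP go_goroutines Number of goroutines that currently exist.\n" +
		"# TYPE go_goroutines gauge\n" +
		"go_goroutines 412\n"
	v, ok := metricValue(scrape, "go_goroutines")
	fmt.Println(v, ok)

	steady := []float64{400, 410, 405, 408, 402, 407}
	leaking := []float64{400, 450, 500, 550, 600, 650}
	fmt.Printf("steady: %+.2f, leaking: %+.2f\n", growth(steady), growth(leaking))
}
```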
Breaking load tests
Load the system with `tsh bench` to capacity and publish the maximum number of concurrent sessions with interactive and non-interactive `tsh bench` loads.
Application Access @r0mant @smallinsky
`debug_app: true` works.
`name.rootProxyPublicAddr` as well as `publicAddr`.
`name.rootProxyPublicAddr`.
`app.session.start` and `app.session.chunk` events are created in the Audit Log.
`app.session.chunk` points to a 5-minute session archive with multiple `app.session.request` events inside.
`tsh play <chunk-id>` can fetch and print a session chunk archive.
`tsh app login`.
`Add Application` dialogue works (refresh app screen to see it registered)
Database Access @r0mant @smallinsky
`db.session.start` is emitted when you connect.
`db.session.end` is emitted when you disconnect.
`db.session.query` is emitted when you execute a SQL query.
`tsh db ls` shows only databases matching the role's `db_labels`.
`db_users`.
`db_names`.
`db.session.start` is emitted when a connection attempt is denied.
`name`, `description`, `type`, and `labels`
Step 2
login value matching the rows `name` column
`labels`
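For the database audit checks, a small Go sketch can assert the expected event order for one successful session. It assumes the event codes for a single session have already been extracted from the audit log in order; `checkDBSession` is hypothetical. A denied connection attempt also emits `db.session.start`, so this check only applies to sessions that actually connected.

```go
package main

import "fmt"

// checkDBSession is a hypothetical helper. Given the event codes recorded for
// one database session, in order, it verifies that the session starts with
// db.session.start, ends with db.session.end, and that every event in
// between is db.session.query.
func checkDBSession(events []string) error {
	if len(events) < 2 {
		return fmt.Errorf("expected at least start and end events, got %v", events)
	}
	if events[0] != "db.session.start" {
		return fmt.Errorf("first event is %q, want db.session.start", events[0])
	}
	last := events[len(events)-1]
	if last != "db.session.end" {
		return fmt.Errorf("last event is %q, want db.session.end", last)
	}
	for _, e := range events[1 : len(events)-1] {
		if e != "db.session.query" {
			return fmt.Errorf("unexpected event %q between start and end", e)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkDBSession([]string{"db.session.start", "db.session.query", "db.session.query", "db.session.end"}))
	fmt.Println(checkDBSession([]string{"db.session.start", "db.session.query"}))
}
```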