cluster global client_idle_timeout overrides less restrictive role-specific client_idle_timeout

deusxanima commented 3 years ago

Description

What happened:

The customer has database analysts who create long-lived sessions so they can jump and port-forward to an RDS instance. The customer configured the cluster global client_idle_timeout at 30m and the analyst-role-specific client_idle_timeout at 8h. Analysts report that their sessions terminate after 30m of idle time instead of honoring the role-specific client_idle_timeout setting. Analysts are assigned only one role, so there are no conflicting roles with more restrictive settings.

What you expected to happen:

The expectation is that the role-specific client_idle_timeout setting would take precedence over the cluster global client_idle_timeout setting, as long as no other role has a more restrictive setting in place.

How to reproduce it (as minimally and precisely as possible):

Root Cluster Global Proxy config:

client_idle_timeout: 1m
keep_alive_interval: 1m
keep_alive_count_max: 3

Role on Root Cluster:

client_idle_timeout: 10m
enhanced_recording:
- command
- network
forward_agent: false
max_session_ttl: 12h0m0s
port_forwarding: true

Run the following command:

tsh ssh -L 5001:localhost:3080 root-jumphost

Observe: the session times out after 1m of idle time instead of staying open for 10m.

Environment

Teleport version: Enterprise 4.4.0
Tsh version: Enterprise 4.4.0
OS: Ubuntu 18.04.5 LTS
Where are you running Teleport? (e.g. AWS, GCP, Dedicated Hardware): AWS

Relevant Debug Logs If Applicable

Dec 03 22:28:32 ip-172-31-72-82.ec2.internal /usr/local/bin/teleport[3259]: DEBU [NODE]      Disconnecting client: client is idle for 1m0.000242869s, exceeded idle timeout of 1m0s id:3 idle:1m0s local:172.31.72.82:3022 login:root remote:45.37.202.230:50394 teleportUser:alen srv/monitor.go:197
Dec 03 22:28:32 ip-172-31-72-82.ec2.internal /usr/local/bin/teleport[3259]: INFO [AUDIT]     client.disconnect addr.local:172.31.72.82:3022 addr.remote:45.37.202.230:50394 code:T3006I ei:0 event:client.disconnect login:root reason:client is idle for 1m0.000242869s, exceeded idle timeout of 1m0s server_id:3f95eddf-48b7-4192-9f57-5a91aa700619 time:2020-12-03T22:28:32.796Z uid:e2732ec7-2525-41d5-9468-c3fa868b75ac user:alen events/emitter.go:318
Dec 03 22:28:32 ip-172-31-72-82.ec2.internal /usr/local/bin/teleport[3259]: DEBU [SSH:NODE]  Closed connection 45.37.202.230:50394. sshutils/server.go:440

benarent commented 3 years ago

I spent a bit more time after our meeting reviewing the Zendesk issue and this ticket, and I can now see how 'overrides the global cluster setting' is confusing. As per @awly, we should always keep the global setting as the upper limit and restrict it further with roles. For this customer, the global client_idle_timeout should be the maximum, e.g. 8h for analysts, and other users should have a shorter RBAC rule set to 30m.

I'm going to open a PR to make the docs clear that the override is more of a restriction / shortening of the global setting.

# client_idle_timeout determines if SSH sessions to cluster nodes are forcefully
- # terminated after no activity from a client (idle client). it overrides the
+ # terminated after no activity from a client (idle client). it can shorten the
# global cluster setting. examples: "30m", "1h" or "1h30m"
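
To make the "roles can only shorten the global setting" rule concrete, here is a minimal Python sketch of the behavior described in this thread. It is not Teleport's implementation: the helper names parse_duration and effective_idle_timeout are hypothetical, and treating a zero duration as "no idle timeout configured" is an assumption made for this illustration.

```python
import re
from datetime import timedelta

# Map Go-style duration suffixes to timedelta keyword arguments.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Parse a duration such as "30m", "1h30m" or "12h0m0s" (hypothetical helper)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


def effective_idle_timeout(global_timeout, role_timeouts):
    """Return the most restrictive (shortest) non-zero idle timeout.

    Assumption for this sketch: a zero duration means "not set" and is skipped.
    """
    candidates = [parse_duration(t) for t in [global_timeout, *role_timeouts]]
    enabled = [t for t in candidates if t > timedelta(0)]
    return min(enabled) if enabled else timedelta(0)
```

With the values from this issue, effective_idle_timeout("30m", ["8h"]) gives 30 minutes, which matches what the analysts observed. Swapping the settings as suggested above, effective_idle_timeout("8h", ["30m"]) still gives 30 minutes for users holding the stricter role, while a user whose only role specifies 8h gets the full 8 hours.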

deusxanima commented 3 years ago

Thanks for clarifying and confirming @benarent

gravitational / teleport

cluster global client_idle_timeout overrides less restrictive role-specific client_idle_timeout #5048
