gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Visibility into nodes with incorrect system clocks #22166

Open nklaassen opened 1 year ago

nklaassen commented 1 year ago

What would you like Teleport to do?

When access to a Teleport node is denied because the system clock on the node is not correct, a descriptive error should be printed.

When the system clock of a Teleport node is more than 1 minute off, a log entry or possibly an alert should be created on the auth server to provide some advance warning of this issue.

What problem does this solve?

Currently, when the system clock of a Teleport node is "off" (either fast or slow) it can break access to that node due to certificates being rejected. The node may think that the cert is not yet valid, or already expired, when the real issue is that the clock is incorrect.

This is very tricky to debug without accessing the node. Currently, the only user-visible error is a generic "access denied to <user> connecting to <host>". There is no good way to find the root cause without the node's own logs, and those are very difficult to reach when access is broken!

If a workaround exists, please include it.

Correctly set the system clock on all nodes, keep them synced over NTP or similar, and create a monitoring system to make sure clocks do not drift.

jmarler commented 7 months ago

We need more than just to be able to report that time is incorrect. When the clock drifts, we lose SSH access via Teleport, which means we can't get in to fix it. Unless there is a workaround that lets us authenticate to the host with a different SSH key, we have to track down the device, obtain physical access, and correct the clock from there if we can't poke a port forward through the firewall.

hugoShaka commented 6 months ago

We had another S1 today with a customer with a VM whose time drifted.

> We need more than just to be able to report that time is incorrect.

In my opinion, ensuring the machines have a time-sync mechanism is the user's responsibility. This is especially true for machines that are hard to recover physically or that are managed by third parties.

Teleport should not mess with the machine time sync, as most users have working daemons that do time sync properly and already use the most relevant clock (e.g., cloud-specific NTP servers, on-prem clocks, ...). We should not mess with their NTP configuration or fight against the NTP daemon for clock ownership.

jmarler commented 5 months ago

I'm not saying that Teleport should directly change anything on the host. I agree with you 100% on that. What I am saying is that Teleport needs to adapt to this reality and give us a way to address the problem and regain access, even when the time is askew. If Teleport says that it provides SSH access to correct issues on servers, except when the time is askew, that leaves a big gap. It also opens up a novel denial-of-service attack against Teleport: if you can force the time slightly askew, you may be able to significantly delay the response to an issue and cause other outages. Many of us use Teleport to connect to a server and correct issues, but Teleport can't help us when this issue occurs, and that is the problem we are asking for a creative solution to.