juanfont / headscale

An open source, self-hosted implementation of the Tailscale control server
BSD 3-Clause "New" or "Revised" License
20.87k stars 1.15k forks source link

When already-expired node is set to "Never Expire" (expiry is NULL), it does not go back to logged-in status. #1851

Open benmehlman opened 3 months ago

benmehlman commented 3 months ago

Bug description

Using Tailscale's control server: when a node expires, it remains connected to the control server, although it no longer passes tailnet traffic. The node can be restored to operation by selecting "Disable key expiry" in the Tailscale admin UI. It will start passing traffic again, without having to re-authenticate or take any other action on the node machine itself.

This does not work on headscale.

Environment

The reverse proxy seems to be working fine as everything else related to node-server communication is working perfectly and it's been very stable.

To Reproduce

These steps assume OIDC is in use...

  1. In config.yaml, set oidc expiry to a short time so that expiration can be easily observed (eg. "5m"), and restart the service,
  2. Run "tailscale up" on the node with the appropriate parameters to connect to the headscale instance.
  3. Complete OIDC login.
  4. Observe that the node is connected to the tailnet as normal.

On the headscale server:

  1. Wait for the node to expire.
  2. Observe that headscale nodes list indicates that the node is connected but expired, as expected.
  3. Test the node connectivity to confirm that it has stopped passing traffic as expected.
  4. Set the node to "Disable key expiry" by using sqlite3 to execute: UPDATE node SET expiry = NULL WHERE id = the_node_id;
  5. Observe that headscale nodes list indicates that the node is "online" and expired is "no".
  6. Observe that, even after some time is allowed for polling (if necessary), the node does not resume passing traffic, and tailscale status on the node remains "Logged out".

Logs and attachments

netmap_recover_after_expiry.json

kradalby commented 2 months ago

So this will not really work since changing the database will not trigger any of the mechanisms that update the clients. I would think that if you change the database and restarts headscale it might work.

I think essentially what we need is a new command set-expiry or something which sets a new expiry and the nodes are appropriately updated.

I'm going to remove this from 0.23.0, it is important, but it is not a regression at should be tackled after.

benmehlman commented 2 months ago

I did try restarting headscale, it didn't cause the node to come back online.. so, there is some other detail that is not quite right.

May I suggest that rather than a separate api for set-expiry, rather implement PATCH so that as new columns are added in the future it would be easy to add them to the API without adding more endpoints?

Also I suggest adding a separate boolean column for "never_expire". This removes the ambiguity when a node which has never authenticated has an expiry = null.