juanfont / headscale

An open source, self-hosted implementation of the Tailscale control server
BSD 3-Clause "New" or "Revised" License
22.27k stars 1.24k forks source link

Enhance OIDC Authentication with refresh_token Support #1531

Open vsychov opened 1 year ago

vsychov commented 1 year ago

Hello,

First and foremost, I'd like to express my appreciation for the excellent product you have built.

Currently, headscale offers OIDC authorization, but its functionality appears to be somewhat limited. As it stands, it's challenging to provide both a smooth user experience and maintain security due to headscale's inability to utilize refresh_token. This limitation presents us with a dilemma: either we grant the user an extended lifespan, during which their access might be revoked by IdAM, or we ask the user to re-authenticate periodically (e.g., daily).

A feasible solution would be to use both access and refresh tokens. Implementing this would primarily involve storing the refresh_token in the database and periodically updating the existing access_token. It's worth noting that the current OIDC implementation (from coreos/go-oidc) supports renewing the access_token via the refresh_token, as demonstrated here. Should an attempt to refresh the access_token fail, nodes authenticated by it should be logged out (set expire to now).

@juanfont , @kradalby I'm curious to hear your thoughts on this proposal. If this aligns with the project's direction, I would be more than happy to submit a pull request with the necessary modifications.

Thank you for your time and consideration.

almereyda commented 1 year ago

PRs are set to wait until #1473 is resolved. There it was announced end of May that no contributions are taken for the time being.

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 90 days with no activity.

vsychov commented 9 months ago

/remove-lifecycle stale

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

vsychov commented 6 months ago

/remove-lifecycle stale

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 90 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

mitchellkellett commented 1 month ago

@kradalby did this ever get resolved? It looks like it went stale a few times.

I’m having a similar issue where it’s not using the refresh token. Not sure if it should be to start with.

baznikin commented 3 weeks ago

/remove-lifecycle stale

baznikin commented 3 weeks ago

Please, implement refresh_token support

kradalby commented 3 days ago

I've tried to read up on this and understand the usecase, and how the oauth/oidc part plays in and I am still a bit unsure about somethings, so I will lay it out how I understand it and I need someone to elaborate on the things I have misunderstood and usecases.

First I want to establish that from what I understand, Tailscale SaaS does not implement neither using the access token nor refresh tokens for expiry. They use their own key expiry time, which we also do as default.

In addition, headscale has a config option for making the expiry being set to the access token's expiry time.

About OIDC (or I suppose OAuth2 technically), access tokens are a part of OAuth2 to let authorise access to a service, typically from a program or another service. In headscale, access tokens are not used for anything.

OIDC is about Authentication, which is what headscale does, you log in with OIDC, we take notice that you were allowed to login and then we mark the node as authenticated. This part does not really have any expiry, you log in, your approved, and headscale chooses the expiry.

Refresh tokens are an additional (optional) layer of security to OAuth2 to mitigate some attacks. It work by issuing a token with a longer expiry, the refresh token, which is authorised to issue new, short lived access tokens.

Two examples:

OIDC provider 1 is using refresh tokens, when a user logs inn, they are issued a refresh token which is valid for 2 days. The user (or the program using the token) can use this key to create access tokens which are valid for 5 minutes. When the 2 days are up, the user will have to log in again.

OIDC provider 2 only uses access tokens, they have a validity of 1 day, and after 1 day the user will have to log in again.

as mentioned, headscale currently has a flag to make a nodes authentication use the expiry time of the access token, that means that with provider 1, the node will expire after 5 minutes, while with provider 2, it will expire after 1 day.

I think an important thing to notice here is that headscale does not need the OAuth/authorisation part of this. The access token does not play into how it works, and the only meaningful use of the protocol is to authenticate the user to see if it can access the service.

It sounds misguided to me that a third party system (the OIDC provider) should dictate the expiry time of a node logged into by a user. It sounds more logical to me that the operator of headscale would set the desired expiry time in headscale to achieve the security configuration that they are after.

There was a big discussion for when access tokens was addressed in https://github.com/juanfont/headscale/issues/1176. I think using access token's expiry is still wrong, but I do not plan to remove it.

mitchellkellett commented 2 days ago

Thanks for taking a look Kristoffer.

It sounds more logical to me that the operator of headscale would set the desired expiry time in headscale to achieve the security configuration that they are after.

I think you're right with saying this, though my thoughts regarding the refresh token are that when the node expires, Headscale will automatically try and refresh the nodes expiry by using the refresh token. If the user's account is disabled/deleted, then Headscale won't be able to.

Currently, when a node expires, the user has to go back through the OIDC flow to reauthenticate the node, if we can use the refresh token, then this step would happen automatically and only require intervention when it can't use the refresh token automatically.

My ideal outcome would be that I can set the Headscale nodes to expire daily and then let refresh tokens run their course. That way that if a user is disabled/deleted in my IdP, then the next day their nodes are expired.

kradalby commented 2 days ago

If the user's account is disabled/deleted, then Headscale won't be able to.

It wont be instantly reactive no, but you could solve this by using the API to expire/delete the node.

if we can use the refresh token, then this step would happen automatically and only require intervention when it can't use the refresh token automatically

While I dont disagree that it sounds like a nice experience, it sounds problematic for a couple of reasons,

Since headscale is software that is very much in the security path, and has a very small maintainer team, keeping the auth path as simple as possible is desirable, any bugs in this path will put us in a very bad position. Implementing something like this does not seem worth the effort and risk. I dont think Tailscale SaaS implements this either, telling me that if the software targeting business dont do it, then its not a good thing to invest time in.

If you want users to log in often, then you should do a short expiry window, and make them reauth with oidc. If there is a risk that your users might have been deleted in the meantime, use the API to purge them.

The access token support is there, it will not be remove, but I would not recommend using it, and I'll probably mark it as such.

vsychov commented 8 hours ago

While I dont disagree that it sounds like a nice experience, it sounds problematic for a couple of reasons,

  • You will need to store all the refresh tokens/sessions in headscale, which does not sound ideal, if headscale gets compromised someone has refresh tokens and can issue access tokens for your entire org.
  • You will need significantly more complicated machinery for handling the renewal and making sure it is correct
  • Potentially very hard to have automatic tests and by proxy make it hard to maintain.

Since headscale is software that is very much in the security path, and has a very small maintainer team, keeping the auth path as simple as possible is desirable, any bugs in this path will put us in a very bad position. Implementing something like this does not seem worth the effort and risk. I dont think Tailscale SaaS implements this either, telling me that if the software targeting business dont do it, then its not a good thing to invest time in.

If you want users to log in often, then you should do a short expiry window, and make them reauth with oidc. If there is a risk that your users might have been deleted in the meantime, use the API to purge them.

The access token support is there, it will not be remove, but I would not recommend using it, and I'll probably mark it as such.

Access tokens are a well-established technology that, when configured and managed correctly, can significantly enhance both security and user experience.

What risks do you see in someone stealing refresh tokens? The access token that the user receives doesn’t have any real privileges. So, using the refresh token doesn’t give the attacker any advantage. Essentially, the access token is only needed for headscale to verify that the user is who they claim to be. The refresh token, typically single-use, just enables automatic renewal of the access token. It seems that there’s no significant security risk if token permissions are set correctly.

As for storing and refreshing tokens, it doesn’t seem like a particularly complex task. A single database table could store the tokens, and a background goroutine could refresh them at regular intervals. Additionally, the libraries that headscale uses for OIDC already support this mechanism.

Regarding automated testing, it also seems manageable. There are already existing tests for this functionality, and they would only need to be slightly expanded rather than completely rewritten.

I’d love to hear your thoughts on this!