juanfont / headscale

An open source, self-hosted implementation of the Tailscale control server
BSD 3-Clause "New" or "Revised" License
23.72k stars 1.3k forks source link

Harden OIDC migration and make optional #2170

Closed kradalby closed 6 days ago

kradalby commented 1 month ago

This commit hardens the migration part of the OIDC from the old username based approach to the new sub based approach and makes it possible for the operator to opt out entirely.

Fixes #1990

kradalby commented 1 month ago

Followup on @micolous comment on https://github.com/juanfont/headscale/pull/2020#issuecomment-2393061100

kradalby commented 1 month ago

@micolous I highly appreciate that you have spent time helping me research and get this right, I need some time before I can sit down properly, but I will try to start researching and think about solutions, I've answered some of the parts where I have some ideas already.

micolous commented 1 month ago

I've been poking at this in another branch, based on this PR: https://github.com/micolous/headscale/tree/oidc-takeover-suggestions:

Consider that a "work in progress", feel free to incorporate things into this PR. 😄

I've now realised a few more problems which make some of my suggestions (including in that branch) infeasible:

I think the auto-migration will have to be:

  1. Reintroduce the strip_email_domain setting, with the same logic as before #2020.
  2. If map_legacy_users: false (the default), panic if the strip_email_domain option is set explicitly (we don't want to use that if not mapping!)
  3. new check: Require allowed_domains contain exactly 1 entry if strip_email_domain: true (the best we can do to lock this down)
  4. new check: Require the email_verified OIDC claim to be true (it's not much...)
  5. If map_legacy_users: true and there's no account with the sub, attempt to get user by username in the old way (taking into account strip_email_domain and ignoring accounts where sub is already recorded).
  6. After migration, update the account to use the preferred_username

There's still ways this could fail or be exploited, but it's at least something.

kradalby commented 1 month ago

I'm replying a bit before I have managed to grasp everything as I am not well versed in these topics but I am trying to learn fast, a couple of notes and some comments:

I've been poking at this in another branch, based on this PR: micolous/headscale@oidc-takeover-suggestions

This pr and the proposed migration, with reintroducing strip_email_domain sounds like the most sensible way right now. I dont want neither the migration code, option nor strip email around for long, so I propose that when we get this sorted, we release 0.24.0 with this code, then everyone who uses OIDC should use this release to migrate. Then main will remove all this code, and 0.25.0 will contain no migration code and represent the state we want to be the new "base" for OIDC, that we then can build upon.

if an account does not have a Headscale password set

Headscale does not have any passwords, I am not sure if I am missing something from https://github.com/juanfont/headscale/pull/2170#discussion_r1791043244

I wouldn't modify the preferred_username in any way – it's intended to be used as-is.

As for policy preferences, Kanidm defaults to using an SPN that looks like an email address (eg: user@kanidm.example.com), but is probably not an actual email address. This can be changed on an application configuration level with kanidm system oauth2 prefer-short-username $client_id, which makes Kanidm return the Person's name field as a preferred_username (eg: user) 1.

That is find I think, an important goal with this work tho (eliminating strip_email_domain) is to ensure all users "DisplayName" which is used in headscale nodes list for owners, in headscale users and in the Policy file must include an @. This is for future ACL/Policy work.

I dont have any particular opinion if it should be:

Email makes the most sense to me.

kradalby commented 1 month ago

Another thought on the level of effort to put into this migration,

It is currently "quite broken" or at least insecure as we have established since it is vulnerable to these kind of email/username takeover attacks.

Based on this, I think we can make the argument that by using the current setup, you are equally at risk compared to a "inperfect" migration strategy. I feel like we can make the argument for using the Username based migration given that admins already trust what is fed into the username currently, and that should be sufficient to trust it to do the migration. With the addition that strip_email_domain has to be reintroduced so it can be matched exactly like we currently do.

It sounds to me like the ideal scenario is to get this fixed, and maybe the best is to keep the simple migration, then tell all your users to login again and disable the migration path after it has been completed.

micolous commented 1 month ago

That is find I think, an important goal with this work tho (eliminating strip_email_domain) is to ensure all users "DisplayName" which is used in headscale nodes list for owners, in headscale users and in the Policy file must include an @. This is for future ACL/Policy work.

The claim stability and uniqueness constraints apply here too. If an OIDC account is renamed or changes email address in the IdP, Headscale needs to be able to track that and continue to apply the same policies.

There's nothing stopping Headscale recording or consuming attributes like email address and display name in appropriate contexts (like user lists). However, it should not rely on those attributes to be stable, or use them to apply access controls.

What does Tailscale do?

According to Tailscale's ACL policy docs and grant docs, users can be referenced by:

I couldn't find anywhere in the docs that mentions if you can change your email address on an account, or the name of a passkey.

However, you can rename your GitHub account. I haven't tested whether someone taking over a renamed account could be used as an account take-over vector, or if that would just create a new Tailscale account (keyed by sub) and then apply the wrong ACLs/grants to anything using the old name (much like many other references in GitHub).

Tailscale's Entra SCIM sync and Okta SCIM provisioning docs suggest that Tailscale maps SSO users by email address (rather than sub).

Other parts of Tailscale's docs state that SCIM groups can be referenced by name or ID. They also state that if you rename a group referenced by name, then those references in ACLs or grants will break. This suggests that they're not internally converted to IDs (though this may not be possible, depending on how this is all provisioned).

All of this seems to suggest that Tailscale does not handle user (or group) renames over OIDC or SCIM correctly, though I don't have a Tailscale Enterprise account to verify this.

What should Headscale do instead?

Headscale should support referencing:

That way it wouldn't matter if an OIDC user's account was renamed, changed email address, or even had an email address set – the iss + sub would remain the same.

Also, Headscale would need to:

It could also convert any references to an email address in ACLs/policies to a local user ID or iss+sub, and then emit policy which only contains one of those. It may be better to use the external identifier where available, because then it may be easier to use with configuration management tools that'll be working with IdP identifiers.

Based on this, I think we can make the argument that by using the current setup, you are equally at risk compared to a "inperfect" migration strategy. I feel like we can make the argument for using the Username based migration given that admins already trust what is fed into the username currently, and that should be sufficient to trust it to do the migration.

Yeah, I don't like it either... but I don't see any other way to let you automatically migrate a pre-#2020 auth database to sub without re-introducing the same security risks that were there before.

It sounds to me like the ideal scenario is to get this fixed, and maybe the best is to keep the simple migration, then tell all your users to login again and disable the migration path after it has been completed.

It's important to clearly communicate what's happened, give administrators options for remediation, and they can make a choice of what to do next based on their appetite for risk (whether they attempt to auto-migrate, or they want to audit and patch the user database to fix the problems).

If you have a single-domain configuration (or multi-domain with strip_email_domain: false), and your IdP does not allow user-triggered email address changes, and no users have changed email address, then the bug probably doesn't affect you.

We can't know all of these things for sure, so the "do nothing" option should fail "safe" – that is, no migration at all, only using sub. That way, the bug is definitely fixed, and an administrator must take explicit action to either potentially re-introduce a vulnerable configuration, or to manually migrate users using known-safe data.

Or maybe there could be an oidc.configuration_version: 2 attribute that you need to set to be able to use OIDC in the new version, and it crashes on start-up (with a reference to the docs about this issue) if that's not set and OIDC support is enabled.

There should be some way for an admin to know which users are pending migration, and how many. That way they know when they can safely turn it off, see how remediation is tracking and who they need to follow up.

kradalby commented 1 month ago

ok, I this is how I will approach this based on the information out of hand. I will implement this will a tradeoff between usability, putting constraints on the OIDC operator and documentation:

OIDC implementation

In the Policy, email will be the identifier for a user. For non OIDC implementations, it will be username@

Migration

To get away from the current insecure implementation, we will implement the migration that works as implemented in this PR:

I deem this acceptable as a migration because it is no more insecure than the current implementation, and the trade off of getting off this implementation is more important than making it perfect.

It will be possible to opt-out of the migration, but it will be default true for one version (0.24.0), then it will be default false for 0.25.0, and removed completely in 0.26.0 (including the strip email functionality).

New columns can be added to headscale users list with Email, which will indicated if a user has been migrated so an Admin can track it.

We will document a recommendation of turning off the migration when all users are migrated.


Considering the high level of configurability of OIDC, providers etc, it is not an option for us to implement the vast level of configurability. It is not an option to sacrifice the usability of the policy, and headscale by forcing users to use the full identifier in Policy. We are therefore happy with requiring certain behaviour from the OIDC. If an OIDC cannot be configured to provide an email, then those OIDCs will not be supported.

I think this tradeoff will cover the concerns raised:

We are balancing the user impact and user experience and allow administrators to manage their own risk appetite. It balance the timeline by choosing a simpler migration that is attainable now and will start getting people over to the correct solution.

kradalby commented 1 month ago

Does this sound like it should address the concerns @micolous?

micolous commented 1 month ago

Oops, I accidentally pressed "Comment" before I was finished. I will try again soon. 😄

micolous commented 1 month ago

Comments inline, but I've re-ordered some parts to make it a little easier to follow.

OIDC implementation

  • The identifier used in the database to identify a user will be iss + sub.
  • pref username, display name, picture etc will be updated on all logins

Agreed.

  • An admin command for manually updating the Email will be implemented

This is a good idea to support non-OIDC logins.

  • We will make email required from OIDC

I couldn't find anywhere that Headscale sends emails, so it should not require an email address, except for migrations.

Non-OIDC users are already identified by username@.

  • Email will not be updated beyond the first login, this will be documented

The whole point of having an identity provider is that it is a single source of truth, and that all attributes are updated automatically in all applications, eliminating the administrative burden of maintaining identities across many applications for many users.

  • Email will have to be unique, this will be documented

This is akin to requiring email be a stable and unique claim – exactly the issue we're trying to avoid.

The fundamental issue is that a person's email address can always change. OIDC at least gives a way for an identity provider to signal to an application that it is the same user (sub) regardless, and it's up to an application to handle that correctly.

An email address could change for many reasons:

Preventing email addreesses from being updated (or making it difficult) doesn't make the issue go away – if anything, it makes things worse.

Considering the high level of configurability of OIDC, providers etc, it is not an option for us to implement the vast level of configurability. It is not an option to sacrifice the usability of the policy, and headscale by forcing users to use the full identifier in Policy. We are therefore happy with requiring certain behaviour from the OIDC.

I acknowledge that using a sub isn't the friendliest thing to use in policy – but it's also the only thing that's actually stable, which means it's also the only thing that's actually secure.

I also acknowledge that this stems from a Tailscale design flaw, but there is some latitude for Headscale to correct this issue, as it already does for the local case (eg: username@).

If non-OIDC users can be renamed, it'll have to be handled there too anyway.

Headscale could support unstable identifiers (like email or preferred_username) and translate them into stable ones based on its user database, throwing an error if none match or there is more than one match.

This would make an email or preferred_username change automatically propagate to any existing policy without the user having to be aware of it, and uses a familiar paradigm in a fairly-safe way.

Local filesystems on UNIX-like systems already do something similar – if you chown user file, file's owner will be set to user's UID at the time of execution, not their username. If user is later renamed, the UID stays the same, and the file's effective ownership will be retained. When you run ls -l file, it translates the UID back to user's new username. Only things which reference user by username (of which there are many) break.

My understanding is that Headscale doesn't even support these policies today. While it's important to acknowledge these potential later issues, this can be a later problem. 😄

Migration

To get away from the current insecure implementation, we will implement the migration that works as implemented in this PR:

  • Re-add strip_email_domain so we can correctly match the usernames mirroring the current behaviour

Sounds good.

  • If the user has already been migrated, create a new user.

Assuming that on first login for migrating users, the user's full email address is recorded in Headscale's user database (ie: fixing strip_email_domain: true); the way we know an account is already migrated is if:

If both cases are handled as "create new account":

This could be flagged for the administrator to take action.

Automatically locking both accounts could be used as a denial of service vector, if Headscale is configured to use a public identity provider which erroneously allows public logins.

That all said, creating a new user is still a reasonable answer. It just contradicts the prior stated constraints.

I deem this acceptable as a migration because it is no more insecure than the current implementation, and the trade off of getting off this implementation is more important than making it perfect.

I agree with you it's important to move forward in some way on this, and that there is no perfect solution.

It's important that an administrator can see what the automated migration process has done, any exceptional circumstances it has detected, and manually reassign things if needed.

It will be possible to opt-out of the migration, but it will be default true for one version (0.24.0), then it will be default false for 0.25.0, and removed completely in 0.26.0 (including the strip email functionality).

We will document a recommendation of turning off the migration when all users are migrated.

What happens if an administrator skips v0.24.0 or v0.25.0, and there are other critical bug/security fixes in the interim?

As much as I dislike having the option around any longer than needed, I think migration should be decoupled from point releases, and be "opt in":

"Opt in" is the only option which empowers administrators to make a choice, because "opt out" is not consent.

New columns can be added to headscale users list with Email, which will indicated if a user has been migrated so an Admin can track it.

Sounds good.

micolous commented 1 month ago

So in summary, I'd suggest something similar to what you had, with a few tweaks:

OIDC implementation

Other claims should be update Headscale's database when changed, and are not treated as unique:

Policy implementation (for later)

Migrations

Same as you had it, but only "opt in" migration (for reasons mentioned in my previous comment).


This implementation and migration strategy will:

[^1]: when auto-migration is disabled

kradalby commented 1 month ago

@micolous I've pushed the new migration, probably need some tidying but please take a look.

https://github.com/juanfont/headscale/pull/2170#issuecomment-2404643862, but only "opt in" migration (for reasons mentioned in my previous comment).

There will have to be one version with it opt out so people who dont read changelogs and then have a bunch of users created run down the issue tracker. A compromise could be a check at start up which sees if all users have oidc identity and automatically turns it off.

I will look at the Policy resolving after i get back from a coffee, it needs to go in at the same time I believe.

kradalby commented 1 month ago

@micolous Can you take a look now?

I think I have implemented the migration and info as discussed in this PR.

Then I have split "resolve" users from ACL into https://github.com/juanfont/headscale/pull/2205 and I would appreciate if you can take a look that it covers the concerns.

micolous commented 1 month ago

Thinking about things some more, enabling migration should have no effect on the security of new Headscale installations where every user has an iss and sub recorded. I like the idea of making you choose – but I can also see the support angle for this.

I still need to have a look at the code, but I've attempted to reword the change log (and ended up spending quite a long time on it). What's there ("domain is always part of the username for OIDC") is not always true, and some bits are misleading.

I think writing the change log first will make sure we agree on the goal. 😄

I suspect #2205 will block some of what goes into this changelog – but I'll focus on OIDC for now.


Security fix: OIDC changes in Headscale 0.24.0

Headscale v0.23.0 and earlier identified OIDC users by the "username" part of their email address (when strip_email_domain: true, the default) or whole email address (when strip_email_domain: false).

Depending on how Headscale and your Identity Provider (IdP) were configured, only using the email claim could allow a malicious user with an IdP account to take over another Headscale user's account, even when strip_email_domain: false.

This would also cause a user to lose access to their Headscale account if they changed their email address.

Headscale v0.24.0 now identifies OIDC users by the iss and sub claims. These are guaranteed by the OIDC specification to be stable and unique, even if a user changes email address. A well-designed IdP will typically set sub to an opaque identifier like a UUID or numeric ID, which has no relation to the user's name or email address.

This issue only affects Headscale installations which authenticate with OIDC.

Headscale v0.24.0 and later will also automatically update profile fields with OIDC data on login. This means that users can change those details in your IdP, and have it populate to Headscale automatically the next time they log in. However, this may affect the way you reference users in policies.

Migrating existing installations

Headscale v0.23.0 and earlier never recorded the iss and sub fields, so all legacy (existing) OIDC accounts from need to be migrated to be properly secured.

Headscale v0.24.0 has an automatic migration feature, which is enabled by default (map_legacy_users: true). This will be disabled by default in a future version of Headscale – any unmigrated users will get new accounts.

Headscale v0.24.0 will ignore any email claim if the IdP does not provide an email_verified claim set to true. What "verified" actually means is contextually dependent – Headscale uses it as a signal that the contents of the email claim is reasonably trustworthy.

Headscale v0.23.0 and earlier never checked the email_verified claim. This means even if an IdP explicitly indicated to Headscale that its email claim was untrustworthy, Headscale would have still accepted it.

What does automatic migration do?

When automatic migration is enabled (map_legacy_users: true), Headscale will first match an OIDC account to a Headscale account by iss and sub, and then fall back to matching OIDC users similarly to how Headscale v0.23.0 did:

On migration, Headscale will change the account's username to their preferred_username. This could break any ACLs or policies which are configured to match by username.

Like with Headscale v0.23.0 and earlier, this migration only works for users who haven't changed their email address since their last Headscale login.

A successful automated migration should otherwise be transparent to users.

Once a Headscale account has been migrated, it will be unavailable to be matched by the legacy process. An OIDC login with a matching username but non-matching iss and sub will instead get a new Headscale account.

Because of the way OIDC works, Headscale's automated migration process can only work when a user tries to log in after the update. Mass updates would require Headscale implement a protocol like SCIM, which is extremely complicated and not available in all identity providers.

Administrators could also attempt to migrate users manually by editing the database, using their own mapping rules with known-good data sources.

Legacy account migration should have no effect on new installations where all users have a recorded sub and iss.

What happens when automatic migration is disabled?

When automatic migration is disabled (map_legacy_users: false), Headscale will only try to match an OIDC account to a Headscale account by iss and sub.

If there is no match, it will get a new Headscale account – even if there was a legacy account which could have matched and migrated.

We recommend new Headscale users explicitly disable automatic migration – but it should otherwise have no effect if every account has a recorded iss and sub.

When automatic migration is disabled, the strip_email_domain setting will have no effect.

Other OIDC changes

Headscale now uses the standard OIDC claims to populate and update user information every time they log in:

Headscale profile field OIDC claim Notes / examples
email address email Only used when "email_verified": true
display name name eg: Sam Smith
username preferred_username Varies depending on IdP and configuration, eg: ssmith, ssmith@idp.example.com, \\example.com\ssmith
profile picture picture URL to a profile picture or avatar

These should show up nicely in the Tailscale client.

This will also affect the way you reference users in policies.


kradalby commented 1 month ago

Thinking about things some more, enabling migration should have no effect on the security of new Headscale installations where every user has an iss and sub recorded. I like the idea of making you choose – but I can also see the support angle for this.

That makes sense.

I still need to have a look at the code, but I've attempted to reword the change log (and ended up spending quite a long time on it). What's there ("domain is always part of the username for OIDC") is not always true, and some bits are misleading.

I think writing the change log first will make sure we agree on the goal. 😄

I've included your well-written changes, thank you, I think it covers it and what I have done, we can review and comment on that as part of this PR.

I suspect #2205 will block some of what goes into this changelog – but I'll focus on OIDC for now.

Lets merge this first, then we adopt it to #2205 as part of the process there, while we are improving that in parallel.

Thank you, and let me know how the code review goes.

jirutka commented 3 weeks ago

OIDC users by iss + sub, eg: idm.example.com/97fb4c49-c3b2-4e13-a8b9-cbe4d6525dc5@oidc

Do you plan to implement support for using multiple OIDC issuers (which practically means multiple OIDC providers) simultaneously? If not, then iss is unnecessary and just makes the user identifier impractically long (e.g. for using in ACLs).

Please let the administrator decide what they want to use as the user identifier. Now I mapped the username (centrally managed and so unique within the organisation) to sub (in Keycloak), so I can use normal user identification in the ACLs (just to find out that it still uses email, but I assume that it’s not the final design). It would be even better to make the claim used as the user identifier configurable (for admins who don’t control their IdP). It’s the admin’s responsibility to choose an appropriate identifier (which depends on the organisation’s policy) and make sure that it’s unique and immutable.

BTW, why have an email address in the first place? And why email verification? Is it just for the migration process from the previous OIDC implementation? Headscale just needs some user identifier. Maybe the user (full) name (in a free form) for a nicer UI. Collecting and storing personal data is regulated by law (at least in the EU), so it might even be problematic for some users.

oneingan commented 3 weeks ago

In my opinion multiple OIDC provider will be a must in the short time, considering how zero-secrets workload identification is evolving, for example with GitHub Actions OIDC provider.

micolous commented 3 weeks ago

Please let the administrator decide what they want to use as the user identifier. Now I mapped the username (centrally managed and so unique within the organisation) to sub (in Keycloak), so I can use normal user identification in the ACLs (just to find out that it still uses email, but I assume that it’s not the final design).

@jirutka Your installation now makes it impossible for username and email changes to take effect – exactly the issue we're trying to fix.

Name changes happen, and failure to handle that properly is an ethical problem.

As an anecdote: I know of a very large technology company where username changes were possible, but caused you to be locked out of all systems for 1 - 2 weeks while the change propagated to all systems, often through manual intervention. People would plan to change teams (which would result in ACL reassignments anyway) or take vacation when they wanted to change username.

This was because many of the company's systems primarily worked with usernames, email addresses or UPNs, rather than a sub-like identifier.

It would be even better to make the claim used as the user identifier configurable (for admins who don’t control their IdP). It’s the admin’s responsibility to choose an appropriate identifier (which depends on the organisation’s policy) and make sure that it’s unique and immutable.

With respect: administrators consistently shoot themselves in the foot on this one. Mapping username to sub is a great example of that.

The OIDC specification is very clear about what is and isn't a stable identifier. It also states what a sub should look like:

REQUIRED. Subject Identifier. A locally unique and never reassigned identifier within the Issuer for the End-User, which is intended to be consumed by the Client, e.g., 24400320 or AItOawmwtWwcT0k51BayewNvutrJUqsvl6qs7A4. It MUST NOT exceed 255 ASCII [RFC20] characters in length. The sub value is a case-sensitive string.

If your sub looks like jirutka or micolous, then you're doing it wrong.

Instead, lets focus on the goal: to be able to assign policies based on a familiar identifier.

I already proposed a mechanism by which an administrator can continue to reference a user by a familiar-yet unstable identifier, and have it automatically converted into a policy which uses a stable identifier.

That will require some more work to make it happen, which won't land in this PR.

Headscale's previous use of the email claim mimicked Tailscale's design. Unfortunately, Tailscale's design does not appear to consider this issue.

It will take many steps to fully unravel and address this design problem in a safe and secure way. 😄

micolous commented 3 weeks ago

In my opinion multiple OIDC provider will be a must in the short time, considering how zero-secrets workload identification is evolving, for example with GitHub Actions OIDC provider.

While multiple OIDC providers probably won't be implemented in the short term, I agree that it's important to consider this in the present design, so that we don't need another migration process to unlock that functionality. 😄

kradalby commented 1 week ago

With User.Name no longer being a unique constraint (which is good!), there are a number of other issues this opens up:

I believe all of these have been addressed now. They will still resolve from username to an ID, but fail if there is more than one user.

I have filed https://github.com/juanfont/headscale/issues/2246 as a followup to add ID flags to all CLI commands as this PR is already big enough.