juanfont / headscale

An open source, self-hosted implementation of the Tailscale control server
BSD 3-Clause "New" or "Revised" License
23.65k stars 1.3k forks source link

Resolve user to stable unique ID in policy #2205

Closed kradalby closed 2 days ago

kradalby commented 1 month ago

currently, the policy approach node to user matching with a quite naive approach looking at the username provided in the policy and matched it with the username on the nodes. This worked ok as long as usernames were unique and did not change.

As usernames are no longer guarenteed to be unique in an OIDC environment we cant rely on this.

This changes the mechanism that matches the user string (now user token) with nodes:

If more than one user is matching, then the query is rejected, and zero matching nodes are returned.

When a single user is found, the node is matched against the User database ID, which are also present on the actual node.

This means that from this commit, users can use the following to identify users in the policy:

There are more changes coming to this, so it is not recommended to start using any of these new abilities, with the exception of email, which will not change since it includes an @.

kradalby commented 1 month ago

@micolous would be great if you can take a look at this.

micolous commented 1 month ago

If I'm reading this correctly, it looks like this is always doing full table scans in the app (FullMapResponse() calls users, err := m.db.ListUsers()). This can't take advantage of database indexes, and means that every ACL evaluation uses a lot of memory.

These matches could be checked on an SQL / ORM level.

I'd make sure there's a way to explicitly declare which of these types you're matching (eg: uid:1234, oidc:941B1E7F-9174-4E8C-AA73-3FF3833D1B81/idp.example.com). I understand there's a need to migrate existing ACLs here as well – so we couldn't require that just yet. πŸ˜„

With implicit-only matching, and a full-table scan, it looks like filterNodesByUser would try to match an ACL X against all users as:

Even though this would prioritise matching by User ID or OIDC ID, if the database returned another User record with a matching email address or username field before it found one with a matching User ID or OIDC ID, it would do the same "there's two of them" and fail to match at all.

If the ListUsers query has a non-deterministic order, then you could end up with it "sometimes" failing, depending on whether the user with a matching User ID or OIDC ID came first.

While it's good that having > 1 match fails safe for usernames and emails, that could be used as a denial of service vector by a malicious user when the policy is intended to match a different field. It could also allow for deceptive policy changes, where a policy intending to match by one field actually matches against a different one.

Having an way to explicitly choose which field to match would fix that issue for User ID and OIDC ID matches - because we can guarantee these are unique and unchanging.

That said, any username-based (or potentially, email-based) policy is always to vulnerable to this sort of issue. The only way to really mitigate that is for Headscale to resolve all unstable identifiers into stable identifiers at policy submission time – then the denial of service vector only comes when an administrator tries to update a policy. That may be something to think about later, though.

kradalby commented 1 month ago

If I'm reading this correctly, it looks like this is always doing full table scans in the app (FullMapResponse() calls users, err := m.db.ListUsers()). This can't take advantage of database indexes, and means that every ACL evaluation uses a lot of memory.

These matches could be checked on an SQL / ORM level.

This will be changed as the policy code will be reimplemented, currently I will not spend time on improving the performance of this as it will be tossed.

I'd make sure there's a way to explicitly declare which of these types you're matching (eg: uid:1234, oidc:941B1E7F-9174-4E8C-AA73-3FF3833D1B81/idp.example.com). I understand there's a need to migrate existing ACLs here as well – so we couldn't require that just yet. πŸ˜„

Ultimately, I do not think we will accept all of these in the policy identifiers, I do not think infinite configurability is the way to go here.

Currently I am leaning towards username@ and email@domain.net (if available). An @ will be required in the identifier. Prefixes add more complexity, and it is hard enough already.

With implicit-only matching, and a full-table scan, it looks like filterNodesByUser would try to match an ACL X against all users as:

  • Checking user ID = X, adding any match to a list, and not looking for more matches if it got one
  • Checking OIDC ID = X, adding any match to a list, and not looking for more matches if it got one
  • Checking email address = X, then adding any match to a list, then continuing to look for more matches of any type
  • Checking username = X, then adding any match to a list, then continuing to look for more matches of any type

Even though this would prioritise matching by User ID or OIDC ID, if the database returned another User record with a matching email address or username field before it found one with a matching User ID or OIDC ID, it would do the same "there's two of them" and fail to match at all.

If the ListUsers query has a non-deterministic order, then you could end up with it "sometimes" failing, depending on whether the user with a matching User ID or OIDC ID came first.

I've added more tests, I do not think this is the case and that it currently works as intended πŸ€” You only pass it one "token" and that can only be resolved to one type.

While it's good that having > 1 match fails safe for usernames and emails, that could be used as a denial of service vector by a malicious user when the policy is intended to match a different field. It could also allow for deceptive policy changes, where a policy intending to match by one field actually matches against a different one.

I am not sure if I understand this, but I think it will be solved/reduced by reducing the amount of allowed fields.

Having an way to explicitly choose which field to match would fix that issue for User ID and OIDC ID matches - because we can guarantee these are unique and unchanging.

I've removed User ID as it does not really provide value, it is too much flexibility.

That said, any username-based (or potentially, email-based) policy is always to vulnerable to this sort of issue. The only way to really mitigate that is for Headscale to resolve all unstable identifiers into stable identifiers at policy submission time – then the denial of service vector only comes when an administrator tries to update a policy. That may be something to think about later, though.

This is planned to be resolved when redoing the policy package. The new version is planned to resolve everything at load-time instead of runtime, I've started this work, but it is dependent on other work here.

micolous commented 2 weeks ago

I'd make sure there's a way to explicitly declare which of these types you're matching (eg: uid:1234, oidc:941B1E7F-9174-4E8C-AA73-3FF3833D1B81/idp.example.com). I understand there's a need to migrate existing ACLs here as well – so we couldn't require that just yet. πŸ˜„

Ultimately, I do not think we will accept all of these in the policy identifiers, I do not think infinite configurability is the way to go here.

Currently I am leaning towards username@ and email@domain.net (if available). An @ will be required in the identifier. Prefixes add more complexity, and it is hard enough already.

The reason for allowing explicit prefixes everywhere is to ensure a policy condition always evaluates against a specific type or attribute. When (a/certain) type(s)/attribute(s) can only be tested with an unprefixed value (eg: with usernames in ACLPolicy.ExpandAlias, acls.expandOwnersFromTag and acls.expandUsersFromGroup), then it limits what you can set those values to on the other side.

This could cause some unexpected evaluations, particularly if someone decided to set their username to something like group:foo or a value which could be interpreted differently (eg: an IP address).

On the OIDC side, you should assume that the preferred_username and email fields are under user control at all times. While you can reasonably expect the email claim to always contain a validated email address (if you trust the IdP's email validation process, and there's an email_verified claim), there may still be room for shenanigans with that field.

As a side note, there few restrictions on which characters may appear in the sub claim, beyond:

It MUST NOT exceed 255 ASCII [RFC20] characters in length. The sub value is a case-sensitive string.

This would allow @ to appear in the sub claim. As an example, Shibboleth makes the sub claim administrator defined, and it looks like some administrators put email addresses in there, despite that being an extremely bad idea.

I've added more tests, I do not think this is the case and that it currently works as intended πŸ€” You only pass it one "token" and that can only be resolved to one type.

You'll need a test where there's a username that looks like a numeric user ID (eg: 1) or something that could appear in the OIDC provider identifier field (eg: http://oidc.org/1234 in your tests) to trigger this issue.

While it's good that having > 1 match fails safe for usernames and emails, that could be used as a denial of service vector by a malicious user when the policy is intended to match a different field. It could also allow for deceptive policy changes, where a policy intending to match by one field actually matches against a different one.

I am not sure if I understand this, but I think it will be solved/reduced by reducing the amount of allowed fields.

Consider a policy that applies to https://idp.example.com/A7D79910-3F42-4CB7-BA2D-F47B46DFF699, intending to match an OIDC provider identifier for a genuine user.

If a malicious user changes their preferred_username to https://idp.example.com/A7D79910-3F42-4CB7-BA2D-F47B46DFF699, it would mean that there are now two users who could match that policy, and it may not be applied to the genuine user any more (if the database returns the malicious user's record first).

If the genuine user is deleted, but a policy still references the user by OIDC provider identifier, a malicious user could change their preferred_username to the OIDC provider identifier of the deleted user, and old policies would apply to them.

If there's explicit prefixes for matching on each of those attributes, an administrator can be assured that a rule can only evaluate against the type and attribute that they expect.

Having an way to explicitly choose which field to match would fix that issue for User ID and OIDC ID matches - because we can guarantee these are unique and unchanging.

I've removed User ID as it does not really provide value, it is too much flexibility.

At the moment, it doesn't look like users.RenameUser updates existing ACL references, so User ID is still useful for non-OIDC installations, because then you have a stable and unique user identifier, which persists even if the user's username or email address changes.

This means that you wouldn't need to worry about Username being unique anymore.

This is planned to be resolved when redoing the policy package. The new version is planned to resolve everything at load-time instead of runtime, I've started this work, but it is dependent on other work here.

No worries πŸ˜„

kradalby commented 1 week ago

We will not do any prefixing of the types.

Consider a policy that applies to https://idp.example.com/A7D79910-3F42-4CB7-BA2D-F47B46DFF699, intending to match an OIDC provider identifier for a genuine user.

Fixed the case where the iss+sub is still in the database.

If a malicious user changes their preferred_username to https://idp.example.com/A7D79910-3F42-4CB7-BA2D-F47B46DFF699, it would mean that there are now two users who could match that policy, and it may not be applied to the genuine user any more (if the database returns the malicious user's record first).

We will solve this by using the same requirements for usernames as we do for the CLI, that way it cannot look like the provider identity. Same for email. If not valid, they will be omitted.

At the moment, it doesn't look like users.RenameUser updates existing ACL references, so User ID is still useful for non-OIDC installations, because then you have a stable and unique user identifier, which persists even if the user's username or email address changes.

The administrator is responsible for maintaining this.

kradalby commented 1 week ago

@micolous PTAL