[proposal] Refactor identity provider object

oxidecomputer / omicron

Omicron: Oxide control plane

Mozilla Public License 2.0

252 stars 40 forks source link

[proposal] Refactor identity provider object #6663

Open augustuswm opened 1 month ago

augustuswm commented 1 month ago

This is a proposal on a breaking change to the way we represent identity providers to unblock IdP edits. I haven't thought through how this may be implemented in a backwards compatible way, but I wanted to open this to start some discussion.

Currently we store SAML identity provider connections as a single row which merges two configurations (service provider and identity provider config):

name
idp_metadata_document_string
idp_entity_id
sp_client_id
acs_url
slo_url
technical_contact_email
public_cert
private_key
group_attribute_name

This makes editing and updating IdP configuration bring in all of the concerns of what happens when we change identity sources (i.e. do we need an additional id along with external_id, do we perform merging, etc).

We can likely avoid these issues, while gaining a lot of the update benefits we want by splitting this in to two tables / objects, one that represents an identity provider and one that represents a service provider. This models the data closer to reality where the Silo is the service provider, and we then may have multiple IdPs that users could authenticate through (future state).

It would look something like:

Service Provider

name
sp_client_id
acs_url
slo_url
technical_contact_email
public_cert
private_key

Identity Provider

name <-- Not related to service provider name
idp_metadata_document_string
idp_entity_id
group_attribute_name

By splitting it this way we could support key / certificate rotations for signed requests, as well as changing sp_client_id and and technical_contact_email.

This also could help transition to a model where an Identity Provider can be deleted or edited (as long as there are no linked users) without touching the service provider configuration.

davepacheco commented 1 month ago

Thanks for opening this. What's the problem you're trying to solve here? Is it just to be able to replace keys, certificates, client id, and technical contact mail? @jmpesp can probably correct if I'm misremembering but I believe the intent here was that you'd create a new IdentityProvider object to do this and then remove the first one. Does that not work here?

augustuswm commented 1 month ago

This is another iteration on the https://github.com/oxidecomputer/omicron/issues/3125 problem of being able to mutate the identity provider configuration of a silo. There are two goals of this change:

Allow mutation of fields that can be mutated without removing an identity provider (and as a result needing to resolve the issues raised in #3125 .
Allow re-use of service providers across multiple IdPs.

The second point is not a current requirement, but drives forward the ability to have multiple identity provider configurations sharing a single service provider configuration.

The fields that would end up updateable in this case would be:

sp_client_id
technical_contact_email
public_cert
private_key

The public_cert, private_key, and technical_contact_email I think are pretty self explanatory for needing updates, and changing them does not modify where/how we are getting external identities from.

The sp_client_id similarly does not modify where/how we are getting external identities from, and is only needed to be updateable because we allow users to specify the arbitrary values for it. In general it should be rare for this value to be updated outside of initial configuration. (This value is often precomputed by a service provider, and not modifiable, but we allow user supplied values).

In contrast, modifying of idp_metadata_document_string or idp_entity_id can in effect be changing the identities that we are sent without a clear way to reconcile them with identities we already have learned of. As a result these would not be mutateable and require replacement of the IdP configutation.

davepacheco commented 1 month ago

I definitely want to get @jmpesp's input on this but I thought the existing model was intended to handle changing these fields. I see there's an API to create a new SAML IdP, though none to delete an old one. Does it work to rotate keys or client id by adding a new IdP with the new key and cert?

Or is the problem that you can update these fields and it all works fine to do key rotation, etc. but you're just worried that a user error will result in the problems mentioned in #3125?

I'm not sure it's necessary to separate these into two objects to be able to make only some of the fields modifiable. I'm not sure it's bad either but I suspect it's a bunch more work than other possible approaches. (Also, can you say more about "Allow re-use of service providers across multiple IdPs"? Are you talking about having multiple Silos show up as a single service provider in the IdP?)

augustuswm commented 1 month ago

Yeah, let me rephrase a bit. I don't think we necessarily need to separate this in to two objects. I think it may be a clearer abstraction than what we have today, but I agree it is a lot more work and may not be worth the break. As for the endpoints, if we added update and/or delete then they would definitely be sufficient for key rotation, but I do think we would need to solve the identity issues in #3125 first. Especially since we currently only store an external id, and not which IdP configuration that external id came from.

Alternatively we could implement an update endpoint that only supported the fields that are "safe" to update (i.e. the service provider fields).

The idea behind splitting them (and re-use) is to allow the same service provider (i.e. a single silo) to have multiple IdPs connections. This might look like multiple realms within Keycloak, or a case that we would use where we would support KeyCloak authentication for external users to a demo silo, and Google Workspace authentication for support staff. These are two separate IdP configurations, but they would have the same service provider. In our current state this would look require entering the service provider details (which may be the same) for each identity provider. Actual customers are not asking for this though, so it is a far second behind smoothing out the silo / idp creation and update flows. There are also other issues that would need to be solved before this was feasible as well.

For reference, I don't think we need to (nor should we) try to do the other direction of multiple silos under a single IdP config. Handling of this is better left to the IdP.

davepacheco commented 1 month ago

Got it. Maybe the thing to do is set up a time to chat with @jmpesp, @askfongjojo and whoever else and get to ground on which of these problems we want to bite off and how best to go about it?

The idea behind splitting them (and re-use) is to allow the same service provider (i.e. a single silo) to have multiple IdPs connections. This might look like multiple realms within Keycloak, or a case that we would use where we would support KeyCloak authentication for external users to a demo silo, and Google Workspace authentication for support staff.

If I'm understanding right, this is something we explicitly decided to try to avoid if we could get away with it. RFD 234 wrote:

A previous version of this RFD described in some depth the question of whether we should allow the rack to be configured with multiple IdPs. The problem is essentially that it’s possible (and sometimes easy) to wind up in bad situations: if a person has an identity in two different IdPs, how do we know they’re the same? Do we provide complex rules for letting administrators map users? Isn’t that the job of an aggregating IdP? What if someone gets it wrong — the admin forgets to set this up and a person logs in with a different identity than they did last time and doesn’t see any of their resources? Or worse, they go create a bunch of resources and then want to merge the accounts? Even if we want to disallow this, what if an operator accidentally changes the IdP to a different one? All of this is made a lot simpler with the constraints proposed here that every Silo is associated with one logical IdP and that cannot be changed. (They can have a different endpoint associated with them for migration or certificate rotation, but we always treat those as the same realm.) At the same time, we can support multiple IdPs in the rack — as long as they’re in different Silos.

This assumed that if someone wanted to do what you said (multiple Keycloak realms or Keycloak + Google), then they could (and might prefer to) instead do that with their IdP. Is that a credible option for what you're looking to do?

This is of course not set in stone, but there's definitely some complexity here so all things being equal it'd be nice to avoid revisiting these assumptions.

augustuswm commented 1 month ago

Sounds good, and while this is a pain point (updates specifically), I don't think it rises above our current high priority work. I appreciate all of thoughts on this!

Thanks for noting that section from RFD 234, I had missed that that was previously discussed. I misunderstood section 1.2 as allowing multiple configurations (though maybe not at the same time on another reading). It certainly adds a lot of complexity to allow multiple configurations, and I agree we don't want to explicitly support it until there is external requirements.

I think though if we want to allow deletion (or recovery from being locked out), we would need similar tracking to what we would need for a multi-idp setup. Definitely makes sense to chat and figure out how we might smooth out the edges without adding to much complexity or locking in to extra requirements.