oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Staying in sync with user access and permission changes in external identity provider #2587

Open askfongjojo opened 1 year ago

askfongjojo commented 1 year ago

There are two scenarios where I haven't figured out the workflow behind the scenes (note: this may not be a software issue but rather something to understand for documentation purposes):

1) Revoking API access after the user has been disabled in the IdP

Since the user can no longer authenticate to the rack, it would seem that there is nothing to trigger the expiration of the person's device tokens. The only way I can think of is attempting to perform IdP user import whenever the user makes an API call - which seems like a prohibitively expensive operation.

2) Sync group membership info in Oxide rack without depending on user logging into console

Similar to the above scenario, if the user import event happens only during console login, their group membership information in the rack may be stale. If the user is working mostly with API and rarely uses the console, how would their project access permissions be kept up to date if the IAM roles are configured with groups?

cc/ @davepacheco (This may be related to #2302 but this is a more specific question about IdP sync.)

davepacheco commented 1 year ago

These are good questions and you're right we should probably document the behavior somewhere.

  1. Revoking API access after the user has been disabled in the IdP

     Since the user can no longer authenticate to the rack, it would seem that there is nothing to trigger the expiration of the person's device tokens. The only way I can think of is attempting to perform IdP user import whenever the user makes an API call - which seems like a prohibitively expensive operation.

I believe that's right. As you mentioned, #2302 would eventually expire the tokens, but not necessarily right away. By the way, I think the same problem exists with web console sessions today.

I think the best answer here is to implement support for the SAML logout flow (RFD 234 Section 5.4), which would allow the IdP to reach out to the Oxide system to terminate all of the user's authenticated sessions. I assume (but we should verify) that an IdP would issue that request when deactivating a user in the IdP. I think the logout request from the IdP should invalidate all of a user's web console session tokens as well as their device authn tokens.

This last part might be controversial? If you think of device authn tokens like API keys, then they shouldn't be invalidated by a logout. But I don't think that's really what they are -- I think they're tokens to authenticate an interactive session, just from a non-browser device. If we want to do proper, long-lived API keys, we can -- but then this IdP synchronization problem would become more important. A better answer for this is probably service accounts, which only ever live in the Oxide system (not the IdP). I do think that'll be an important post-MVP feature.
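To make the proposal concrete, here is a minimal sketch of an IdP-initiated logout that revokes both console sessions and device tokens for a user. It is not Omicron's implementation: `CredentialStore`, `CredentialKind`, and every method name below are hypothetical, and an in-memory map stands in for whatever storage the control plane actually uses.

```rust
use std::collections::HashMap;

/// Kinds of credential tied to an interactive login (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CredentialKind {
    ConsoleSession,
    DeviceToken,
}

#[derive(Debug, Clone)]
struct Credential {
    user_id: String,
    kind: CredentialKind,
}

/// In-memory stand-in for wherever sessions and tokens are stored.
#[derive(Default)]
struct CredentialStore {
    by_token: HashMap<String, Credential>,
}

impl CredentialStore {
    /// Record a newly issued credential for a user.
    fn issue(&mut self, token: &str, user_id: &str, kind: CredentialKind) {
        self.by_token.insert(
            token.to_string(),
            Credential { user_id: user_id.to_string(), kind },
        );
    }

    /// Return the user a token belongs to, if the token is still valid.
    fn authenticate(&self, token: &str) -> Option<&str> {
        self.by_token.get(token).map(|c| c.user_id.as_str())
    }

    /// Count a user's live credentials of one kind.
    fn count(&self, user_id: &str, kind: CredentialKind) -> usize {
        self.by_token
            .values()
            .filter(|c| c.user_id == user_id && c.kind == kind)
            .count()
    }

    /// Handle an IdP-initiated logout: drop every credential for the user,
    /// console sessions and device tokens alike. Returns how many were revoked.
    fn revoke_all_for_user(&mut self, user_id: &str) -> usize {
        let before = self.by_token.len();
        self.by_token.retain(|_, c| c.user_id != user_id);
        before - self.by_token.len()
    }
}
```

If device tokens were instead treated as long-lived API keys, the `retain` predicate would also match on `kind` and keep `DeviceToken` entries, which is exactly the design choice under discussion.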

  2. Sync group membership info in Oxide rack without depending on user logging into console

     Similar to the above scenario, if the user import event happens only during console login, their group membership information in the rack may be stale. If the user is working mostly with API and rarely uses the console, how would their project access permissions be kept up to date if the IAM roles are configured with groups?

Conceivably, once we support SAML logout, an IdP could send a SAML logout request to the Oxide system when a user's groups change. That would force the user to re-authenticate and we'd learn the new group information. I've seen this sort of behavior in other systems but I don't know if this is a thing that people do with SAML.
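The staleness and the forced re-authentication described here can be sketched as follows. This assumes group membership is captured only from the login assertion and replaced wholesale on each login; `GroupSnapshot`, `record_login`, and `is_member` are hypothetical names, not Omicron APIs.

```rust
use std::collections::{HashMap, HashSet};

/// Group membership as last asserted by the IdP at login (hypothetical type).
#[derive(Default)]
struct GroupSnapshot {
    groups_at_last_login: HashMap<String, HashSet<String>>,
}

impl GroupSnapshot {
    /// Called when a login assertion arrives. Assumption: the asserted
    /// groups replace whatever was stored before, rather than being merged.
    fn record_login(&mut self, user: &str, asserted_groups: &[&str]) {
        let groups = asserted_groups.iter().map(|g| g.to_string()).collect();
        self.groups_at_last_login.insert(user.to_string(), groups);
    }

    /// Group-based role checks see only the snapshot, so they stay stale
    /// until the user logs in again.
    fn is_member(&self, user: &str, group: &str) -> bool {
        self.groups_at_last_login
            .get(user)
            .map_or(false, |groups| groups.contains(group))
    }
}
```

In this model, a change made in the IdP has no effect until `record_login` runs again, which is why a logout that forces re-authentication would also refresh group information.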

Aside from that, I'm not sure it's possible for us to do a "user import" whenever we feel like it, even if we wanted to. I don't think SAML supports a flow where the Oxide system could reach back out to the IdP proactively to update our information about a user, nor a flow where the IdP could reach out to us to tell us that a user's groups have changed.

Again, this problem applies to established console sessions, too.

For post-MVP: there are ways of doing this that don't involve SAML. Some of these are described in the second two bullet points in RFD 234 Section 4.2. There appear to be at least two different ways to do this with SCIM: either we run SCIM and the IdP sends us these kinds of updates, or the IdP runs SCIM and we can query it whenever we want. An LDAP-based integration could allow us to query the IdP whenever we want. It's also possible we could integrate with IdP-specific APIs for this sort of thing. All of the options appear to be a bunch of work and highly specific to a customer's deployment. Even SCIM, which is a standard, means different things when it's "SCIM for Okta" vs. "SCIM for PingFederate". So the assumption is we'll want to listen to early customers, see what they want, and be strategic about where we invest our efforts.

askfongjojo commented 1 year ago

@davepacheco - Thanks for weighing in. I'll mark this ticket for post-MVP since it's a security-related concern. In the event that the implementation of the IdP-driven expiration does not materialize in the MVP+1 timeframe, perhaps one other option to consider is making available an operator API to disable user accounts and include user status checks in IAM. This way the user may still have an active session but won't be able to make any CRUD requests.
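A rough sketch of that fallback, assuming a per-user disabled flag that an operator can set and that the authorization check consults before any role lookup. `Iam`, `AuthzError`, `set_disabled`, and `authorize` are hypothetical names for illustration only.

```rust
use std::collections::HashSet;

/// Reasons an authorization check can fail (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum AuthzError {
    UserDisabled,
    Forbidden,
}

/// Toy IAM state: which users are disabled and which (user, action)
/// pairs are permitted.
#[derive(Default)]
struct Iam {
    disabled: HashSet<String>,
    allowed: HashSet<(String, String)>,
}

impl Iam {
    /// Permit a user to perform an action.
    fn grant(&mut self, user: &str, action: &str) {
        self.allowed.insert((user.to_string(), action.to_string()));
    }

    /// Operator-facing toggle: disable or re-enable an account without
    /// touching its existing sessions or tokens.
    fn set_disabled(&mut self, user: &str, disabled: bool) {
        if disabled {
            self.disabled.insert(user.to_string());
        } else {
            self.disabled.remove(user);
        }
    }

    /// The status check runs first, so a disabled user is rejected even
    /// when the session is still live and the role grant still exists.
    fn authorize(&self, user: &str, action: &str) -> Result<(), AuthzError> {
        if self.disabled.contains(user) {
            return Err(AuthzError::UserDisabled);
        }
        if self.allowed.contains(&(user.to_string(), action.to_string())) {
            Ok(())
        } else {
            Err(AuthzError::Forbidden)
        }
    }
}
```

Because the flag lives in the Oxide system rather than the IdP, it works without any IdP-driven signal, at the cost of requiring an operator to act on both sides.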