gravitational / teleport

The easiest, and most secure way to access and protect all of your infrastructure.
https://goteleport.com
GNU Affero General Public License v3.0

Provide a way for Machine ID to issue identities based on access requests #46449

Open webvictim opened 1 month ago

webvictim commented 1 month ago

What would you like Teleport to do?

Take a hypothetical flow where a Teleport user requests access to a set of machines to run a deployment. They create an access request in Teleport, which is subsequently approved; the user can then assume the request and run SSH commands themselves against these machines.

This is fine if the user can run the deployment commands themselves, but what if they need to run the deployment via a CI/CD pipeline (run using Jenkins in this example)?

The user can request access to Jenkins through Teleport and log in using JWT or SAML auth, which could conditionally grant them access to the correct pipeline based on the existence of a certain Teleport role. However, the identity used by Jenkins (and the underlying deployment platform) to actually execute the SSH commands would still have to be a generic Machine ID identity, which wasn't subject to the same approval process as the user.

As such, it has no context about:

Making tbot able to run with an access request ID (preferably the same ID that the user had already been approved for) would keep audit trails associated with Access Requests more consistent throughout Teleport, while still ensuring least-privilege access with short-lived certificates.

This is effectively allowing Machine ID to "impersonate" a user's access request, or perhaps request approval for the same set of permissions as a given access request, but on its own behalf.

What problem does this solve?

Read the section above.

If a workaround exists, please include it.

The only real workaround currently is to issue a longer-lived identity and put it in some kind of vault or only allow deployments with a given Teleport role, then make users request access to the vault or role before running deployments. This is very much a legacy flow and we could definitely build something way better.

strideynet commented 1 month ago

Super interesting request.

I think there are a number of reasons that having Machine ID impersonate a user's access request may be challenging. It'd involve some fairly hefty changes to RBAC, and we'd want to provide some way to restrict which access requests it could impersonate. I also think the UX is not fantastic, as the user would need to "input" the access request ID into the CI flow on each run.

However, if we flip this around, and instead have the Machine ID bot create the access request, it's a less complex build and I think the UX is improved. What I imagine is:

It's a little loose of an overview, but to me, that makes a lot of sense and I think provides a better UX. I'm curious if this flipped flow has any flaws from your point of view.

Within the scope of a one-shot tbot run, this is fairly simple. Handling this within a long-lived daemon is a little more challenging, and could lead to a lot of access requests being created. We'd also need to figure out if it would be necessary for a single access request to be used for multiple outputs, or if it would be fine for these to be scoped to a single output.

We'd want to make sure the assumed access request was included in the audit log entries when it was used to do things - this'll let the action be tied back to a specific user's approval of the CI run.
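As a rough illustration of tying an action back to an approval, here is a minimal sketch. The field names (`access_request_id`, `approved_by`) and the `ApprovedRequest` type are assumptions made for this example, not Teleport's actual audit event schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedRequest:
    # Hypothetical stand-in for an approved Teleport access request.
    id: str
    approvers: tuple  # names of the users who approved the request


def audit_entry(bot: str, action: str, request: ApprovedRequest) -> dict:
    """Build an audit record that carries the assumed access request."""
    return {
        "user": bot,
        "action": action,
        "access_request_id": request.id,
        "approved_by": list(request.approvers),
    }


def approvers_for(log: list, action: str) -> list:
    """Look up who approved the request behind each logged action."""
    return [name for e in log if e["action"] == action for name in e["approved_by"]]
```

With entries shaped like this, anyone reviewing the log could go from "the bot ran a command" to "these people approved that run" without leaving the audit trail.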

webvictim commented 1 month ago

I think the flow described is pretty logical and would work very well in situations where Machine ID access should always be approved before use - basically a bot which only has one job, but you want a human approval tied to that job before it can run.

Thinking back to how this could ultimately help solve the original scenario I described, my main question is whether we could/would make the content of access requests dynamic?

For resource access requests to really be useful in this scenario, the Machine ID bot running on Jenkins would need to make a request with specific node/application/database UUIDs which are likely to change on each execution - this couldn't be set in its configuration ahead of time. Instead, the set of resources/roles for the bot should perhaps be able to be configured in the web UI by a user, who could then have the bot raise the access request and approve it themselves - almost more like approving a headless command execution.

Maybe this starts to head more into the concept of "permission sets" or some kind of pre-approved command infrastructure for specific hosts (as discussed in RFD 132: https://github.com/gravitational/teleport/blob/jakule/command-rfd/rfd/xxxx-command.md)

I actually think the UX for a user inputting their own access request ID into CI/CD to associate it with each run is less cumbersome than the flow I just described above; basically think of a user looking for Machine ID to run commands on their behalf as part of a pipeline with the exact same set of access that they were already approved for. The flow you described in your comment would mean that they have to pick the resources on behalf of the bot anyway, then find a way to communicate exactly what resources the bot should make its request for - whereas their own access request ID already neatly encapsulates all of that information, they just need a way to delegate the permissions to a runner other than themselves.

strideynet commented 1 month ago

The primary use-case in mind here is approval of sensitive CI runs that use Machine ID to access resources protected by Teleport. Our hypothetical user organization has a change management and approval process, with a number of "required" approvers before CI runs that introduce infrastructure changes can be completed. Our hypothetical organization uses a mixture of CI/CD platforms, and whilst some of these have a native mechanism for multi-user approval of CI runs, some of them do not.

Our primary goals should be:

Suggested Design

Flow:

  1. A change has been planned, and a human user creates an access request on behalf of a Machine ID bot
    1. An access request created on behalf of a user is only assumable by the user who it has been created for, not by the user who created it.
    2. Teleport's RBAC is extended to add a new mechanism that controls which users you can create an access request on behalf of.
    3. When creating an access request on behalf of a user, the access request can only include roles/resources that the user who will assume the request can make requests for (i.e. the access privileges of the subject are used, rather than the access privileges of the access request creator)
  2. This access request is reviewed by the CAB, who then approve the access request within Teleport
  3. The user triggers the CI flow, providing the ID of the approved access request.
  4. The tbot agent within the CI flow assumes this access request when generating certificates, and the request is recorded within the generated certificates themselves.
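The rules in the flow above can be sketched as a small model. This is only an illustration under stated assumptions: `User`, `AccessRequest`, the `may_request_on_behalf_of` field and every function here are hypothetical names, not existing Teleport types or RBAC options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    # Roles this user is allowed to request for itself.
    requestable_roles: frozenset = frozenset()
    # Hypothetical new RBAC control: users this user may create requests for.
    may_request_on_behalf_of: frozenset = frozenset()


@dataclass
class AccessRequest:
    id: str
    creator: str
    subject: str
    roles: frozenset
    state: str = "PENDING"


def create_on_behalf_of(request_id, creator, subject, roles):
    """Create a request for `subject`, checked against the subject's privileges."""
    roles = frozenset(roles)
    if subject.name not in creator.may_request_on_behalf_of:
        raise PermissionError(f"{creator.name} may not request on behalf of {subject.name}")
    not_allowed = roles - subject.requestable_roles
    if not_allowed:
        raise PermissionError(f"{subject.name} cannot request: {sorted(not_allowed)}")
    return AccessRequest(request_id, creator.name, subject.name, roles)


def approve(request):
    """Mark the request as approved (the reviewers' step)."""
    request.state = "APPROVED"


def assume(request, user_name):
    """Only the subject may assume the request, and only once it is approved."""
    if request.state != "APPROVED":
        raise PermissionError("access request has not been approved")
    if user_name != request.subject:
        raise PermissionError("only the subject may assume this request")
    return request.roles
```

In this model a human with `may_request_on_behalf_of={"bot-deploy"}` can raise a request for the bot, but cannot assume it themselves, and cannot include roles the bot could not have requested on its own.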

Engineering changes required:

Concerns:

Questions:

There's an interesting proposition in #5081 that a user should be able to cancel their own access requests. This could be leveraged to allow the Bot to "destroy" the access request after the CI has completed successfully.
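The cleanup idea could look roughly like the sketch below. Both callables are placeholders: `cancel_request` stands in for whatever self-cancellation mechanism #5081 might eventually provide, and `run_deployment` for the CI job itself. Neither is a real Teleport API.

```python
from typing import Callable


def run_ci_job(request_id: str,
               run_deployment: Callable[[str], None],
               cancel_request: Callable[[str], None]) -> bool:
    """Run the deployment and cancel the access request only on success."""
    try:
        run_deployment(request_id)
    except Exception:
        # Leaving the request in place on failure is an assumption of this
        # sketch, so a retry could reuse the same approval.
        return False
    cancel_request(request_id)
    return True
```

Whether a failed run should also invalidate the request is a policy decision this sketch does not settle.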

We probably ought to create an RFD to firm up this design before proceeding with implementation, given the potential security impact.

Alternatives