Provide a way for Machine ID to issue identities based on access requests

webvictim commented 1 month ago

What would you like Teleport to do?

Take a hypothetical flow where a Teleport user requests access to a set of machines to run a deployment. They create an access request in Teleport, which is subsequently approved. The user can then assume the request and run SSH commands against these machines themselves.

This is fine if the user can run the deployment commands themselves, but what if they need to run the deployment via a CI/CD pipeline (run using Jenkins in this example)?

The user can request access to Jenkins through Teleport and log in using JWT or SAML auth, which could conditionally grant them access to the correct pipeline based on the existence of a certain Teleport role. However, the identity used by Jenkins (and the underlying deployment platform) to actually execute the SSH commands would still have to be a generic Machine ID identity which wasn't subject to the same approval process as the user.

As such, it has no context about:

who the underlying user is that made the access request
what machines/resources the user was approved to access
how long the access request is valid for

Making tbot able to run with an access request ID (preferably the same ID that the user had already been approved for) would keep audit trails associated with Access Requests more consistent throughout Teleport, while still ensuring least privileged access with short-lived certificates.

This is effectively allowing Machine ID to "impersonate" a user's access request, or perhaps request approval for the same set of permissions as a given access request, but on its own behalf.

What problem does this solve?

Read the section above.

If a workaround exists, please include it.

The only real workaround currently is to issue a longer-lived identity and put it in some kind of vault or only allow deployments with a given Teleport role, then make users request access to the vault or role before running deployments. This is very much a legacy flow and we could definitely build something way better.

strideynet commented 1 month ago

Super interesting request.

I think there are a number of reasons that having Machine ID impersonate a user's access request may be challenging. It'd involve some fairly hefty changes to RBAC, and we'd want to provide some way to restrict which access requests it could impersonate. I also think the UX is not fantastic, as the user would need to "input" the access request ID into the CI flow on each run.

However, if we flip this around, and instead have the Machine ID bot create the access request, it's a less complex build and I think the UX is improved. What I imagine is:

Configure the Bot with a role that allows it to make access requests.
Configure tbot with the access request details it should make.
On start, tbot will create the access request and output the link. It will then poll for the access request to be approved.
User follows the link and approves the access request.
tbot detects approval and begins to run its main task.

It's a little loose of an overview, but to me, that makes a lot of sense and I think provides a better UX. I'm curious if this flipped flow has any flaws from your point of view.
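To make the flipped flow a little more concrete, here is a minimal Python sketch of the "create, then poll until approved" step. It is not tbot code and does not use any Teleport API: `AccessRequest`, `wait_for_approval`, the `fetch` callback and the state names `PENDING`/`APPROVED`/`DENIED` are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AccessRequest:
    """Hypothetical, simplified view of an access request."""
    id: str
    state: str  # assumed states: "PENDING", "APPROVED", "DENIED"


def wait_for_approval(
    fetch: Callable[[str], AccessRequest],
    request_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> AccessRequest:
    """Poll `fetch` until the request is approved.

    Raises PermissionError if the request is denied and TimeoutError if it
    is still pending once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        req = fetch(request_id)
        if req.state == "APPROVED":
            return req
        if req.state == "DENIED":
            raise PermissionError(f"access request {request_id} was denied")
        if clock() >= deadline:
            raise TimeoutError(f"access request {request_id} was not approved in time")
        sleep(poll_interval)
```

In a one-shot run, the bot would print the request link, call something like `wait_for_approval`, and only then start its main task. The `sleep` and `clock` parameters exist purely so the loop can be exercised without real waiting.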

Within the scope of a one-shot tbot run, this is fairly simple. Handling this within a long-lived daemon is a little more challenging, and could lead to a lot of access requests being created. We'd also need to figure out if it would be necessary for a single access request to be used for multiple outputs, or if it would be fine for these to be scoped to a single output.

We'd want to make sure the assumed access request was included in the audit log entries when it was used to do things - this'll let the action be tied back to a specific user's approval of the CI run.

webvictim commented 1 month ago

I think the flow described is pretty logical and would work very well in situations where Machine ID access should always be approved before use - basically a bot which only has one job, but you want a human approval tied to that job before it can run.

Thinking back to how this could ultimately help solve the original scenario I described, my main question is whether we could/would make the content of access requests dynamic.

For resource access requests to really be useful in this scenario, the Machine ID bot running on Jenkins would need to make a request with specific node/application/database UUIDs which are likely to change on each execution - this couldn't be set in its configuration ahead of time. Instead, the set of resources/roles for the bot should perhaps be able to be configured in the web UI by a user, who could then have the bot raise the access request and approve it themselves - almost more like approving a headless command execution.

Maybe this starts to head more into the concept of "permission sets" or some kind of pre-approved command infrastructure for specific hosts (as discussed in RFD 132: https://github.com/gravitational/teleport/blob/jakule/command-rfd/rfd/xxxx-command.md)

I actually think the UX for a user inputting their own access request ID into CI/CD to associate it with each run is less cumbersome than the flow I just described above; basically think of a user looking for Machine ID to run commands on their behalf as part of a pipeline with the exact same set of access that they were already approved for. The flow you described in your comment would mean that they have to pick the resources on behalf of the bot anyway, then find a way to communicate exactly what resources the bot should make its request for - whereas their own access request ID already neatly encapsulates all of that information, they just need a way to delegate the permissions to a runner other than themselves.

strideynet commented 1 month ago

The primary use-case in mind here is approval of sensitive CI runs that use Machine ID to access resources protected by Teleport. Our hypothetical user organization has a change management and approval process, with a number of "required" approvers before CI runs that introduce infrastructure changes can be completed. Our hypothetical organization uses a mixture of CI/CD platforms, and whilst some of these have a native mechanism for multi-user approval of CI runs, some of them do not.

Our primary goals should be:

Avoid introducing new privilege escalation concerns
Avoid introducing a complex UX to Machine ID and Teleport for interacting with this
Where possible, ensure that changes to access requests here could also benefit humans.

Suggested Design

Flow:

A change has been planned, and a human user creates an access request on behalf of a Machine ID bot
1. An access request created on behalf of a user is only assumable by the user who it has been created for, not by the user who created it.
2. Teleport's RBAC is extended to add a new mechanism that controls which users you can create an access request on behalf of.
3. When creating an access request on behalf of a user, the access request can only include roles/resources that the user who will assume the request can make requests for (i.e. the access privileges of the subject are used, rather than the access privileges of the access request creator)
This access request is reviewed by the CAB, who then approve the access request within Teleport
The user triggers the CI flow, providing the ID of the approved access request.
The tbot agent within the CI flow assumes this access request when generating certificates, and the assumed request is recorded within the generated certificates themselves.
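The three rules under the first step can be sketched as plain checks. This is a rough Python illustration of the proposed semantics only, not Teleport's RBAC implementation: `User`, `DelegatedRequest`, `can_request_for`, `requestable_roles`, `create_on_behalf_of` and `can_assume` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class User:
    """Hypothetical user or bot with simplified RBAC attributes."""
    name: str
    # Roles this user may include in access requests for themselves.
    requestable_roles: frozenset = frozenset()
    # Subjects this user may create access requests on behalf of (rule 2).
    can_request_for: frozenset = frozenset()


@dataclass(frozen=True)
class DelegatedRequest:
    creator: str
    subject: str
    roles: frozenset


def create_on_behalf_of(creator: User, subject: User, roles: Iterable[str]) -> DelegatedRequest:
    """Create a request for `subject`, evaluated against the subject's privileges."""
    wanted = frozenset(roles)
    # Rule 2: RBAC controls which subjects the creator may request for.
    if subject.name not in creator.can_request_for:
        raise PermissionError(f"{creator.name} may not create requests for {subject.name}")
    # Rule 3: only roles the subject could request itself are allowed.
    disallowed = wanted - subject.requestable_roles
    if disallowed:
        raise PermissionError(f"{subject.name} cannot request: {sorted(disallowed)}")
    return DelegatedRequest(creator=creator.name, subject=subject.name, roles=wanted)


def can_assume(user: User, request: DelegatedRequest) -> bool:
    # Rule 1: only the subject, never the creator, may assume the request.
    return user.name == request.subject
```

For example, a human `alice` whose `can_request_for` contains `ci-bot` could create a request for the `deploy` role on the bot's behalf, but `can_assume(alice, request)` would be false, so she gains nothing for herself by doing so.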

Engineering changes required:

RBAC support to allow users to create access requests on behalf of other users
Support within tbot to assume an approved access request when generating credentials
Support within the CLI and/or Web UI to allow users to create access requests on behalf of other users

Concerns:

Providing the access request ID to tbot for it to assume may be a less than smooth user experience.
It's a little difficult to limit the access request to only being used to enact the change as approved. Therefore this is more of a change management process enforcement than necessarily a security control.
- Example: Access Request is approved by a CAB on the basis of proposed changes within X branch of the repository. The approved access request could then be used within a CI run on the Y branch. This risk is somewhat curtailed by platforms that support delegated joining, as the Bot's ability to join could be restricted to a more limited set of branches.
- This can be worked around by creating a Bot/Join Token for a specific CI run, and then creating an access request on behalf of this Bot.
- This could potentially be improved in future with a new feature that allows an access request to be assumable only when a user/bot's current login/join fulfils some traits - e.g. you could limit a specific access request to only be assumable by a bot instance which joined using GitLab and against X commit. This could also apply to humans (e.g. limit assuming the access request to cases where the user is using a specific device with device trust). This would be a significantly complex build.
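As a rough illustration of the trait-matching idea in the last point, the check could be as simple as requiring every constraint on the access request to match an attribute of the current join. This Python sketch is an assumption about how such a feature might behave; `join_satisfies` and the attribute names `join_method` and `commit_sha` are hypothetical and not existing Teleport fields.

```python
from typing import Mapping


def join_satisfies(required: Mapping[str, str], join_attrs: Mapping[str, str]) -> bool:
    """Return True if every required trait matches the join's attributes.

    An empty `required` mapping means the request is unconstrained.
    Extra attributes on the join are ignored.
    """
    return all(join_attrs.get(key) == value for key, value in required.items())


# Hypothetical constraint: only a GitLab join against one specific commit.
required = {"join_method": "gitlab", "commit_sha": "abc123"}

approved_run = {"join_method": "gitlab", "commit_sha": "abc123", "branch": "x"}
other_run = {"join_method": "gitlab", "commit_sha": "def456", "branch": "y"}

print(join_satisfies(required, approved_run))  # True
print(join_satisfies(required, other_run))     # False
```

The matching itself is trivial; the complexity called out above would lie in deciding which join or login attributes are trustworthy enough to match against and in surfacing the constraints in RBAC and the UI.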

Questions:

What is the best way to specify what subjects a role allows you to create access requests on behalf of?
- Labels on the subject?
- Individually naming the subject?
Do we need to limit what kind of access requests a user can create on behalf of another user to a subset of what that subject can usually request?
Does using the access request privileges of the access request subject make more sense than using the access request privileges of the creator?

There's an interesting proposition in #5081 that a user should be able to cancel their own access requests. This could be leveraged to allow the Bot to "destroy" the access request after the CI has completed successfully.

We probably ought to create an RFD to firm up this design before proceeding with implementation due to potential security impacts.

Alternatives

Have the bot create the access request itself at runtime
- This doesn't fit too well since part of this use-case is the ability to approve the CI run ahead of time as part of a CAB
- This isn't necessarily precluded by the proposed design - some elements of the engineering work could be reused if we decided to introduce this at a later date
Allow the bot to assume the access requests of other users
- This seems more concerning from a privilege escalation angle - but - does eliminate the need to design new CLI/UI around creating access requests on behalf of other users. My main concern would be around controlling which of a user's access requests a bot could assume - not all of your access requests may be related to the CI use-case.
Build a "bot join approval" mechanism instead
- This is a much bigger build, but we could design a system whereby the creation of a Bot and a single-use join token could undergo an approval mechanism similar to the one we created for access requests.
- Requires very few changes to tbot itself, and requires no changes to the existing access request mechanism and its security invariants - however - requires a significantly complex new feature build.
- Elegantly solves problems around the "wrong" CI run being able to assume an access request or multiple CI runs being able to assume an access request.
