[Open] webvictim opened this issue 1 month ago
Super interesting request.
I think there are a number of reasons why having Machine ID impersonate a user's access request may be challenging. It would involve some fairly hefty changes to RBAC, and we'd want to provide some way to restrict which access requests it could impersonate. I also think the UX is not fantastic, as the user would need to "input" the access request ID into the CI flow on each run.
However, if we flip this around and instead have the Machine ID bot create the access request, it's a less complex build, and I think the UX is improved. What I imagine is:
1. `tbot` is configured with the access request details it should make.
2. `tbot` creates the access request and outputs the link. It then polls for the access request to be approved.
3. `tbot` detects approval and begins to run its main task.
It's a little loose as an overview, but to me that makes a lot of sense, and I think it provides a better UX. I'm curious whether this flipped flow has any flaws from your point of view.
Within the scope of a one-shot `tbot` run, this is fairly simple. Handling this within a long-lived daemon is a little more challenging and could lead to a lot of access requests being created. We'd also need to figure out whether a single access request would need to be used for multiple outputs, or whether it would be fine for each request to be scoped to a single output.
We'd want to make sure the assumed access request is included in the audit log entries whenever it is used to do things; this will let each action be tied back to a specific user's approval of the CI run.
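To illustrate the audit requirement: if each audit event records the IDs of the access requests the bot's certificate was issued under, the action can be traced back to whoever approved them. The `AuditEvent` type, its `AccessRequests` field, and the `ApprovalTrail` helper below are hypothetical and only sketch the idea; they do not reflect Teleport's real audit event schema.

```go
package main

import (
	"fmt"
	"strings"
)

// AuditEvent is a hypothetical, trimmed-down audit record. The key point is
// AccessRequests: the IDs of any access requests the bot's certificate was
// issued under.
type AuditEvent struct {
	User           string
	Action         string
	AccessRequests []string
}

// ApprovalTrail maps each access request on the event back to the humans who
// approved it, so a bot action can be traced to a specific approval.
func ApprovalTrail(ev AuditEvent, approvers map[string][]string) []string {
	var trail []string
	for _, id := range ev.AccessRequests {
		who, ok := approvers[id]
		if !ok {
			trail = append(trail, id+": no approval on record")
			continue
		}
		trail = append(trail, id+": approved by "+strings.Join(who, ", "))
	}
	return trail
}

func main() {
	ev := AuditEvent{User: "bot-jenkins", Action: "exec", AccessRequests: []string{"req-123"}}
	approvers := map[string][]string{"req-123": {"alice", "bob"}}
	for _, line := range ApprovalTrail(ev, approvers) {
		fmt.Println(line)
	}
}
```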
I think the flow described is pretty logical and would work very well in situations where Machine ID access should always be approved before use - basically a bot which only has one job, but you want a human approval tied to that job before it can run.
Thinking back to how this could ultimately help solve the original scenario I described, my main question is whether we could (or would) make the content of access requests dynamic.
For resource access requests to really be useful in this scenario, the Machine ID bot running on Jenkins would need to make a request with specific node/application/database UUIDs, which are likely to change on each execution, so they couldn't be set in its configuration ahead of time. Instead, perhaps a user should be able to configure the set of resources/roles for the bot in the web UI, then have the bot raise the access request and approve it themselves, almost like approving a headless command execution.
Maybe this starts to head more into the concept of "permission sets" or some kind of pre-approved command infrastructure for specific hosts (as discussed in RFD 132: https://github.com/gravitational/teleport/blob/jakule/command-rfd/rfd/xxxx-command.md)
I actually think the UX of a user inputting their own access request ID into CI/CD to associate it with each run is less cumbersome than the flow I just described. Think of a user who wants Machine ID to run commands on their behalf as part of a pipeline, with exactly the same set of access they were already approved for. The flow you described in your comment would mean that they have to pick the resources on behalf of the bot anyway, then find a way to communicate exactly which resources the bot should request. Their own access request ID already neatly encapsulates all of that information; they just need a way to delegate the permissions to a runner other than themselves.
The primary use-case in mind here is approval of sensitive CI runs that use Machine ID to access resources protected by Teleport. Our hypothetical user organization has a change management and approval process, with a number of "required" approvers before CI runs that introduce infrastructure changes can be completed. Our hypothetical organization uses a mixture of CI/CD platforms, and whilst some of these have a native mechanism for multi-user approval of CI runs, some of them do not.
Our primary goals should be:
Flow:
`tbot` agent within the CI flow assumes this access request when generating certificates and within the generated certificates themselves.
Engineering changes required:
`tbot` to assume an approved access request when generating credentials
Concerns:
`tbot` for it to assume may be a less-than-smooth user experience.
Questions:
There's an interesting proposition in #5081 that a user should be able to cancel their own access requests. This could be leveraged to allow the Bot to "destroy" the access request after the CI has completed successfully.
We probably ought to create an RFD to firm up this design before proceeding with implementation, due to the potential security impacts.
`tbot` itself, and requires no changes to the existing access request mechanism and its security invariants; however, it requires a significant and complex new feature build.
What would you like Teleport to do?
Take a hypothetical flow where a Teleport user requests access to a set of machines to run a deployment. They create an access request in Teleport which is then subsequently approved, then the user is able to assume the request and run SSH commands themselves against these machines.
This is fine if the user can run the deployment commands themselves, but what if they need to run the deployment via a CI/CD pipeline (run using Jenkins in this example)?
The user can request access to Jenkins through Teleport and log in using JWT or SAML auth which could conditionally grant them access to the correct pipeline based on the existence of a certain Teleport role, but the identity used by Jenkins (and the underlying deployment platform) to actually execute the SSH commands would still have to be a generic Machine ID identity which wasn't subject to the same approval process as the user.
As such, it has no context about:
Making `tbot` able to run with an access request ID (preferably the same ID that the user had already been approved for) would keep audit trails associated with access requests more consistent throughout Teleport, while still ensuring least-privilege access with short-lived certificates.
This is effectively allowing Machine ID to "impersonate" a user's access request, or perhaps to request approval for the same set of permissions as a given access request, but on its own behalf.
What problem does this solve?
Read the section above.
If a workaround exists, please include it.
The only real workaround currently is to issue a longer-lived identity and put it in some kind of vault or only allow deployments with a given Teleport role, then make users request access to the vault or role before running deployments. This is very much a legacy flow and we could definitely build something way better.