spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0

Agentless API #986

Closed: omerlh closed this issue 3 years ago

omerlh commented 5 years ago

One of the challenges of using SPIFFE is workloads running on platforms where installing an agent is almost impossible. This could be a non-Docker deployment (legacy, AWS Elastic Beanstalk, Azure App Service), where installing an agent could be very complex, or FaaS, where it's impossible. I'm proposing the following changes to the protocol to support such deployments. First, add an "agentless" API in the following format:

```
POST api/v1/token
Header: Authorization <token>
Body:
{
  "additional_claims": {}
}
```

The <token> is provided by the platform where the code is running - AWS credentials from the metadata endpoint, an Azure access token, a Kubernetes service account token, etc. The additional claims will be added to the issued JWT. The server will validate the token with the issuing platform and, if valid, issue a new JWT in SPIFFE format based on the original token. This endpoint could easily be consumed by any existing application with a small modification (adding a POST request), making it a lot easier to move to SPIFFE.

The second change is exposing something like an OpenID Connect metadata endpoint that publishes the keys used to validate the issued JWTs. Consumers can use this endpoint when verifying tokens, making validation easier for everyone by leveraging existing implementations of JWT token validation.
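The proposed flow can be sketched from the workload's side. This is a minimal illustration, not SPIRE's API: the endpoint path comes from the proposal above, but the response shape (a `token` field) and server URL are assumptions, and the HTTP transport is injected as a callable so the sketch runs offline.

```python
from typing import Callable, Optional

def exchange_token(post: Callable[[str, dict, dict], dict],
                   platform_token: str,
                   additional_claims: Optional[dict] = None,
                   base_url: str = "https://spire-server.example") -> str:
    """Exchange a platform-issued token for a SPIFFE-format JWT.

    `post(url, headers, body)` performs the HTTP POST and returns the decoded
    JSON response; injecting it keeps this sketch testable without a network.
    """
    response = post(
        f"{base_url}/api/v1/token",                      # endpoint from the proposal
        {"Authorization": platform_token},               # platform-issued credential
        {"additional_claims": additional_claims or {}},  # extra claims for the JWT
    )
    return response["token"]  # response field name is an assumption

# Offline usage with a fake transport standing in for the server:
def fake_post(url, headers, body):
    assert url.endswith("/api/v1/token")
    assert headers["Authorization"] == "platform-token"
    return {"token": "header.payload.signature"}

jwt = exchange_token(fake_post, "platform-token", {"team": "payments"})
```

The key property is that the workload only needs one extra POST request at startup, which is what makes the proposal attractive for legacy and PaaS workloads.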

mbrancato commented 5 years ago

Thanks for officially opening this @omerlh. As I've mentioned on Slack, my interest in this is because the workload API talks to the node, a concept I think might be less common going forward. This node dependency is illustrated by Scytale in a blog post.

To justify the need here, there are already a number of "nodeless" environments, most notably the virtual kubelet. This is used for both Azure ACI and Amazon Fargate connections to Kubernetes without any nodes. I agree that, moving forward, there will be a need to rely on platform-issued tokens to perform the initial authorization of the workload. On Kubernetes, the service account token makes sense. On other platforms, there may be other tokens. The API may need to identify what type of token is being used or who the issuer is. The server could then use platform APIs to verify the token.
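The routing step described here (identify the token's issuer, then hand it to the right platform verifier) can be sketched as follows. Reading the `iss` claim without signature verification is safe only because the selected verifier then fully validates the token with the issuing platform; the routing table entries are illustrative examples, not real SPIRE configuration.

```python
import base64
import json

def unverified_issuer(jwt: str) -> str:
    """Read the `iss` claim without verifying the signature, purely to route
    the token to the right platform verifier (which then fully validates it)."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["iss"]

# Illustrative issuer-to-verifier routing table (entries are examples only):
VERIFIERS = {
    "kubernetes/serviceaccount": "verify via the Kubernetes TokenReview API",
    "https://sts.example-cloud.test": "verify via the cloud provider's token API",
}

# Build a toy unsigned token to demonstrate the routing step:
payload = base64.urlsafe_b64encode(
    json.dumps({"iss": "kubernetes/serviceaccount"}).encode()
).rstrip(b"=").decode()
toy_jwt = f"eyJhbGciOiJub25lIn0.{payload}."
issuer = unverified_issuer(toy_jwt)
```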

evan2645 commented 5 years ago

Hey folks - sorry for the delay here.

SPIFFE's SIG-Spec discussed this in depth during the last call. There are a few things to unpack here.

The first is that the proposed API changes are changes to an API that isn't actually defined by SPIFFE - instead, it's an API that is specific to the SPIRE implementation. We have in the past discussed pulling this API into the SPIFFE spec set, and decided not to because that API isn't material to the identity interop and portability visions that SPIFFE is trying to achieve. We re-visited this decision as a result of this issue and felt that the situation is still much the same.

We still think that this is an important use case, and very much want to support agentless deployments, however we think that it is more likely to be a feature of SPIRE than it is to be a feature of SPIFFE itself.

The second thing to be unpacked is how well a credential exchange API solves the agentless problem. The virtual kubelet pattern is an attractive one, and could perhaps be a better fit here.

One thing to consider is the performance and availability impact of a centralized credential exchange API. Uptime of the API endpoint(s) is critical, and responses must be very rapid in order to not dilute the value proposition of the various serverless techs that will be consuming it. Further, the workload is required to obtain the credential that it must exchange, so there are several steps between boot and "I can do work now", all of which must be available and must be performant.

@mcpherrinm suggested something similar to the virtual kubelet pattern, where an agent (or something like it) can get SVIDs issued and manage their lifecycle, while persisting them into platform-specific stores. This would allow platforms with serverless/agentless offerings to manage the delivery of SVIDs as part of boot. The way that the SVIDs are persisted and managed could be pluggable, or easily extendable.

Curious to hear your thoughts on the above approach. In the meantime, I'm going to move this issue to the SPIRE repo. Thanks again for raising this!

evan2645 commented 5 years ago

> @mcpherrinm suggested something similar to the virtual kubelet pattern, where an agent (or something like it) can get SVIDs issued and manage their lifecycle, while persisting them into platform-specific stores. This would allow platforms with serverless/agentless offerings to manage the delivery of SVIDs as part of boot. The way that the SVIDs are persisted and managed could be pluggable, or easily extendable.

To put a bit of a finer point on this - the Workload API is meant to solve the "secret zero" problem, or the initial introduction of identity. The Node API is meant to facilitate agent functionality. The problem we want to solve in agentless (AFAICT) is the former and not the latter, so we should explore ways to introduce/inject SPIFFE credentials into agentless/serverless workloads. Since these platforms all have different ways of doing things, I think it makes sense to leverage platform-specific mechanisms to accomplish the injection of SPIFFE identities. Note that unlike a credential exchange API, this negates the need for the workload to authenticate either itself or the API it is calling.
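The workload side of this push model can be sketched briefly. The assumption (labeled in the comments) is that an out-of-band agent has already persisted the SVID into some platform store before the workload starts; a file path conveyed by an environment variable stands in for that store here, and both the variable name and the file layout are illustrative, not SPIRE behavior.

```python
import os
import tempfile

def load_svid(env_var: str = "SVID_PATH") -> bytes:
    """Read the SVID that an out-of-band agent persisted before boot.
    The env var name and file-based store are illustrative choices."""
    with open(os.environ[env_var], "rb") as f:
        return f.read()

# Simulate the platform having injected a credential ahead of workload start:
with tempfile.NamedTemporaryFile(delete=False, suffix=".pem") as tmp:
    tmp.write(b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
os.environ["SVID_PATH"] = tmp.name

svid = load_svid()
```

Note how this matches the comment above: the workload performs no authentication and calls no API at startup, because the platform's own injection mechanism solved "secret zero" before the process ran.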

omerlh commented 5 years ago

So you're basically suggesting a secret distribution solution (SDS)? This requires some thought, but I'm not sure it's possible. I would suggest taking a practical use case (e.g. a PaaS offering like AWS Elastic Beanstalk or Azure App Service, or a FaaS offering like Azure Functions or AWS Lambda) and trying to imagine how such a flow would work. I hear what you're saying about reliability and performance, and this is indeed a problem - especially when using an HSM to sign the tokens (which only makes things slower). I just think a centralized solution might be feasible and support all platforms, which might not be the case for SDS.

evan2645 commented 3 years ago

Work on this is well underway, and is being tracked in https://github.com/spiffe/spire/issues/1843

Community feedback has steered us towards a push model for the first pass, but I think in the fullness of time we'll also want to support something centralized as is described here. It's unclear exactly what that would look like (e.g. "direct attestation" of workloads, or a credential exchange API, or both?). The majority of the conversation is occurring in the RFC though, so I'm going to close this out in favor of tracking there.