Spire Topology question, with a large number of CSP accounts

nstott commented 1 year ago

Version: any
Platform: linux / any
Subsystem: node attestors / general topology

Hi All,

I'm exploring spire for a use case that we have, and can't seem to find a way to phrase what we need with the existing node attestors.

We provide a service for our customers and provision and manage infrastructure inside a customer's cloud account.
Currently this is AWS and Azure specific, but will include GCP in a few months. When customers onboard onto our service, we provision instances within the customer's account, and then run workloads on those instances. We then want to use the workload identity of those 'foreign' services against services running in our own account.

We deploy our infrastructure on Kubernetes clusters inside our own accounts and on Kubernetes clusters within the customer accounts.

We work with a reasonably large number of customer accounts, greater than 1k. Customers can add and remove CSP accounts at runtime, so the account list is not known at spire-server provision time.

We have a few constraints:

We do not want to allow ingress traffic into these customer accounts.
We don't want to have to manage customer credentials outside the customer account, so managing a large number of AWS / Azure / GCP creds is awkward.
Each CSP account is owned by a single customer; there's no multitenancy.
Customers have access to their own account and can do nefarious things if they so desire: change instance tags, exfiltrate secrets, etc.

It's essential that one customer cannot impersonate another customer. But it's probably not catastrophic if a customer makes a random workload that runs within their own account.

We've explored a few different topologies. So far we've looked at:

Federated spire

Having each set of customer workloads use a different trust domain. This is made difficult by the fact that we don't want to allow any ingress traffic into the customer's account.

Nested spire

We've tried having a single root spire-server in our own account, and then having nested servers running in the Kubernetes clusters in the customer accounts. Each nested spire-server could use something like x509pop to attest that workload.
This seems complicated, and it relies on us being able to pre-share some sort of token/certificate.

Vanilla spire

With spire-servers sitting in our own accounts, and attesting to node identity in customers' accounts. This seems ..mostly.. reasonable. I think we can use ONLY the AWS IID documents (or GCP IIT, or Azure JWT tokens) to attest to the agent's cloud account, but that doesn't seem possible with the current CSP-flavoured node attestors.

Vanilla spire with custom node attestors.

I've thought about writing some custom attestors that will only use the IID / JWT type documents for attestation, and then generate selectors that will filter by account/subscription ID. But I'm not sure if this is the best approach. If we went down that road, we would probably want to provide additional information that isn't contained within these documents, like k8s ID, region ID, or instance tags.
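To make the custom-attestor idea a bit more concrete, here is a rough sketch in Go of the selector-generation step for the AWS case. It is not a SPIRE plugin: the function name `selectorsFromIID` and the selector strings are hypothetical, and it assumes the IID document's signature has already been verified elsewhere, which a real attestor would have to do before trusting any field. The `accountId`, `region`, and `instanceId` keys are the ones found in the AWS instance identity document.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// iidDoc holds the subset of AWS instance identity document fields used here.
type iidDoc struct {
	AccountID  string `json:"accountId"`
	Region     string `json:"region"`
	InstanceID string `json:"instanceId"`
}

// selectorsFromIID (hypothetical name) turns an IID document into selector
// strings. ASSUMPTION: the caller has already verified the document's
// signature; this sketch only parses the JSON.
func selectorsFromIID(raw []byte) ([]string, error) {
	var doc iidDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("parsing IID document: %w", err)
	}
	// The account ID is what separates one customer from another,
	// so refuse to produce selectors without it.
	if doc.AccountID == "" {
		return nil, errors.New("IID document has no accountId")
	}
	// Selector formats below are made up for illustration.
	sels := []string{"account_id:" + doc.AccountID}
	if doc.Region != "" {
		sels = append(sels, "region:"+doc.Region)
	}
	if doc.InstanceID != "" {
		sels = append(sels, "instance_id:"+doc.InstanceID)
	}
	return sels, nil
}

func main() {
	raw := []byte(`{"accountId":"123456789012","region":"us-east-1","instanceId":"i-0abc"}`)
	sels, err := selectorsFromIID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(sels)
}
```

Registration entries could then match on the account selector, so that an agent in one customer's account can never pick up identities meant for another. Anything not in the document (k8s ID, instance tags) would need a separate, trusted source.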

Does anyone have any recommendations on a topology that would work for us, or are we too far from the beaten track?

amartinezfayo commented 1 year ago

Thank you @nstott for opening this issue. I think that the first step to explore possible solutions in your scenario is to scope the problem, identifying how SPIRE can help you. As you know, SPIRE is designed to help with the challenge of issuing workload identities at scale in a secure way (to deliver mutual authentication), so the focus should be on what's the shape of that problem in your scenario. I believe that you summarize it here:

When customers onboard onto our service, we provision instances within the customer's account, and then run workloads on those instances. we then want to use the workload identity of those 'foreign' services against services running in our own account.

What I'm reading from that is that you would like SPIRE to issue the identities of workloads that are owned by your service and also workloads that are owned by the customers (that run inside provisioned instances within the customer's account). I'm assuming that you are looking for a way to have mutual authentication between those workloads. Is this correct?

amartinezfayo commented 1 year ago

I'm closing this out due to inactivity, please re-open if needed.
