hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Kubernetes backend #5097

Closed ccojocar closed 3 years ago

ccojocar commented 6 years ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

It would be nice to have a Kubernetes backend which uses a Secret/CRD per Vault item to store the encrypted data.

One project which already has such a backend is dex: https://github.com/coreos/dex/tree/master/storage/kubernetes.

Describe alternatives you've considered

Explain any additional use-cases

This would make vault operation easier in any Kubernetes cluster independent of the cloud provider.

Additional context

jefferai commented 6 years ago

What is a CRD? What problem is this solving? How is Vault not independent of the cloud provider currently in a way that is meaningful here? (People run Vault on Kubernetes on many cloud providers, so I don't really understand what you're getting at.)

ccojocar commented 6 years ago

@jefferai You can find more details about a Custom Resource Definition here: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/.

It's true that it is possible to run Vault on Kubernetes in an independent way, but you need to either use the file backend (not scalable) or install a separate Consul/etcd/DB service for storage. By using a CRD, Vault could store data directly in the Kubernetes etcd cluster through the Kubernetes API, in the same way a Secret is stored. The advantage is that no additional service is required for storage, and the data is still encrypted by Vault before being stored.

cc @jstrachan

jefferai commented 6 years ago

I assume there is no notion of locks so HA wouldn't be supported. Given that, what's the advantage over using the file backend?

ccojocar commented 6 years ago

The etcd cluster is HA with locks. It stores the entire state of the Kubernetes cluster. You should see it like a REST API on top of etcd, managed by the Kubernetes API server.

ccojocar commented 6 years ago

I am willing to contribute this backend if you agree with it.

jefferai commented 6 years ago

My understanding is that access to the system etcd is not available in Kubernetes. Can you access all etcd primitives within a CRD?

ccojocar commented 6 years ago

You can't. You define the schema of a REST resource used by Vault, which is registered with the API server, and the API server mediates all access to the etcd cluster. It basically stores this resource in etcd and ensures its consistency.

jefferai commented 6 years ago

Right, but from the API that you register, it can access etcd directly? If not, how do you propose to do locks for HA?

ccojocar commented 6 years ago

The API server will handle this, when an update is posted in the API.
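
To make this concrete: as far as I understand it, what the API server provides on update is optimistic concurrency rather than a lock. Each object carries a `resourceVersion`, and an update sent with a stale version is rejected with a conflict. Below is a minimal in-memory sketch of that behaviour. `fakeAPIServer`, `object`, and `errConflict` are hypothetical names made up for illustration; this is not the real Kubernetes client API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict mimics the conflict returned for an update with a stale version.
var errConflict = errors.New("conflict: stale resourceVersion")

// object is a hypothetical stored resource, not a real Kubernetes type.
type object struct {
	resourceVersion int64
	data            string
}

// fakeAPIServer is an in-memory stand-in for the API server (assumption).
type fakeAPIServer struct {
	mu      sync.Mutex
	objects map[string]object
}

func newFakeAPIServer() *fakeAPIServer {
	return &fakeAPIServer{objects: make(map[string]object)}
}

// Get returns the current object and whether it exists.
func (s *fakeAPIServer) Get(name string) (object, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	obj, ok := s.objects[name]
	return obj, ok
}

// Update writes data only if seenVersion matches the stored version.
// A seenVersion of 0 means "create, the object must not exist yet".
// It returns the new version on success.
func (s *fakeAPIServer) Update(name string, seenVersion int64, data string) (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	current := s.objects[name] // zero value (version 0) when absent
	if current.resourceVersion != seenVersion {
		return 0, errConflict
	}
	next := object{resourceVersion: seenVersion + 1, data: data}
	s.objects[name] = next
	return next.resourceVersion, nil
}

func main() {
	api := newFakeAPIServer()
	v1, _ := api.Update("vault-item", 0, "ciphertext-a")

	// Two writers both read version v1, then race to update.
	_, errA := api.Update("vault-item", v1, "ciphertext-b")
	_, errB := api.Update("vault-item", v1, "ciphertext-c")
	fmt.Println("writer A:", errA)
	fmt.Println("writer B:", errB)
}
```

The first writer wins and the second has to re-read and retry. That guarantees consistency per object, but it is a compare-and-swap primitive, not a session lock.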

jefferai commented 6 years ago

I still don't follow. Locks in most kv stores work on sessions -- either you make an attempt to grab a lock that returns immediately with success or not, and you just loop over and over trying, or you make an attempt that blocks until you receive the lock. Then you need to actually manage the lifecycle of the lock. Maybe this is possible with a CRD, I'm just trying to understand details.
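
To illustrate the semantics I mean, here is a rough sketch of a non-blocking try-lock, a loop built on top of it, and the lifecycle part (renewal by the holder, expiry, release). Everything in it is an assumption for illustration: `memStore`, `lockRecord`, the TTL-based expiry, and the integer clock in Unix seconds are hypothetical, and this is neither Vault's HA interface nor anything a CRD provides out of the box.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lockRecord is a hypothetical lock entry: who holds it and until when.
type lockRecord struct {
	holder  string
	expires int64 // Unix seconds
}

// memStore stands in for a key-value store; it is not a real backend.
type memStore struct {
	mu    sync.Mutex
	locks map[string]lockRecord
}

func newMemStore() *memStore {
	return &memStore{locks: make(map[string]lockRecord)}
}

// TryLock returns immediately. It succeeds if the lock is free, expired,
// or already held by the caller (in which case it acts as a renewal).
func (s *memStore) TryLock(key, holder string, ttl, now int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, held := s.locks[key]
	if held && rec.holder != holder && now < rec.expires {
		return false
	}
	s.locks[key] = lockRecord{holder: holder, expires: now + ttl}
	return true
}

// Unlock releases the lock only if the caller is the current holder.
func (s *memStore) Unlock(key, holder string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, held := s.locks[key]
	if !held || rec.holder != holder {
		return false
	}
	delete(s.locks, key)
	return true
}

// Lock loops over TryLock until it succeeds or the attempts run out.
func (s *memStore) Lock(key, holder string, ttl int64, retry time.Duration, attempts int) bool {
	for i := 0; i < attempts; i++ {
		if s.TryLock(key, holder, ttl, time.Now().Unix()) {
			return true
		}
		time.Sleep(retry)
	}
	return false
}

func main() {
	s := newMemStore()
	fmt.Println(s.TryLock("vault/ha", "node-a", 15, 100)) // lock is free
	fmt.Println(s.TryLock("vault/ha", "node-b", 15, 105)) // held by node-a
	fmt.Println(s.TryLock("vault/ha", "node-a", 15, 110)) // renewal by the holder
	fmt.Println(s.TryLock("vault/ha", "node-b", 15, 130)) // node-a's lease has expired
}
```

The question for a CRD backend is which of these pieces the API server gives you and which ones the backend has to build and keep alive itself.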

Note that HA isn't mandatory for a storage backend. I don't have any clue how a CRD approach would scale, but if it scales fine I don't see why it wouldn't work in a non-HA scenario. I just also don't understand your proposal in terms of how HA would work.

tamalsaha commented 6 years ago

As someone who is interested in both Kubernetes and Vault, I am going to give my unsolicited 2c.

> It would be nice to have a Kubernetes backend which uses a Secret/CRD per Vault item to store the encrypted data.

I have to disagree. Vault should not use Secret/CRD as storage. This feels like an anti-pattern. Vault should be in charge of keeping the data secret. If Secret/CRD is used as Vault storage that will create a cycle.

jefferai commented 6 years ago

We have definitely heard requests in the past for Vault to be the backing store for Kubernetes secrets (a lot) -- so most requests I've heard do in fact run that direction.

That said, if someone is never interested in backing Kubernetes secrets with vault it's possibly doable.

It's not clear to me if CRD necessarily means storing in Kubernetes secrets vs. simply storing in Kubernetes storage? If the latter it's basically just a passthrough to etcd right?

jefferai commented 6 years ago

Maybe in other words, storing in Kubernetes storage doesn't seem all that bad for those that want to do so, storing in Kubernetes secrets seems odd.

ccojocar commented 6 years ago

> I have to disagree. Vault should not use Secret/CRD as storage. This feels like an anti-pattern. Vault should be in charge of keeping the data secret. If Secret/CRD is used as Vault storage that will create a cycle.

@tamalsaha This is not an anti-pattern. There are many projects which make use of it. I attached the dex backend in the description as an example - you can have a look.

I agree that using Secrets is confusing and feels a bit odd. In fact, the data in the Secret cannot be used without being decrypted by Vault.

I would prefer a CRD, to have a clear separation between Vault data and typical Kubernetes Secrets. Another advantage of a CRD is that it can have a dedicated Kubernetes RBAC policy, different from the one used for Secrets.

ccojocar commented 6 years ago

> I still don't follow. Locks in most kv stores work on sessions -- either you make an attempt to grab a lock that returns immediately with success or not, and you just loop over and over trying, or you make an attempt that blocks until you receive the lock. Then you need to actually manage the lifecycle of the lock. Maybe this is possible with a CRD, I'm just trying to understand details.

@jefferai I am trying to point you to the code in the API server which interacts with etcd. You can see more details there on how the transactions are handled: https://github.com/kubernetes/kubernetes/blob/2bb1e7581544b9bd059eafe6ac29775332e5a1d6/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L132

jefferai commented 6 years ago

I understand how Kubernetes interacts with etcd. I don't understand how CRDs interact with etcd.

ccojocar commented 6 years ago

A CRD is just an API extension, which is stored in etcd in the same way as other standard Kubernetes resources.

tamalsaha commented 6 years ago

> It's not clear to me if CRD necessarily means storing in Kubernetes secrets vs. simply storing in Kubernetes storage? If the latter it's basically just a passthrough to etcd right?

> Maybe in other words, storing in Kubernetes storage doesn't seem all that bad for those that want to do so, storing in Kubernetes secrets seems odd.

> I understand how Kubernetes interacts with etcd. I don't understand how CRDs interact with etcd.

Kubernetes defines some interfaces for storing data. The current implementation of these storage interfaces uses etcd. A CRD (like a Secret) is a Kubernetes resource that uses these storage interfaces to store data in etcd. You can think of a CRD, a Secret, or any other Kubernetes resource such as a Pod as a different class in the OOP sense. At rest, they are all saved in etcd via the storage interfaces by the kube-apiserver.

While I agree that it should be possible to use CRD as a storage backend for Vault, here are some things to consider:

  1. Kubernetes imposes a maximum limit of 1MB per CRD object. If Vault ever wants to write anything larger than 1MB, it will fail.

  2. I personally want to use Vault because I want Vault secrets to live beyond a cluster. To back up a cluster, most people use some form of GitOps, using charts or storing the YAMLs directly. Using a CRD as Vault storage does not fit into that model. Users have to store the Vault CRD YAMLs as is.

  3. CRDs are stored as JSON-serialized bytes in etcd. This can cause high CPU usage by kube-apiserver. This is why Kubernetes moved the core objects to a protobuf-serialized format some releases ago.

  4. I think a better approach is to just use a cloud bucket with HA storage implemented using Kubernetes. https://github.com/hashicorp/vault/issues/4951 . This gives the best of both worlds.

  5. Also, any Vault server that is using a CRD as backend will be weird for other traditional Vault use-cases due to the cyclic dependency. For example, you use Vault to store Kubernetes secrets and then Vault stores them as CRDs in the same Kubernetes cluster. Users could just use the secrets directly.
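
Points 1 and 3 interact, which a small sketch can show. It assumes a hypothetical `vaultItem` type holding one encrypted value and takes the 1MB figure above as the limit. Go's `encoding/json` writes `[]byte` fields as base64, so the serialized object is roughly a third larger than the raw ciphertext, and a value below 1MB can still end up above the limit once serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxObjectBytes is the per-object limit quoted above (assumed to be 1 MiB here).
const maxObjectBytes = 1 << 20

// vaultItem is a hypothetical custom resource payload for one Vault entry.
type vaultItem struct {
	Key   string `json:"key"`
	Value []byte `json:"value"` // encoding/json encodes []byte as base64
}

// fitsInObject reports the JSON-serialized size of the item and whether
// it stays within maxObjectBytes.
func fitsInObject(item vaultItem) (int, bool, error) {
	raw, err := json.Marshal(item)
	if err != nil {
		return 0, false, err
	}
	return len(raw), len(raw) <= maxObjectBytes, nil
}

func main() {
	items := []vaultItem{
		{Key: "secret/app", Value: make([]byte, 512)},
		{Key: "secret/blob", Value: make([]byte, 900*1024)}, // under 1 MiB before encoding
	}
	for _, it := range items {
		n, ok, err := fitsInObject(it)
		if err != nil {
			fmt.Println("marshal failed:", err)
			continue
		}
		fmt.Printf("%s: raw=%d serialized=%d fits=%v\n", it.Key, len(it.Value), n, ok)
	}
}
```

A real backend would have to reject or split such values before sending them to the API server.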

All of these make me think that a CRD-based backend for Vault will only be suitable for demo purposes. In that case, one can just use the filesystem backend.

ccojocar commented 6 years ago

Thanks for bringing these points up. The 1MB limit is probably the most concerning point, but the vast majority of secrets will be smaller than this. This limit also applies to standard Kubernetes Secrets. That being said, you won't be able to store large secrets anyhow.

I don't quite get the other arguments. You are not forced to use this backend if it does not fulfill your requirements. You are completely free to pick another backend which stores the data outside of the cluster. You can also use a dedicated etcd backend if you have concerns about the data size.

Personally, I think that a lot of people are interested in having an in-cluster backend.

2opremio commented 5 years ago

I understand it would be more desirable to have an official solution, but I think this project may already cover the needs stated in this issue https://github.com/DaspawnW/vault-crd

raskchanky commented 3 years ago

As there is a community project that provides this already (and it looks like it's under active development), I'm going to close this for now.