k8ssandra / k8ssandra-operator

The Kubernetes operator for K8ssandra
https://k8ssandra.io/
Apache License 2.0

K8SSAND-1533 ⁃ Modular Secrets Backend #556

Open sseidman opened 2 years ago

sseidman commented 2 years ago

What is missing?

The only supported option for secret storage of authentication credentials is through Kubernetes Secrets. It would be great if there was modular support for different/external secret backends (e.g. Vault) that would allow for the current secret storage to work as is, but also provide alternative options if desired.

Why do we need it?

Kubernetes Secrets are unencrypted by default and carry a number of potential security risks. Introducing a modular secrets backend would let users configure a secret storage system that meets their security needs, while still providing the default out-of-the-box option.

Environment

All major cloud environments

K8ssandra Operator version:

`k8ssandra-operator:v1.1.1`

**Anything else we need to know?**

The interface would ideally be a drop-in interface within the ReconcileSecret() function (k8ssandra-operator/replicated.go at 013df82bd7e4f50c8ee733b0418a9f3807545055 · k8ssandra/k8ssandra-operator), which is called during K8ssandraCluster reconciliation from:

These user/pass secrets are mounted as environment variables within the medusa container and reaper container and therefore need to be mounted/injected from the external secret store. If the credentials need to be injected as something other than an environment variable (such as through mounted volume) these secrets should not be created:

CQL users are specified in the CassandraDatacenter config by secret reference, and encryption keys are likewise referenced by secret. The cass-operator would need to be aware of the configured secrets backend and where to retrieve the users/certs from instead.


mikelococo commented 2 years ago

FYI, we're still working a bit on honing this proposal. In particular I think there are some open questions around how to "mount" the secrets (i.e. provide access to them within each cassandra pod) without using k8s Secrets. My own intuition is that we would want to:

mikelococo commented 2 years ago

Does vault support "watch" APIs for secrets? I'm thinking of credential rotation and I believe that https://www.vaultproject.io/docs/secrets/databases/cassandra supports automatically rotating credentials. If a backend is doing automatic rotation, it may not be enough to fetch the secrets on operator startup, we may need to watch them... update them... and then possibly do a rolling restart on any pods that mount them?

Edit: It looks like as of 2018 Vault did not have the ability to watch a path (without interacting with the storage backend... which I don't think is viable here, as Vault itself has many modular backends and probably few organizations would want operators mucking about with direct access to them): https://github.com/hashicorp/vault/issues/616

Edit2: I could still imagine there being room for a watch-oriented API in the modular abstraction, even if some secret backends implement that watch via a periodic polling mechanism.

jsanda commented 2 years ago

Thanks for creating the issue :)

> Kubernetes Secrets are unencrypted and have a list of potential security risks.

Would it be possible to encrypt the secret contents and then provide the key to consumers of the secret in a secure way? (credit to @jeffbanks for the question 🙂 )

Can you give a high level explanation of how using credentials from an external provider works? I'm struggling to grok this as it's something I haven't done before. Will each container need to make a call to Vault to get the credentials and store them in environment variables?

> Replicate the "mounting" machinery that comes built-in to k8s secrets, presumably by doing something like mounting them to an appropriately permissions-restricted file and then using an init-container to load the contents of that file into environment variables.

Would those environment variables be visible to other containers?

Something else to keep in mind: all of the secrets under consideration are created in the control plane cluster and replicated to the data plane clusters. The secrets may be used in the local cluster, but they might also be used in remote clusters. By default, any secret created in the namespace that the operator is watching and that has the k8ssandra.io/cluster-name and k8ssandra.io/cluster-namespace labels will get replicated to data plane clusters. Check out secret_controller.go.
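For reference, a secret opted in to that replication machinery would look something like this. The two labels are the ones named above; the name, namespace, and credential values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: demo-superuser            # placeholder name
  namespace: k8ssandra-operator   # must be a namespace the operator watches
  labels:
    k8ssandra.io/cluster-name: demo
    k8ssandra.io/cluster-namespace: k8ssandra-operator
type: Opaque
stringData:
  username: demo-superuser
  password: changeme              # placeholder only
```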

sseidman commented 2 years ago

What we are proposing?

Why we’re proposing this

As mentioned in the original issue, Kubernetes Secrets do not provide a secure way of storing secrets that meets our current needs. As an organization, we require a secret storage backend that provides a fine-grained audit trail detailing all requests and responses to the system. Our secret storage system goes through a demanding security review and must meet strict compliance standards, which requires a substantial amount of operational overhead. We can avoid implementing the same security standards for Kubernetes Secrets if we do not use them as a secondary source of secret storage within our systems.

Additionally, we require consistent mechanisms and policies around credential rotation. It’s one thing to have these features in the operator, and many unopinionated orgs will appreciate that. But opinionated orgs will want to apply their “standard” rotation policies using their standard rotation mechanisms, and those mechanisms will integrate strongly with the previously mentioned auditability and compliance systems. As such, simply encrypting k8s Secrets doesn’t really move the needle on why we’re doing this. It also doesn’t solve the secret-storage problem: you’re left with a secret-encrypting-secret that has the same problems as the original handful of secrets.

What it looks like

Config option

Similar to the auth flag at the top level of the K8ssandraCluster spec, there would need to be an additional flag such as externally_injected_secrets, which would default to false. When false, the operator will create secrets for all superusers and replicate them across the different clusters just as it does now. When enabled, the operator will not create the secrets for any superuser and it will be the user’s responsibility to provide those secrets.
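On a K8ssandraCluster manifest, that might look roughly like this. The flag is only proposed here, not an existing field, and its exact name and casing are assumptions:

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  auth: true
  # Proposed flag (spelled externally_injected_secrets in this discussion);
  # hypothetical, not part of the current CRD. Defaults to false.
  externallyInjectedSecrets: true
```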

Where will the secrets be

Since the operator will not be creating the secrets, it should expect that the superuser credentials are already available to it and accessible as needed. This means there needs to be a default (or configurable) file location, such as /etc/cassandra/superuser_credentials, where the secrets are written on initialization and to which the operator has read access. The operator is then agnostic to where the secrets are retrieved from and only requires that they end up in that common location.

Reconciliation

Currently, if auth is enabled, the operators will generate superuser credentials, if they don’t already exist, and apply them as Kubernetes Secrets. The secret names are stored within the CassandraDatacenter spec as references that can be used to retrieve the secrets when actually applying the CQL commands to create the superusers. Instead, the operator will check the externally injected secrets configuration flag and, if enabled, look to read the secrets from the file. If the secrets do not exist or do not have the expected form within the file, the operator should return an error and re-queue the reconciliation task until the secrets file has been populated. If the secrets file is available, the operator should continue to set the CQL superuser password to match the provided credential string, as it does today.

Generation/Population

Since the credentials are expected to be mounted to the operator as a file, the user will require some mechanism to populate this file. By giving users access to init-containers within the operators, a user can use their own custom image that contains the necessary arguments and logic to authenticate and interact with their own custom secrets backend, with the stipulation that the retrieved secrets need to be mounted to the operator as a file in the location the operator is expecting and in the proper format within the file.

Downstream storage

The medusa and reaper pods both expect that the superuser credentials will be available locally as environment variables within each of the pods. This allows the credentials to be loaded into the application configuration at runtime. This should still be expected, and the user again will need to customize their init-containers so that the credentials are injected into the pod and populate the requisite environment variables. This is actually a little tricky and may require some additional thought. The proper way to do this with an init-container is to write the secrets to a script that exports the environment variables. The main container would then need an additional command to source the script created by the init-container so that the environment variables are populated. A possible alternative would be to mount the k8s secrets as a file to each of the containers instead of as environment variables. In this case, the application-level code would require some changes, since it would expect the credentials in a file instead of environment variables.
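A rough pod-level sketch of that init-container pattern. The fetcher image, the fetch-secret command, and the entrypoint path are all placeholders for the user's own secret-store tooling; only the write-script-then-source-it flow is the point here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reaper-example
spec:
  volumes:
    - name: creds
      emptyDir:
        medium: Memory   # keep credentials off disk
  initContainers:
    - name: fetch-credentials
      image: example.com/secret-fetcher:latest   # user-supplied image
      command: ["/bin/sh", "-c"]
      # Write a script that exports the credentials as env vars.
      # fetch-secret is a placeholder for the user's backend client.
      args:
        - |
          echo "export CASS_USER=$(fetch-secret superuser/username)" > /creds/env.sh
          echo "export CASS_PASS=$(fetch-secret superuser/password)" >> /creds/env.sh
      volumeMounts:
        - name: creds
          mountPath: /creds
  containers:
    - name: reaper
      image: thelastpickle/cassandra-reaper:latest
      # The main container sources the script before starting, so the
      # credentials land in environment variables as the app expects.
      # The entrypoint path is illustrative.
      command: ["/bin/sh", "-c", ". /creds/env.sh && exec /usr/local/bin/entrypoint"]
      volumeMounts:
        - name: creds
          mountPath: /creds
```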

Validation

The validation steps that check for the existence of secrets (cass-operator/handler.go at 942bd735ca59dabc95d3169787ceaeba8d415cc1 · k8ssandra/cass-operator) will need to be disabled so that the CRD specs are considered valid and do not prevent the reconciliation loops from completing successfully.

Known constraints

When secrets are disabled in the configuration, the user will have to take responsibility for replicating the secrets across the Kubernetes clusters. Since the user will be using their own logic to inject the credentials into the pod, they’ll also have the responsibility of provisioning and replicating those secrets as necessary within their secrets management system before deploying the k8ssandra-operator and creating a K8ssandraCluster. Additionally, whatever file/env-var interfaces we expose become public config interfaces and shouldn’t churn unnecessarily across k8ssandra versions.

Additional Proposals

A modular secrets backend was our original intention with this issue, but the implementation of such a solution quickly gets complex and would result in additional configurations for the operator, additional backends to support, and potentially leaky abstractions. This proposal would require the operator to gain the ability to fetch/set secrets in other storage backends. The supported backends would also need to support some form of replication and therefore the operator would need access to replicate those secrets within the storage system. Finally, there would need to be a supported way to mount/inject the secrets that could be used as a drop-in replacement to the current mechanism of mounting the secrets as environment variables without relying on Kubernetes Secrets to do so.

Instead, allowing the user to selectively disable the creation of secrets moves the responsibility of provisioning credentials and injecting them into the pod from the operator to the user. This lets a user use their external backend of choice and only requires some mechanism to inject those credentials into the operator and dependent pods.

mikelococo commented 2 years ago

There was some good discord chat about this, which I'll briefly summarize here...

mikelococo commented 2 years ago

This has evolved into a proposal documented at https://docs.google.com/document/d/1zSwkWhylXMk7mDmjkq4ArvNmwDRs-CEjRuYu42KcXg8/edit.

The crux of the proposal is that we'll introduce the notion of an "internal secrets provider" that retains all our current secrets-management behaviors, and an "external secrets provider" with a much higher barrier to entry (requiring you to handle your own secrets creation/rotation, and to use the vault-agent or create your own mutating webhook to inject secrets from whatever enterprise secret store you're using). The external provider leverages Kubernetes dynamic-admission-control/mutating webhooks to inject the secrets into the containers that need them in a way that is mostly transparent to the operator, which just needs to annotate the various containers with metadata that the mutating webhook can read to know what to inject. This is all significantly inspired by the HashiCorp Vault Agent for Kubernetes.
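As a concrete reference point, the Vault Agent injector works through pod annotations roughly like these. The Vault role name and secret path below are illustrative, not something this proposal defines:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reaper-example
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "k8ssandra"   # illustrative Vault role
    # Renders the secret at this path to a file inside the pod.
    vault.hashicorp.com/agent-inject-secret-superuser: "secret/data/k8ssandra/superuser"
spec:
  containers:
    - name: reaper
      image: thelastpickle/cassandra-reaper:latest
```

Under this model the operator's job shrinks to writing equivalent annotations onto the containers it manages and letting the webhook do the injection.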

This will involve changes to k8ssandra, reaper, medusa, and possibly cass-operator, and there are probably some small design/interface details to be worked out as we start learning from implementation, but we have the broad shape of the work planned out.

Some of the folks working on this will be first-time contributors, but all quite familiar with Cassandra and Kubernetes so hopefully they'll be able to make good independent progress.

adejanovski commented 2 years ago

Thanks for the update @mikelococo !

FTR, work on this will be tracked in this epic.