zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Insecure and inflexible management of S3 credentials for logical backups #1247

Open · ZacharyJoswick opened this issue 3 years ago

ZacharyJoswick commented 3 years ago

Good day Zalando team. First off, thank you for all the hard work on this tool. It's a valuable addition to our infrastructure and I truly see it as the way forward.

I'm working on setting up the Postgres operator on our bare-metal cluster with the intent to use it for production workloads, assuming our trial run works out. As part of that I am working through the steps to set up logical backups, since this is a very valuable feature for us and one of the main motivations for switching to the operator from our current database management system.

However, when looking through the installation documentation and a number of related GitHub issues, I was a bit surprised by how logical backup S3 credentials are managed. From what I understand, the only real way to configure the operator to deploy the logical backup cron job with S3 credentials is to provide them (along with the endpoint and region) in the values file for the Helm chart (at least for the Helm chart installation method). This seems like a significant security weakness: the credentials end up in the operator configuration CRD and are therefore not subject to the access controls that apply to Secrets (unless equivalent policies are added for the CRD, which is an additional step). This was partially brought up in #915, but it looks like only the WAL archive configuration was changed (to allow the use of Secrets), and the logical backup side was left as is.
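To make the concern concrete, here is roughly what the logical backup section of the chart's values file looks like today (a sketch; key names and defaults may differ slightly between chart versions, and the bucket, endpoint, and key values are placeholders):

```yaml
# Helm values for the operator chart (approximate; check your chart version).
# The access key and secret key end up in the OperatorConfiguration CRD in plain text.
configLogicalBackup:
  logical_backup_schedule: "30 00 * * *"
  logical_backup_s3_bucket: "my-backup-bucket"
  logical_backup_s3_endpoint: "https://minio.example.internal:9000"
  logical_backup_s3_region: "us-east-1"
  logical_backup_s3_access_key_id: "AKIAEXAMPLE"          # plain text in the CRD
  logical_backup_s3_secret_access_key: "not-very-secret"  # plain text in the CRD
```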

Additionally, providing the key at install time is troublesome because our company keeps all cluster configuration in git, meaning these credentials would have to be saved in a values file and committed to the repo, adding another layer of insecurity. Furthermore, we use MinIO as our S3-compatible storage and deploy instances with a MinIO operator, so the credentials for a MinIO instance are not known until it is provisioned, and even then they are only kept in a Secret that other pods mount if necessary; even that is treated as a last resort, since we try to use temporary credentials wherever possible.

All of this is to say that with the current approach we have to deploy the MinIO instance, manually extract the credentials from the Secret, manually put them in the operator's values file (along with the endpoint and so on), and only then deploy the operator. That is an undesirable workflow, full of potential security holes, and it introduces significant manual steps into an otherwise fully automated cluster setup (which makes it impractical to stand up replica clusters or temporary test clusters).

There are some potential workarounds we are toying with. One is to use HashiCorp Vault: store the MinIO credentials there and use the Banzai Cloud bank-vaults secret injector to modify the operator CRD at install time and inject the credentials. This still requires pre-generated credentials (they have to get into Vault somehow) and does not solve the access problem, since the operator configuration CRD still holds the credentials in plain text. Overall it feels like a hack, and a substantial amount of extra moving pieces for something that could in theory be handled much more elegantly.

So my question is: what is the best approach to fixing this? I'm more than happy to make changes and open a PR, but there are a few possible strategies as I see it (and I'm sure others have ideas as well). One option would be to simply allow these values to come from a Secret (similar to the solution added in #946 for WAL files). That seems simple enough to implement, but it could also be useful to provide a different logical backup configuration per cluster (similar to the changes proposed in #1239), so that each cluster can use its own bucket or endpoint for backups. It would be even better if WAL archiving, logs, and logical backups could all be configured per cluster (maybe in one place), including which credentials and endpoint to use, so that individual teams can have their own isolated storage and Postgres clusters without deploying multiple operators or resorting to other hacks.
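To illustrate the per-cluster idea, here is a minimal sketch of what a postgresql manifest could look like if backup storage and credentials were configurable per cluster. The existing fields roughly follow the documented manifest format; nothing under logicalBackup exists today, and those names are purely illustrative:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-team-a-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 10Gi
  enableLogicalBackup: true
  # Hypothetical section: NOT part of the current CRD, shown only to illustrate the proposal.
  logicalBackup:
    s3Bucket: team-a-backups
    s3Endpoint: https://minio.team-a.svc.cluster.local:9000
    credentialsSecretName: team-a-minio-credentials   # existing K8s Secret holding the access key / secret key
```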

So what do you all think? Am I just missing where this can already be configured, or is this a reasonable change? I see it as truly making the operator a multi-team system and a change that could dramatically increase security and flexibility overall, but I'm open to ideas and suggestions.

Jan-M commented 3 years ago

Hi, quick reply: you are right about most of the points around logical backup credentials, and ideally no one should use it like this. We do not either, as we rely on AWS IAM and pod roles.

We should definitely encourage the use of K8s Secrets wherever possible; passwords should not be in config maps.

I think integrating with anything beyond the K8s API and K8s Secrets is out of scope; that is something other tooling, deployment pipelines and automation should take care of.

Will give it a more detailed read again later.

jabbors commented 3 years ago

I'd also like to see an improvement here.

It would be great if the secrets could be integrated with Vault, since we rely on Vault for 90% of our workloads. We use the bank-vaults secret injector for this.

I've come across projects that, instead of requiring the secret value directly in the CRD or in the pod spec, provide an option to specify the name of the Secret holding the sensitive data. That is a good first step, although still not ideal from a security perspective.

michaeljguarino commented 3 years ago

One way forward here would be to allow users to connect to EKS's and GKE's new workload identity systems. That should be achievable by optionally allowing cluster pods to run under an existing service account name, which is annotated and provisioned appropriately outside the scope of the operator.

I also suspect this should be a relatively easy addition.
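For reference, the pattern would look roughly like this: the service account is created and bound to a cloud identity outside the operator, and the operator change would just be letting the cluster's pods run under it. The annotation keys below are the standard EKS IRSA and GKE Workload Identity ones; the account name and role ARN are made up:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: postgres-backup
  annotations:
    # EKS: IAM Roles for Service Accounts (IRSA)
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/postgres-backup
    # GKE Workload Identity (alternative)
    # iam.gke.io/gcp-service-account: postgres-backup@my-project.iam.gserviceaccount.com
```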

ErikLundJensen commented 2 years ago

The secret handling for S3SecretAccessKey could be changed to look like the PGPASSWORD configuration here: https://github.com/zalando/postgres-operator/blob/1c80ac0acd4fb15432e46d8dadac6f1bf4817d31/pkg/cluster/k8sres.go#L2180, with the small twist that the env variable should only be sourced from a Secret if a new configuration parameter (e.g. S3SecretAccessKeyName) is set.
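In pod-spec terms, the outcome would be something like the following in the logical backup cron job's container: the secret key resolved via a secretKeyRef instead of a literal value. This assumes such a parameter existed, which it currently does not; the Secret name below is illustrative:

```yaml
env:
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: logical-backup-s3-credentials   # Secret named by the hypothetical S3SecretAccessKeyName parameter
        key: AWS_SECRET_ACCESS_KEY
```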