reactive-tech / kubegres

Kubegres is a Kubernetes operator that deploys one or many clusters of PostgreSQL instances and manages database replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars 74 forks

Allow kubegres cluster to run on secure Kubernetes environments (Pod security policies) #52

Closed teknologista closed 3 years ago

teknologista commented 3 years ago

Hi,

First thanks for your work on a Postgres cluster Kubernetes Operator.

We are deploying Kubernetes clusters in a secure by design manner using Rancher's RKE2 (aka RKE Government v1.20.11+rke2r2)

This creates a cluster with a hardened Pod Security Policy which, among other things, forbids pods from running as root.

This implies the workloads must have a security context defined with at least these two settings (1001 is the user the postgres image runs as):

securityContext:
  runAsNonRoot: true
  runAsUser: 1001

Looking at the baseConfigMap I also see you do some chown postgres:postgres when copying data from primary to replicas. I guess these would fail with these settings.

In "enterprise" setups this is actually not needed, as CSI-provisioned PVs do belong to the pod's running user AFAIK.

While I understand this level of security is not needed by everyone and some users want to be able to run things in smaller clusters where security is not mandatory, I was wondering if it would be possible to have a boolean flag in the CR YAML (e.g. hardened: true/false) that would allow the workload to run in hardened PSP clusters.

This flag would basically use an alternate baseConfigMap with no chown commands and the added securityContext to the statefulSet.

FYI, right now we run a single-node, non-replicated Postgres server with the above securityContext with no issues whatsoever.

Let me know what you think about this.

many thanks,

Eric

alex-arica commented 3 years ago

Thank you for your message. It is a great suggestion.

In the following 3 scripts, what would you like us to change if we add and enable the flag "hardened"?

1) https://github.com/reactive-tech/kubegres/blob/main/controllers/spec/template/yaml/BaseConfigMapTemplate.yaml#L172

2) https://github.com/reactive-tech/kubegres/blob/main/controllers/spec/template/yaml/BaseConfigMapTemplate.yaml#L196

3) https://github.com/reactive-tech/kubegres/blob/main/controllers/spec/template/yaml/BaseConfigMapTemplate.yaml#L225

teknologista commented 3 years ago

Hi @alex-arica ,

Apart from adding the

securityContext:
  runAsNonRoot: true
  runAsUser: 1001

part in the statefulSet, I think we could try just removing the

chown -R postgres:postgres $PGDATA; part in

copy_primary_data_to_replica.sh

I have the feeling that may be enough, and I am willing to test an operator test release to validate the setting, as we have a staging RKE2 Government hardened cluster ready to validate the change.

Let me know...

Thanks again for your quick response!

alex-arica commented 3 years ago

Thank you for the details that you have provided.

I added this feature in the active backlog.

In regards to the features requested by the Open Source community, I dedicate one continuous week per month to Kubegres. It is usually the last week of each month. However, for companies paying support, we implement new features as soon as possible.

Consequently, I will be able to look at this feature from 25 October. If, in the meantime, someone else in the community is willing to help with this change, I can review a PR as soon as possible.

ylck commented 3 years ago

Hi,

I think I can implement this feature. Would the CR YAML look like this?

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default

spec:
   replicas: 3
   image: postgres:13.2
   hardened : true/false
   database:
      size: 200Mi

Also, chown -R postgres:postgres $PGDATA; needs to be changed to:

if [ "$(id -u)" -eq 0 ]
then
  chown -R postgres:postgres "$PGDATA"
fi

@alex-arica @teknologista
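
A minimal sketch of the guard proposed above, written as a reusable function. The names `is_root_uid` and `chown_pgdata_if_root` are hypothetical and do not come from the Kubegres scripts. The optional UID argument exists only so the non-root branch can be exercised without changing users.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) only when the given UID is root.
is_root_uid() {
  [ "$1" -eq 0 ]
}

# Hypothetical wrapper around the chown discussed in this thread.
# $1: data directory; $2: optional UID, defaults to the current user's UID.
chown_pgdata_if_root() {
  dir="$1"
  uid="${2:-$(id -u)}"
  if is_root_uid "$uid"; then
    # Only root is allowed to change ownership to another user.
    chown -R postgres:postgres "$dir"
  else
    # Under runAsNonRoot the chown would fail, so it is skipped.
    echo "Running as UID $uid, skipping chown of $dir"
  fi
}
```

In a script this would be called as `chown_pgdata_if_root "$PGDATA"`. Skipping the chown relies on the volume already being writable by the pod's user, as described earlier in this thread.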

alex-arica commented 3 years ago

@ylck thank you for your message. At this stage I am not fully sure because I have not done any analysis about that feature.

I think we need to add the field "securityContext" and the condition that you suggested in the bash script. Perhaps there is more to do.

I prefer adding a new field "securityContext" rather than adding a new field "hardened". It would provide more flexibility.

teknologista commented 3 years ago

@ylck @alex-arica, adding the full securityContext is even better because it allows further customisation. As for the analysis, I am willing to test the PR on our RKE2 hardened clusters. Many thanks again!

alex-arica commented 3 years ago

Thank you for offering your help.

Once the changes are available in a beta version, we will kindly ask you to test it.

ylck commented 3 years ago

> @ylck thank you for your message. At this stage I am not fully sure because I have not done any analysis about that feature.
>
> I think we need to add the field "securityContext" and the condition that you suggested in the bash script. Perhaps there is more to do.
>
> I prefer adding a new field "securityContext" rather than adding a new field "hardened". It would provide more flexibility.

OK, understood. I also saw issue #58 (Configure SSL), so I also thought it would be easier to use securityContext. I will further modify the

alex-arica commented 3 years ago

@ylck It seems like your last message has missing text. The last sentence I can see is " I will further modify the ..." and there is nothing else.

@teknologista I released a "beta" version in the main branch with the features that you suggested. To install it and test it please run:

kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/main/kubegres.yaml

I tested it locally with the following config in the YAML:

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default
spec:
...
 securityContext:
   runAsNonRoot: true
   runAsUser: 999
...

Please let me know if it works for you. Once you have confirmed it, I will release a new version of Kubegres.
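
As a rough illustration of the fragment above, the following sketch prints the securityContext block for a given UID and rejects UID 0, since root contradicts runAsNonRoot: true. The function name `render_security_context` is hypothetical. The right UID depends on the image in use: 999 is the value tested above and 1001 is the value from the original request.

```shell
#!/bin/sh
# Hypothetical helper: prints a securityContext fragment for the given UID,
# indented for placement directly under "spec:" in a Kubegres resource.
render_security_context() {
  uid="$1"
  case "$uid" in
    # Reject empty input and anything containing a non-digit character.
    ''|*[!0-9]*)
      echo "UID must be a non-negative integer" >&2
      return 1
      ;;
  esac
  if [ "$uid" -eq 0 ]; then
    echo "UID 0 is root and contradicts runAsNonRoot: true" >&2
    return 1
  fi
  printf '  securityContext:\n    runAsNonRoot: true\n    runAsUser: %s\n' "$uid"
}
```

For example, `render_security_context 999` prints the three-line fragment shown in the config above, with two-space indentation.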

ylck commented 3 years ago

@alex-arica Running and fixing the test cases took a long time. Do I need to submit a test case?

alex-arica commented 3 years ago

@ylck It seems like your last message had missing text. All I can see is: " I will further modify the ..." and then nothing else. I think your last message did not post correctly.

If you wrote test cases, thank you for your help. If you would like me to review them, please create a PR and I will review them ASAP.

ylck commented 3 years ago

@alex-arica OK. Is the file SecurityContextSpecEnforcer.go still needed? I see that you only configured it in CreatorFromTemplate.go, at creation time.

alex-arica commented 3 years ago

Yes, the file SecurityContextSpecEnforcer.go is required. I have not implemented it yet. If you did, that's great! It will help. Thank you :)

alex-arica commented 3 years ago

@ylck Thank you for the additional code changes that you submitted. I merged them into the main branch.

alex-arica commented 3 years ago

@teknologista Do you think you could help by testing this change today? I am planning to release it Wednesday.

I released a "beta" version in the main branch with the features that you suggested. To install it and test it please run:

kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/main/kubegres.yaml

I tested it locally with the following config in the YAML:

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default
spec:
...
 securityContext:
   runAsNonRoot: true
   runAsUser: 999
...

Please let me know if it works for you with your parameters. Once you have confirmed it, I will release a new version of Kubegres.

teknologista commented 3 years ago

@alex-arica Sure! Will do.

I'll give you my feedback here later today.

Thanks again!

alex-arica commented 3 years ago

Thank you for your help.

teknologista commented 3 years ago

@alex-arica works like a charm. Successfully deployed on an RKE2 Govt cluster with PSP restricted enabled.

I also restored a pg_dump and checked that our Keycloak cluster runs great against the Kubegres PostgreSQL cluster.

For me it's perfect!

Thanks again !

teknologista commented 3 years ago

The only weird thing I get is in the replicas' initContainers:

02/11/2021 10:52:49 - Attempting to copy Primary DB to Replica DB...
02/11/2021 11:52:49 ls: cannot access '/var/lib/postgresql/data/pgdata': No such file or directory
02/11/2021 11:52:49 02/11/2021 10:52:49 - Copying Primary DB to Replica DB folder: /var/lib/postgresql/data/pgdata
02/11/2021 11:52:49 02/11/2021 10:52:49 - Running: pg_basebackup -R -h keycloak-postgresql -D /var/lib/postgresql/data/pgdata -P -U replication;
02/11/2021 11:52:49 waiting for checkpoint
02/11/2021 11:52:49     0/24326 kB (0%), 0/1 tablespace
02/11/2021 11:52:49 24335/24335 kB (100%), 0/1 tablespace
02/11/2021 11:52:49 24335/24335 kB (100%), 1/1 tablespace
02/11/2021 11:52:50 02/11/2021 10:52:49 - Copy completed

ls: cannot access '/var/lib/postgresql/data/pgdata': No such file or directory

But it is more cosmetic than a real issue: I checked inside the pod, and /var/lib/postgresql/data/pgdata does exist and has the data in it.

Cheers!
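
One possible way to avoid the cosmetic message above, assuming it comes from an unconditional ls that runs before pg_basebackup creates the target directory (the log order suggests this, but the script itself is not shown in this thread). The function name `list_dir_if_present` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list a directory only if it exists, so a missing
# directory yields an informational line instead of an ls error.
list_dir_if_present() {
  dir="$1"
  if [ -d "$dir" ]; then
    ls -la "$dir"
  else
    echo "$dir does not exist yet"
  fi
}
```

Called as `list_dir_if_present "$PGDATA"` before the copy, this would log that the directory is absent rather than surfacing an error from ls.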

alex-arica commented 3 years ago

Thank you for your help, much appreciated.

Are you able to connect to that replica and run SQL queries?

teknologista commented 3 years ago

Yes I am! Everything works fine. I even tested with pgpool2 in front to load balance queries with read queries to replicas and master and write queries to master.

Seems to work great. I have installed a Prometheus/Grafana dashboard and pgexporter and will monitor.

But so far so good!

Thanks again.

alex-arica commented 3 years ago

Thank you for your help today!

I will let you know once the new version of Kubegres with this change is available.

teknologista commented 3 years ago

No worries @alex-arica! My pleasure. Thank you for implementing this so quickly!

alex-arica commented 3 years ago

Kubegres version 1.13 is available with the changes that we discussed in this issue.

Please see the release page: https://github.com/reactive-tech/kubegres/releases/tag/v1.13

I updated the documentation by adding details about the new field 'securityContext': https://www.kubegres.io/doc/properties-explained.html

Thank you @teknologista and @ylck for your help!

To install Kubegres 1.13, please run:

kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.13/kubegres.yaml

I am closing this issue.