reactive-tech / kubegres

Kubegres is a Kubernetes operator that deploys one or more clusters of PostgreSQL instances and manages database replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars, 74 forks

Add support for Container SecurityContext to enable Kubegres to run in pod security standard enforced namespace #178

Closed: CasperGN closed this 7 months ago

CasperGN commented 7 months ago

Related to #176 with implementation of the container SecurityContext.

Instead of the proposed structure:

container:
  securityContext:
    ..

I've opted for containerSecurityContext, to avoid any confusion about what can be put into a container spec field:

..
containerSecurityContext:
  privileged: false
  ..

I considered renaming securityContext to podSecurityContext, since that is the proper name for it, but to avoid breaking the current codebase I named the new spec field containerSecurityContext instead.
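
To illustrate the naming choice, here is a minimal Go sketch of how the two fields could sit side by side in the spec. The types below (SpecSketch, PodSecurityContext, ContainerSecurityContext) and the helper renderSpec are simplified stand-ins invented for this example. The real Kubegres spec presumably reuses the Kubernetes core/v1 types and may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodSecurityContext is a simplified, hypothetical stand-in for the
// pod-level settings (the existing `securityContext` spec field).
type PodSecurityContext struct {
	RunAsUser  *int64 `json:"runAsUser,omitempty"`
	RunAsGroup *int64 `json:"runAsGroup,omitempty"`
	FSGroup    *int64 `json:"fsGroup,omitempty"`
}

// ContainerSecurityContext is a simplified, hypothetical stand-in for the
// container-level settings (the new `containerSecurityContext` spec field).
type ContainerSecurityContext struct {
	Privileged               *bool `json:"privileged,omitempty"`
	AllowPrivilegeEscalation *bool `json:"allowPrivilegeEscalation,omitempty"`
	RunAsNonRoot             *bool `json:"runAsNonRoot,omitempty"`
}

// SpecSketch keeps the existing field name untouched and adds the new
// field next to it, so manifests that only use `securityContext` keep working.
type SpecSketch struct {
	SecurityContext          *PodSecurityContext       `json:"securityContext,omitempty"`
	ContainerSecurityContext *ContainerSecurityContext `json:"containerSecurityContext,omitempty"`
}

// boolPtr returns a pointer to b, so "unset" and "false" stay distinguishable.
func boolPtr(b bool) *bool { return &b }

// renderSpec marshals the sketch to JSON; it returns "" on a marshalling error.
func renderSpec(s SpecSketch) string {
	out, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	spec := SpecSketch{
		ContainerSecurityContext: &ContainerSecurityContext{Privileged: boolPtr(false)},
	}
	// Only the container-level field is set, so only it is rendered.
	fmt.Println(renderSpec(spec))
}
```

Using pointers for the optional fields means an omitted field and an explicit false are rendered differently, which matters for settings such as privileged.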

CasperGN commented 7 months ago

Fixes #176

alex-arica commented 7 months ago

Hi, could you please do me a favour? Could you please update your PR with the latest master code of Kubegres? I approved another PR recently and I would like to review your PR with the latest changes in master. Once it is available, I can review your PR today.

CasperGN commented 7 months ago

Hi @alex-arica,

I just updated the PR with the changes from main, and it should be ready for your review. Thanks!

alex-arica commented 7 months ago

I will build a new version of Kubegres tomorrow and release it.

CasperGN commented 7 months ago

Awesome. Glad I could contribute!

alex-arica commented 7 months ago

@CasperGN I observed a runtime error in the test SpeccontainerSecurityContextTest:

This happens with the test case: "START OF: Test 'GIVEN new Kubegres is created without spec 'securityContext' and with spec 'replica' set to 3'"

The primary Pod cannot be deployed and reports this error: "Error: container has runAsNonRoot and image will run as root (pod: "my-kubegres-1-0_default(2c79ce83-9257-4afb-ad18-8f936b398b14)", container: my-kubegres-1)"

I believe this is because of this line in the Test: RunAsNonRoot: pointer.BoolPtr(true)

You might need to set RunAsUser and RunAsGroup in the container security context, and FSGroup in the pod security context.
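
The reasoning behind this error can be modelled with a small Go sketch. It is a simplified model of the check, not the kubelet's actual code. It assumes the image's default user is known as a numeric UID and is root (UID 0), as the error message indicates. The names SecuritySettings, effectiveUID and checkRunAsNonRoot are made up for this example, and 1001 is just an example non-root UID.

```go
package main

import (
	"errors"
	"fmt"
)

// SecuritySettings is a hypothetical, flattened view of the fields involved.
type SecuritySettings struct {
	RunAsNonRoot       bool
	ContainerRunAsUser *int64 // container-level runAsUser, nil if unset
	PodRunAsUser       *int64 // pod-level runAsUser, nil if unset
	ImageUID           int64  // default user baked into the image
}

// effectiveUID resolves the UID the container would start with:
// the container-level value wins over the pod-level value,
// which wins over the image default.
func effectiveUID(s SecuritySettings) int64 {
	if s.ContainerRunAsUser != nil {
		return *s.ContainerRunAsUser
	}
	if s.PodRunAsUser != nil {
		return *s.PodRunAsUser
	}
	return s.ImageUID
}

// checkRunAsNonRoot rejects a container that asks for runAsNonRoot
// but would still start as UID 0.
func checkRunAsNonRoot(s SecuritySettings) error {
	if s.RunAsNonRoot && effectiveUID(s) == 0 {
		return errors.New("container has runAsNonRoot and would run as root")
	}
	return nil
}

func int64Ptr(v int64) *int64 { return &v }

func main() {
	// Situation from the failing test: runAsNonRoot is set, no UID is given,
	// and the image defaults to root.
	failing := SecuritySettings{RunAsNonRoot: true, ImageUID: 0}
	fmt.Println("without RunAsUser:", checkRunAsNonRoot(failing))

	// Suggested fix: set an explicit non-root user.
	fixed := SecuritySettings{RunAsNonRoot: true, ContainerRunAsUser: int64Ptr(1001), ImageUID: 0}
	fmt.Println("with RunAsUser:", checkRunAsNonRoot(fixed))
}
```

In this model, setting RunAsNonRoot alone is not enough when the image defaults to root. An explicit non-root RunAsUser at container or pod level is what makes the check pass.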

I suppose you did not run this test locally?

alex-arica commented 7 months ago

Would you like me to fix it?

CasperGN commented 7 months ago

Hi @alex-arica,

I'm re-running the tests. However, the log line

"START OF: Test 'GIVEN new Kubegres is created without spec 'securityContext' and with spec 'replica' set to 3'"

suggests that the failure is in internal/test/spec_securityContext_test.go:

(⎈|kind-kubegres:default)➜  kubegres git:(main) ✗ grep -iRe "Kubegres is created without spec 'securityContext'" *
internal/test/spec_securityContext_test.go: Context("GIVEN new Kubegres is created without spec 'securityContext' and with spec 'replica' set to 3", func() {
internal/test/spec_securityContext_test.go:         log.Print("START OF: Test 'GIVEN new Kubegres is created without spec 'securityContext' and with spec 'replica' set to 3'")
internal/test/spec_securityContext_test.go:         log.Print("END OF: Test 'GIVEN new Kubegres is created without spec 'securityContext' and with spec 'replica' set to 3'")

All of the tests added by this PR use "GIVEN new Kubegres is created without spec 'containerSecurityContext' ..." instead.

I'm running the test cases with DbHost = "my-kubegres.default.svc.cluster.local".

alex-arica commented 7 months ago

I suggest that we run only spec_containerSecurityContext_test.go which is the test file that you added.

What I invite you to do is to run your test file in isolation.

Please uncomment this line in all existing test files: Skip("Temporarily skipping test")

This will allow all tests to be skipped.

Then make sure in your test file spec_containerSecurityContext_test.go this is commented: //Skip("Temporarily skipping test")

And run make test

This is how I managed to reproduce the error.

CasperGN commented 7 months ago

@alex-arica,

I see the same as you now. My bad!

I believe it is line 266:

    Expect(r.kubegresResource.Spec.SecurityContext).Should(Equal(emptyResources))

Should have been:

    Expect(r.kubegresResource.Spec.ContainerSecurityContext).Should(Equal(emptyResources))

I'm having some trouble with in-cluster networking, which may be Mac-related and prevents me from running the tests properly. I'm terribly sorry about this.

alex-arica commented 7 months ago

No worries. I will test it with the changes that you suggested and let you know.

CasperGN commented 7 months ago

@alex-arica pushed changes to #179 - sorry for this slight mess-up :-)

alex-arica commented 7 months ago

Changes available with Kubegres 1.18

CasperGN commented 7 months ago

@alex-arica I just deployed this to one of our Dev environments, in a namespace that enforces the restricted pod security standard, and everything looks great:

(⎈|midgard-aks-dev-n9zz4-7qzmn-admin:default)➜  ~ kubectl describe ns heimdall
Name:         heimdall
Labels:       kubernetes.io/metadata.name=heimdall
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=latest
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.
(⎈|midgard-aks-dev-n9zz4-7qzmn-admin:default)➜  ~ kubectl describe kubegres --namespace heimdall
Name:         midgard-postgres
Namespace:    heimdall
Labels:       app.kubernetes.io/instance=midgard
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=midgard
              app.kubernetes.io/version=1.2.0
              helm.sh/chart=midgard-1.2.0
Annotations:  meta.helm.sh/release-name: midgard
              meta.helm.sh/release-namespace: heimdall
API Version:  kubegres.reactive-tech.io/v1
Kind:         Kubegres
Metadata:
  Creation Timestamp:  2024-03-21T20:42:52Z
  Generation:          2
  Resource Version:    46406815
  UID:                 3178c7e1-e356-4e34-8dda-6b6387524aa2
Spec:
  Backup:
  Container Security Context:
    Allow Privilege Escalation:  false
    Capabilities:
      Drop:
        ALL
    Privileged:                 false
    Read Only Root Filesystem:  true
    Run As Non Root:            true
    Seccomp Profile:
      Type:       RuntimeDefault
  Custom Config:  base-kubegres-config
  Database:
    Size:                4Gi
    Storage Class Name:  midgard-postgres-retain
    Volume Mount:        /var/lib/postgresql/data
  Env:
    Name:  POSTGRES_USER
    Value From:
      Secret Key Ref:
        Key:   POSTGRES_USER
        Name:  midgard-postgres-secrets
    Name:      POSTGRES_PASSWORD
    Value From:
      Secret Key Ref:
        Key:   POSTGRES_PASSWORD
        Name:  midgard-postgres-secrets
    Name:      POSTGRES_SUPER_USER_PASSWORD
    Value From:
      Secret Key Ref:
        Key:   POSTGRES_SUPER_USER_PASSWORD
        Name:  midgard-postgres-secrets
    Name:      POSTGRES_REPLICATION_PASSWORD
    Value From:
      Secret Key Ref:
        Key:   POSTGRES_REPLICATION_USER_PASSWORD
        Name:  midgard-postgres-secrets
  Failover:
  Image:  postgres:16.2@sha256:f58300ac8d393b2e3b09d36ea12d7d24ee9440440e421472a300e929ddb63460
  Port:   5432
  Probe:
  Replicas:  3
  Resources:
    Limits:
      Cpu:     1
      Memory:  4Gi
    Requests:
      Cpu:     1
      Memory:  2Gi
  Scheduler:
    Affinity:
      Pod Anti Affinity:
        Preferred During Scheduling Ignored During Execution:
          Pod Affinity Term:
            Label Selector:
              Match Expressions:
                Key:       app
                Operator:  In
                Values:
                  midgard-postgres
            Topology Key:  kubernetes.io/hostname
          Weight:          100
  Security Context:
    Fs Group:         1001
    Run As Group:     1001
    Run As Non Root:  true
    Run As User:      1001
  Volume:
    Volume Mounts:
      Mount Path:  /var/run/postgresql
      Name:        postgres-run
    Volumes:
      Empty Dir:
      Name:  postgres-run
Status:
  Blocking Operation:
    Stateful Set Operation:
    Stateful Set Spec Update Operation:
  Enforced Replicas:            3
  Last Created Instance Index:  3
  Previous Blocking Operation:
    Operation Id:  Replica DB count spec enforcement
    Stateful Set Operation:
      Instance Index:  3
      Name:            midgard-postgres-3
    Stateful Set Spec Update Operation:
    Step Id:                   Replica DB is deploying
    Time Out Epoc In Seconds:  1711054120
Events:
  Type    Reason                        Age                From                 Message
  ----    ------                        ----               ----                 -------
  Normal  DefaultSpecValue              2m9s               Kubegres-controller  A default value was set for a field in Kubegres YAML spec. 'spec.customConfig': New value: base-kubegres-config
  Normal  DefaultSpecValue              2m9s               Kubegres-controller  A default value was set for a field in Kubegres YAML spec. 'spec.Affinity': New value: &Affinity{NodeAffinity:nil,PodAffinity:nil,PodAntiAffinity:&PodAntiAffinity{RequiredDuringSchedulingIgnoredDuringExecution:[]PodAffinityTerm{},PreferredDuringSchedulingIgnoredDuringExecution:[]WeightedPodAffinityTerm{WeightedPodAffinityTerm{Weight:100,PodAffinityTerm:PodAffinityTerm{LabelSelector:&v1.LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:app,Operator:In,Values:[midgard-postgres],},},},Namespaces:[],TopologyKey:kubernetes.io/hostname,NamespaceSelector:nil,},},},},}
  Normal  PrimaryStatefulSetDeployment  2m9s               Kubegres-controller  Deployed Primary StatefulSet. 'Primary name': midgard-postgres-1
  Normal  BlockingOperationCompleted    104s               Kubegres-controller  Blocking-Operation is successfully completed. 'OperationId': Primary DB count spec enforcement, 'StepId': Primary DB is deploying
  Normal  ReplicaStatefulSetDeployment  104s               Kubegres-controller  Deployed Replica StatefulSet. 'Replica name': midgard-postgres-2
  Normal  ServiceDeployment             104s               Kubegres-controller  Deployed Primary Service. 'Service name': midgard-postgres
  Normal  ReplicaStatefulSetDeployment  82s                Kubegres-controller  Deployed Replica StatefulSet. 'Replica name': midgard-postgres-3
  Normal  ServiceDeployment             82s                Kubegres-controller  Deployed Replica Service. 'Service name': midgard-postgres-replica
  Normal  BlockingOperationCompleted    58s (x2 over 82s)  Kubegres-controller  Blocking-Operation is successfully completed. 'OperationId': Replica DB count spec enforcement, 'StepId': Replica DB is deploying
(⎈|midgard-aks-dev-n9zz4-7qzmn-admin:default)➜  ~ kubectl get pods --namespace heimdall
NAME                                    READY   STATUS    RESTARTS   AGE
midgard-deployment-6587964858-wdpqj     2/2     Running   0          4m14s
midgard-deployment-6587964858-wp2vt     2/2     Running   0          2m19s
midgard-otel-postgres-67cb5ffd8-46zxf   1/1     Running   0          11h
midgard-postgres-1-0                    1/1     Running   0          4m11s
midgard-postgres-2-0                    1/1     Running   0          3m46s
midgard-postgres-3-0                    1/1     Running   0          3m24s

Here Midgard is our Backstage instance running with Kubegres. I'll wrap up my internal PR on bumping to v1.18 and then proceed to implement the same on our Grafana stack for our Observability Platform.

The midgard-otel-postgres.. pod is why we want to add the sidepod capabilities btw.

Just wanted to share that it all looks good and the mappings go through. Thanks a bunch!

alex-arica commented 7 months ago

Amazing, nice work.