rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

allowPrivilegeEscalation & capabilities not getting propagated as Overrides #1733

Closed: NahuelVarela closed this issue 1 month ago

NahuelVarela commented 1 month ago

Describe the bug

I'm using the RabbitMQ Cluster Operator 2.10.0, installed from https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml. The operator pod is rejected with:

admission webhook "validation.gatekeeper.sh" denied the request:
[cis-k8s-v1.5.1-psp-allow-privilege-escalation] Privilege escalation container is not allowed: operator
[cis-k8s-v1.5.1-psp-capabilities] container <operator> is not dropping all required capabilities. Container must drop all of ["NET_RAW"] or "ALL"

My cluster requires every container to set:

allowPrivilegeEscalation: false
capabilities:
  drop:
    - ALL

My Deployment YAML is:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: rabbitmq-operator
    app.kubernetes.io/name: rabbitmq-cluster-operator
    app.kubernetes.io/part-of: rabbitmq
  name: rabbitmq-cluster-operator
  namespace: rabbitmq-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq-cluster-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/component: rabbitmq-operator
        app.kubernetes.io/name: rabbitmq-cluster-operator
        app.kubernetes.io/part-of: rabbitmq
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
      - command:
        - /manager
        env:
        - name: OPERATOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: rabbitmqoperator/cluster-operator:2.10.0
        name: operator
        ports:
        - containerPort: 9782
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 500Mi
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        runAsUser: 999
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
        capabilities:
          drop:
            - ALL
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: rabbitmq-cluster-operator
      terminationGracePeriodSeconds: 10

My override is:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbit
  namespace: [redacted]
spec:
  replicas: 1
  override:
    statefulSet:
      spec:
        template:
          spec:
            containers:
              - name: rabbitmq
                securityContext:
                  allowPrivilegeEscalation: false
                  capabilities:
                    drop:
                      - ALL
                  privileged: false
                  runAsNonRoot: true
                  runAsUser: 999
                  runAsGroup: 999
                  fsGroup: 999

And the ReplicaSet YAML that gets created for the operator Deployment is:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  annotations:
    deployment.kubernetes.io/desired-replicas: '1'
    deployment.kubernetes.io/max-replicas: '2'
    deployment.kubernetes.io/revision: '1'
  creationTimestamp: '2024-09-24T10:38:09Z'
  generation: 1
  labels:
    app.kubernetes.io/component: rabbitmq-operator
    app.kubernetes.io/name: rabbitmq-cluster-operator
    app.kubernetes.io/part-of: rabbitmq
    pod-template-hash: 58c56c988b
  name: rabbitmq-cluster-operator-58c56c988b
  namespace: rabbitmq-system
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: Deployment
      name: rabbitmq-cluster-operator
      uid: 94b735f3-654a-43c0-9837-bf2bc8a027da
  resourceVersion: '204112193'
  uid: 8420e6f6-4429-4e3a-9bd3-8d264c6633ef
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq-cluster-operator
      pod-template-hash: 58c56c988b
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: rabbitmq-operator
        app.kubernetes.io/name: rabbitmq-cluster-operator
        app.kubernetes.io/part-of: rabbitmq
        pod-template-hash: 58c56c988b
    spec:
      containers:
        - command:
            - /manager
          env:
            - name: OPERATOR_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          image: 'rabbitmqoperator/cluster-operator:2.10.0'
          imagePullPolicy: IfNotPresent
          name: operator
          ports:
            - containerPort: 9782
              name: metrics
              protocol: TCP
          resources:
            limits:
              cpu: 200m
              memory: 500Mi
            requests:
              cpu: 200m
              memory: 500Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 999
        runAsGroup: 999
        runAsNonRoot: true
        runAsUser: 999
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: rabbitmq-cluster-operator
      serviceAccountName: rabbitmq-cluster-operator
      terminationGracePeriodSeconds: 10
status:
  conditions:
    - lastTransitionTime: '2024-09-24T10:38:09Z'
      message: >-
        admission webhook "validation.gatekeeper.sh" denied the request:
        [cis-k8s-v1.5.1-psp-allow-privilege-escalation] Privilege escalation
        container is not allowed: operator

        [cis-k8s-v1.5.1-psp-capabilities] container <operator> is not dropping
        all required capabilities. Container must drop all of ["NET_RAW"] or
        "ALL"
      reason: FailedCreate
      status: 'True'
      type: ReplicaFailure
  observedGeneration: 1
  replicas: 0

Any ideas what is happening?

NahuelVarela commented 1 month ago

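Turns out it was an indentation error. In my Deployment definition, the second securityContext: block was indented at the pod spec level instead of under the operator container. The container-only fields (privileged, allowPrivilegeEscalation, capabilities) therefore never reached the container, which is why they are missing from the ReplicaSet above and why Gatekeeper denied the request.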
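
For anyone hitting the same Gatekeeper denial, here is a minimal sketch of where the container-level block probably needs to sit, assuming the rest of the Deployment stays as posted above. It is only a fragment of the manifest, and the comments are mine. The container securityContext is indented under the operator container entry, at the same level as name and image. fsGroup stays in the pod-level block only, since it is a pod-level field.

```yaml
# Fragment of the operator Deployment, not a complete manifest.
spec:
  template:
    spec:
      securityContext:              # pod-level settings
        runAsNonRoot: true
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999                # only valid at pod level
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: operator
        image: rabbitmqoperator/cluster-operator:2.10.0
        command:
        - /manager
        # env, ports and resources unchanged from the manifest above
        securityContext:            # container-level settings, nested under the container
          privileged: false
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
      serviceAccountName: rabbitmq-cluster-operator
      terminationGracePeriodSeconds: 10
```

With this layout, the ReplicaSet's pod template should show allowPrivilegeEscalation and capabilities under the operator container rather than dropping them.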