open-policy-agent / gatekeeper

🐊 Gatekeeper - Policy Controller for Kubernetes
https://open-policy-agent.github.io/gatekeeper/
Apache License 2.0

with system-cluster-critical priorityClass is not permitted in gatekeeper-system #1421

Closed developer-guy closed 3 years ago

developer-guy commented 3 years ago
$ helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
$ helm repo update
$ helm upgrade --install gatekeeper gatekeeper/gatekeeper \
      --namespace gatekeeper-system \
      --set experimentalEnableMutation=true

When I ran all the commands above, I saw the following error within the status of the deployment:

$ kubectl get deployments -n gatekeeper-system gatekeeper-controller-manager -oyaml
YAML Definition

```yml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: gatekeeper
    meta.helm.sh/release-namespace: gatekeeper-system
  creationTimestamp: "2021-07-06T20:08:00Z"
  generation: 1
  labels:
    app: gatekeeper
    app.kubernetes.io/managed-by: Helm
    chart: gatekeeper
    control-plane: controller-manager
    gatekeeper.sh/operation: webhook
    gatekeeper.sh/system: "yes"
    heritage: Helm
    release: gatekeeper
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
  resourceVersion: "324005"
  selfLink: /apis/apps/v1/namespaces/gatekeeper-system/deployments/gatekeeper-controller-manager
  uid: 564e905f-098b-4895-a0ce-ca4770e2e247
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: gatekeeper
      chart: gatekeeper
      control-plane: controller-manager
      gatekeeper.sh/operation: webhook
      gatekeeper.sh/system: "yes"
      heritage: Helm
      release: gatekeeper
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
      creationTimestamp: null
      labels:
        app: gatekeeper
        chart: gatekeeper
        control-plane: controller-manager
        gatekeeper.sh/operation: webhook
        gatekeeper.sh/system: "yes"
        heritage: Helm
        release: gatekeeper
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: gatekeeper.sh/operation
                  operator: In
                  values:
                  - webhook
              topologyKey: kubernetes.io/hostname
            weight: 100
      automountServiceAccountToken: true
      containers:
      - args:
        - --port=8443
        - --logtostderr
        - --log-denies=false
        - --emit-admission-events=false
        - --log-level=INFO
        - --exempt-namespace=gatekeeper-system
        - --operation=webhook
        - --enable-mutation=true
        command:
        - /manager
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: openpolicyagent/gatekeeper:v3.5.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 9090
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        ports:
        - containerPort: 8443
          name: webhook-server
          protocol: TCP
        - containerPort: 8888
          name: metrics
          protocol: TCP
        - containerPort: 9090
          name: healthz
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 9090
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - all
          readOnlyRootFilesystem: true
          runAsGroup: 999
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /certs
          name: cert
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: gatekeeper-admin
      serviceAccountName: gatekeeper-admin
      terminationGracePeriodSeconds: 60
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: gatekeeper-webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: Created new replica set "gatekeeper-controller-manager-8479974865"
    reason: NewReplicaSetCreated
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: 'pods "gatekeeper-controller-manager-8479974865-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in gatekeeper-system namespace'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  observedGeneration: 1
  unavailableReplicas: 3
```

I also saw that the gatekeeper-update-namespace-label Pod went into the CrashLoopBackOff state because the ValidatingAdmissionWebhook would not let it update the namespace.

What did you expect to happen:

I expected all of the Gatekeeper pods to be up and running.

Anything else you would like to add:

We have three priority classes in the cluster.

$ kubectl get priorityclasses.scheduling.k8s.io
NAME                      VALUE        GLOBAL-DEFAULT   AGE
k8s-cluster-critical      1000000000   false            32h
system-cluster-critical   2000000000   false            32h
system-node-critical      2000001000   false            32h

Environment:

developer-guy commented 3 years ago

We fixed that problem by removing the priorityClassName field from both the audit and controller-manager deployments of Gatekeeper. We also removed the ResourceQuota manifest from the chart's templates.
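The workaround described above can be sketched in Python for anyone who wants to apply it to already-rendered manifests instead of editing the chart. This is only an illustration, not what the chart or Helm does: it assumes the manifests have been loaded into plain dictionaries (for example from `helm template` output), and the function name `strip_priority_class` is hypothetical.

```python
import copy


def strip_priority_class(manifests):
    """Return cleaned copies of the manifests (input is left untouched).

    - ResourceQuota objects are dropped entirely.
    - priorityClassName is removed from every Deployment pod template.
    """
    cleaned = []
    for manifest in manifests:
        if manifest.get("kind") == "ResourceQuota":
            continue  # skip the quota manifest, as in the workaround
        manifest = copy.deepcopy(manifest)
        if manifest.get("kind") == "Deployment":
            pod_spec = (
                manifest.get("spec", {}).get("template", {}).get("spec", {})
            )
            # pop() with a default is a no-op when the field is absent
            pod_spec.pop("priorityClassName", None)
        cleaned.append(manifest)
    return cleaned
```

Working on copies keeps the original rendered output available for comparison, which helps when checking that nothing other than the two intended changes was made.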

cc: @Dentrax

sozercan commented 3 years ago

Looks like you are on K8s v1.16, priority class used to be restricted to kube-system before v1.17 I believe. It makes sense you are seeing that error. https://github.com/kubernetes/kubernetes/pull/76310
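To make the version dependency concrete, here is a minimal Python sketch of the check implied above. It takes the v1.17 cutoff from this comment as an assumption (it is stated there only as "I believe"), and the helper names `parse_minor` and `should_set_priority_class` are hypothetical, not part of Gatekeeper or Kubernetes.

```python
import re


def parse_minor(version):
    """Extract (major, minor) from a string such as 'v1.16.15' or '1.17'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_set_priority_class(version, namespace):
    """Decide whether a critical priorityClassName is likely to be accepted.

    Assumption taken from the thread: before v1.17 the critical priority
    classes are only permitted in the kube-system namespace.
    """
    if namespace == "kube-system":
        return True
    return parse_minor(version) >= (1, 17)
```

Under that assumption, `should_set_priority_class("v1.16.15", "gatekeeper-system")` evaluates to `False`, which matches the FailedCreate condition reported in this issue.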

developer-guy commented 3 years ago

> Looks like you are on K8s v1.16, priority class used to be restricted to kube-system before v1.17 I believe. It makes sense you are seeing that error. kubernetes/kubernetes#76310

Yeah, that's right @sozercan. The solution we are proposing is to make the ResourceQuota template and the priorityClassName of both the audit and controller-manager deployments of Gatekeeper conditional. @Dentrax and I volunteer to work on this issue.

WDYT @sozercan?