with system-cluster-critical priorityClass is not permitted in gatekeeper-system

developer-guy commented 3 years ago

$ helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
$ helm repo update
$ helm upgrade --install gatekeeper gatekeeper/gatekeeper \
      --namespace gatekeeper-system \
      --set experimentalEnableMutation=true

When I ran all the commands above, I saw the following error within the status of the deployment:

$ kubectl get deployments -n gatekeeper-system gatekeeper-controller-manager -oyaml

YAML Definition

```yml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: gatekeeper
    meta.helm.sh/release-namespace: gatekeeper-system
  creationTimestamp: "2021-07-06T20:08:00Z"
  generation: 1
  labels:
    app: gatekeeper
    app.kubernetes.io/managed-by: Helm
    chart: gatekeeper
    control-plane: controller-manager
    gatekeeper.sh/operation: webhook
    gatekeeper.sh/system: "yes"
    heritage: Helm
    release: gatekeeper
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
  resourceVersion: "324005"
  selfLink: /apis/apps/v1/namespaces/gatekeeper-system/deployments/gatekeeper-controller-manager
  uid: 564e905f-098b-4895-a0ce-ca4770e2e247
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: gatekeeper
      chart: gatekeeper
      control-plane: controller-manager
      gatekeeper.sh/operation: webhook
      gatekeeper.sh/system: "yes"
      heritage: Helm
      release: gatekeeper
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
      creationTimestamp: null
      labels:
        app: gatekeeper
        chart: gatekeeper
        control-plane: controller-manager
        gatekeeper.sh/operation: webhook
        gatekeeper.sh/system: "yes"
        heritage: Helm
        release: gatekeeper
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: gatekeeper.sh/operation
                  operator: In
                  values:
                  - webhook
              topologyKey: kubernetes.io/hostname
            weight: 100
      automountServiceAccountToken: true
      containers:
      - args:
        - --port=8443
        - --logtostderr
        - --log-denies=false
        - --emit-admission-events=false
        - --log-level=INFO
        - --exempt-namespace=gatekeeper-system
        - --operation=webhook
        - --enable-mutation=true
        command:
        - /manager
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: openpolicyagent/gatekeeper:v3.5.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 9090
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        ports:
        - containerPort: 8443
          name: webhook-server
          protocol: TCP
        - containerPort: 8888
          name: metrics
          protocol: TCP
        - containerPort: 9090
          name: healthz
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 9090
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - all
          readOnlyRootFilesystem: true
          runAsGroup: 999
          runAsNonRoot: true
          runAsUser: 1000
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /certs
          name: cert
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: gatekeeper-admin
      serviceAccountName: gatekeeper-admin
      terminationGracePeriodSeconds: 60
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: gatekeeper-webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: Created new replica set "gatekeeper-controller-manager-8479974865"
    reason: NewReplicaSetCreated
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2021-07-06T20:08:00Z"
    lastUpdateTime: "2021-07-06T20:08:00Z"
    message: 'pods "gatekeeper-controller-manager-8479974865-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in gatekeeper-system namespace'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  observedGeneration: 1
  unavailableReplicas: 3
```

And I also saw that gatekeeper-update-namespace-label Pod went into the CrashLoopBackOff state because ValidatingAdmissionWebhook won't let it update the namespace.

What did you expect to happen:

I expected all of the Gatekeeper pods to be up and running.

Anything else you would like to add:

We have three priority classes in the cluster.

$ kubectl get priorityclasses.scheduling.k8s.io
NAME                      VALUE        GLOBAL-DEFAULT   AGE
k8s-cluster-critical      1000000000   false            32h
system-cluster-critical   2000000000   false            32h
system-node-critical      2000001000   false            32h

Environment:

Master/Worker Nodes:

Linux version 3.10.0-1127.19.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Aug 25 17:23:54 UTC 2020

Gatekeeper version: v3.5.1

Kubernetes version: (use kubectl version):

$ kubectl version skew -oyaml
clientVersion:
  buildDate: "2021-06-16T12:59:11Z"
  compiler: gc
  gitCommit: 092fbfbf53427de67cac1e9fa54aaa09a28371d7
  gitTreeState: clean
  gitVersion: v1.21.2
  goVersion: go1.16.5
  major: "1"
  minor: "21"
  platform: darwin/amd64
serverVersion:
  buildDate: "2020-06-17T11:41:28Z"
  compiler: gc
  gitCommit: 436254b798f772bcb8e67dcfe122e46500eeb254
  gitTreeState: clean
  gitVersion: v1.16.11
  goVersion: go1.13.9
  major: "1"
  minor: "16"
  platform: linux/amd64

developer-guy commented 3 years ago

We fixed the problem by removing the priorityClassName field from both the audit and controller-manager deployments of Gatekeeper, and we also removed the ResourceQuota manifest from the chart's templates.

cc: @Dentrax
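
For anyone who wants to try the same workaround on a live cluster without editing the chart, here is a minimal sketch using a JSON patch. It assumes the audit deployment is named gatekeeper-audit (only gatekeeper-controller-manager appears in the output above), and the function names are hypothetical. It does not touch the ResourceQuota, and a later helm upgrade would presumably render the field again because the change is not in the chart.

```shell
# Hypothetical helper: drop priorityClassName from one Gatekeeper deployment
# with a JSON patch. The namespace defaults to gatekeeper-system.
remove_priority_class() {
  local deployment="$1"
  local namespace="${2:-gatekeeper-system}"
  kubectl patch deployment "$deployment" \
    --namespace "$namespace" \
    --type=json \
    --patch '[{"op": "remove", "path": "/spec/template/spec/priorityClassName"}]'
}

# Apply the patch to both deployments; stop at the first failure.
# The name gatekeeper-audit is an assumption, see the note above.
remove_all_priority_classes() {
  local d
  for d in gatekeeper-audit gatekeeper-controller-manager; do
    remove_priority_class "$d" || return 1
  done
}
```

Calling `remove_all_priority_classes` patches both deployments in gatekeeper-system. The "remove" operation fails if the field is already absent, so a second run is expected to report an error.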

sozercan commented 3 years ago

Looks like you are on K8s v1.16, priority class used to be restricted to kube-system before v1.17 I believe. It makes sense you are seeing that error. https://github.com/kubernetes/kubernetes/pull/76310
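
A quick way to check whether a cluster falls on the affected side of that boundary is to compare the server's gitVersion against v1.17. The sketch below takes the v1.17 cutoff from the comment above (it has not been verified here), and the function name is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) when a Kubernetes gitVersion
# string such as "v1.16.11" is older than v1.17, fail otherwise.
server_older_than_1_17() {
  local version="${1#v}"         # strip the leading "v" if present
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # text before the next dot
  minor="${minor%%[!0-9]*}"      # drop non-numeric suffixes such as "+"
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 17 ]; }
}
```

With the versions from the report, `server_older_than_1_17 v1.16.11` succeeds and `server_older_than_1_17 v1.21.2` fails. Against a live cluster, the argument could be taken from the serverVersion.gitVersion field of the `kubectl version` output.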

developer-guy commented 3 years ago

> Looks like you are on K8s v1.16, priority class used to be restricted to kube-system before v1.17 I believe. It makes sense you are seeing that error. kubernetes/kubernetes#76310

Yeah, that's right @sozercan. The fix we are proposing is to make the ResourceQuota template and the priorityClassName of both the audit and controller-manager deployments of Gatekeeper conditional. @Dentrax and I volunteer to work on this issue.

WDYT @sozercan?

open-policy-agent/gatekeeper, issue #1421