hashicorp / consul-helm

Helm chart to install Consul and other associated components.
Mozilla Public License 2.0

Fail scheduling all pods that are not part of consul when the webhook is offline #1024

Closed kschoche closed 3 years ago

kschoche commented 3 years ago

Fail scheduling of all pods that are not labeled with app: consul.name while the webhook is offline. This ensures that, once Consul has been installed, no user apps are inadvertently scheduled without being mutated. I chose to match on app: consul.name so that we do not fail to schedule our own pods if the webhook object is applied to k8s before the rest of our Consul components are scheduled.
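As a rough illustration of the behavior described above, a mutating webhook can be set to fail closed while exempting Consul's own pods through a label selector. The sketch below is an assumption about how this could look in a Helm template, not the diff from this PR: the metadata name, webhook name, service name, and path are hypothetical, and only the `failurePolicy` and `objectSelector` fields carry the idea.

```yaml
# Hypothetical excerpt of a MutatingWebhookConfiguration Helm template.
# Illustrates the approach only; names marked "assumed" are not from the PR.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: {{ template "consul.fullname" . }}-connect-injector-cfg  # assumed name
webhooks:
  - name: connect-injector.example.internal  # assumed webhook name
    # Fail closed: if the webhook cannot be reached, the API server rejects
    # the pod instead of admitting it unmutated.
    failurePolicy: Fail
    # Only call the webhook for pods whose "app" label is not the Consul
    # chart name, so Consul's own pods can start while the webhook is down.
    objectSelector:
      matchExpressions:
        - key: app
          operator: NotIn
          values: [ {{ template "consul.name" . }} ]
    sideEffects: None
    admissionReviewVersions: ["v1beta1", "v1"]
    clientConfig:
      service:
        name: {{ template "consul.fullname" . }}-connect-injector-svc  # assumed service name
        namespace: {{ .Release.Namespace }}
        path: /mutate  # assumed path
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
```

Note that a `NotIn` expression also matches pods that have no `app` label at all, so unlabeled user pods would still be sent to the webhook under this selector.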

Changes proposed in this PR:

How I've tested this PR:

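Manually tested by applying this patch and then deploying Consul. The patch adds a readinessProbe that checks for /tmp/healthy, which does not exist by default, so the webhook pods stay unready: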
diff --git a/templates/connect-inject-deployment.yaml b/templates/connect-inject-deployment.yaml
index 9c35728..e0745d9 100644
--- a/templates/connect-inject-deployment.yaml
+++ b/templates/connect-inject-deployment.yaml
@@ -39,6 +39,13 @@ spec:
       serviceAccountName: {{ template "consul.fullname" . }}-connect-injector-webhook-svc-account
       containers:
         - name: sidecar-injector
+          readinessProbe:
+            exec:
+              command:
+                - cat
+                - /tmp/healthy
+            initialDelaySeconds: 30
+            periodSeconds: 5
           image: "{{ default .Values.global.imageK8S .Values.connectInject.image }}"
           ports:
           - containerPort: 8080
  1. Wait for everything to come "online" and the readinessProbe to fail so that both copies of the webhook are unready:
    % k get pods
    NAME                                                              READY   STATUS    RESTARTS   AGE
    consul-consul-4v7ml                                               1/1     Running   0          66s
    consul-consul-5sgcd                                               1/1     Running   0          66s
    consul-consul-connect-injector-webhook-deployment-84cb5c9758crk   0/1     Running   0          65s
    consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9   0/1     Running   0          65s
    consul-consul-controller-7dbd5c45d4-6wbt7                         1/1     Running   0          65s
    consul-consul-p4wq4                                               1/1     Running   0          65s
    consul-consul-server-0                                            1/1     Running   0          65s
    consul-consul-webhook-cert-manager-d4598f84-t8qg9                 1/1     Running   0          65s
    consul-consul-x9k46                                               1/1     Running   0          66s
  2. Deploy a sample app that is connect injected and see that it does not get scheduled.
  3. Set a copy of the webhook to healthy: kubectl exec -it consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9 -- touch /tmp/healthy
  4. Sample app gets scheduled:
    % k get pods
    NAME                                                              READY   STATUS    RESTARTS   AGE
    consul-consul-4v7ml                                               1/1     Running   0          7m13s
    consul-consul-5sgcd                                               1/1     Running   0          7m13s
    consul-consul-connect-injector-webhook-deployment-84cb5c9758crk   0/1     Running   0          7m12s
    consul-consul-connect-injector-webhook-deployment-84cb5c97llxd9   1/1     Running   0          7m12s
    consul-consul-controller-7dbd5c45d4-6wbt7                         1/1     Running   0          7m12s
    consul-consul-p4wq4                                               1/1     Running   0          7m12s
    consul-consul-server-0                                            1/1     Running   0          7m12s
    consul-consul-webhook-cert-manager-d4598f84-t8qg9                 1/1     Running   0          7m12s
    consul-consul-x9k46                                               1/1     Running   0          7m13s
    whoami-75f5b5f654-4vtcs                                           2/2     Running   0          5m28s
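For anyone repeating the steps above, a connect-injected sample app could look roughly like the following. This is a sketch under assumptions: the name matches the whoami pod in the output, but the image and port are guesses and the manifest actually used for the test is not shown in this PR. The `consul.hashicorp.com/connect-inject` annotation is what opts the pod in to sidecar injection.

```yaml
# Hypothetical sample app for reproducing the test; image and port are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        # Not the Consul chart's app label, so the pod goes through the webhook.
        app: whoami
      annotations:
        # Request Connect sidecar injection for this pod.
        consul.hashicorp.com/connect-inject: "true"
    spec:
      containers:
        - name: whoami
          image: traefik/whoami  # assumed image
          ports:
            - containerPort: 80  # assumed port
```

With injection working, the pod reports 2/2 containers ready, as in the whoami line of the output above: the app container plus the injected sidecar.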

How I expect reviewers to test this PR:

Code review
CI run against GKE: https://app.circleci.com/pipelines/github/hashicorp/consul-helm/3370/workflows/2ae29e9c-a234-407a-a4c9-84a96fad0979
CI run against Kind: https://app.circleci.com/pipelines/github/hashicorp/consul-helm/3369/workflows/5a0f3727-e04a-4b03-902a-0bc0137e45f4

Checklist: