hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Readiness Probe fails when TLS is enabled #944

Closed manedurphy closed 2 years ago

manedurphy commented 2 years ago


Overview of the Issue

Reproduction Steps

Config

```yaml
global:
  name: consul
  datacenter: homelab
  image: "hashicorp/consul:1.11.1"
  imageEnvoy: "envoyproxy/envoy:v1.19.1"
  acls:
    manageSystemACLs: true
  gossipEncryption:
    autoGenerate: true
  tls:
    enabled: true
    enableAutoEncrypt: true
    verify: true
server:
  replicas: 3
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
  tolerations: |
    - key: "CriticalAddonsOnly"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
ui:
  enabled: true
  service:
    type: "NodePort"
connectInject:
  enabled: true
  default: false
  replicas: 1
  logLevel: "info"
  k8sAllowNamespaces: ["*"]
  k8sDenyNamespaces: []
  metrics:
    defaultEnabled: true
  transparentProxy:
    defaultEnabled: true
    defaultOverwriteProbes: true
syncCatalog:
  enabled: false
client:
  enabled: true
  grpc: true
controller:
  enabled: true
```
# Command
kubectl get pods -n consul

# Output
NAME                                                          READY   STATUS    RESTARTS   AGE     IP           NODE            NOMINATED NODE   READINESS GATES
consul-webhook-cert-manager-5b898f94df-ppqtt                  1/1     Running   0          5m22s   10.42.2.33   k3s-worker-01   <none>           <none>
consul-server-0                                               1/1     Running   0          5m21s   10.42.1.64   k3s-server-02   <none>           <none>
consul-server-1                                               1/1     Running   0          5m21s   10.42.0.67   k3s-server-01   <none>           <none>
consul-server-2                                               1/1     Running   0          5m21s   10.42.2.36   k3s-worker-01   <none>           <none>
consul-blcsd                                                  0/1     Running   0          5m22s   10.42.2.32   k3s-worker-01   <none>           <none>
consul-controller-65b6f5f4d6-j865v                            1/1     Running   0          5m22s   10.42.2.37   k3s-worker-01   <none>           <none>
consul-connect-injector-webhook-deployment-6c4c646776-cr8zt   1/1     Running   0          5m22s   10.42.2.38   k3s-worker-01   <none>           <none>

Logs

# Command
kubectl describe pod -n consul consul-blcsd

# Readiness Probe
Readiness:  exec [/bin/sh -ec curl \
  -k \
  https://127.0.0.1:8501/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=0s timeout=1s period=10s #success=1 #failure=3

# Warning
Warning  Unhealthy    6s (x7 over 49s)  kubelet            Readiness probe failed:

Expected behavior

# Command
kubectl exec -n consul consul-blcsd -c consul -- curl -k https://127.0.0.1:8501/v1/status/leader

# Output
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    17  100    17    0     0     12      0  0:00:01  0:00:01 --:--:--    12
"10.42.2.36:8300"%
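
One thing that may be worth checking, although nothing in this thread confirms it as the cause: the curl progress output above reports a total time of about one second, and the probe is configured with `timeout=1s`. A minimal sketch for measuring the request time from inside the client container follows. The helper name `time_probe` is hypothetical; the namespace, pod, and container names are the ones used in the commands above.

```shell
# Hypothetical helper: send the same request the readiness probe sends, from
# inside the client container, and print the HTTP status and total time.
# Compare the reported time against the probe's timeout=1s.
time_probe() {
  ns="$1"; pod="$2"; container="$3"
  kubectl exec -n "$ns" "$pod" -c "$container" -- /bin/sh -ec \
    'curl -k -s -o /dev/null -w "%{http_code} %{time_total}s\n" https://127.0.0.1:8501/v1/status/leader'
}

# Usage (not run automatically):
# time_probe consul consul-blcsd consul
```

If the reported time is consistently near or above one second, the probe timeout would be a reasonable place to look next.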

Environment details

Additional Context

ishustava commented 2 years ago

Hey @manedurphy

Could you provide logs of the consul client that fails the readiness probe?

david-yu commented 2 years ago

Hi @manedurphy, we have not seen a response for a while and were waiting for follow-up information. I'll go ahead and close this issue, but if you have more details on why the client was failing the readiness probe, that would be helpful.

jstaf commented 2 years ago

This is pretty reproducible: just install the Consul Helm chart on a local Kubernetes instance (like the one from Docker Desktop) and it always fails. I would consider this fixed if there were a way to disable the readinessProbe in the Helm chart (currently you can't disable it except by editing the readinessProbe out of the consul-client daemonset).
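
For reference, the manual workaround described above can be sketched as a JSON patch. This is only an illustration: the helper name `remove_readiness_probe` is hypothetical, and the DaemonSet name and the container index `0` are assumptions that should be checked against the cluster first (for example with `kubectl get daemonset -n consul`).

```shell
# Hypothetical helper: remove the readinessProbe from the first container of a
# DaemonSet with a JSON patch. The DaemonSet name and container index 0 are
# assumptions; verify both before running this against a real cluster.
remove_readiness_probe() {
  ns="$1"; ds="$2"
  kubectl patch daemonset "$ds" -n "$ns" --type=json \
    -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}]'
}

# Usage (not run automatically):
# remove_readiness_probe consul consul-client
```

Because the chart renders the probe, a later `helm upgrade` is likely to restore it, so this only hides the symptom until the next upgrade.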

ishustava commented 2 years ago

Hey @jstaf

We have end-to-end tests with TLS enabled and they are currently not failing, so I'm not sure how to reproduce this. Currently, we test on kind, gke, aks, and eks. I've just tried with Docker Desktop, helm chart version 0.41.1 and the config that the original author has provided, and couldn't reproduce this.

If you can provide exact steps to reproduce, that could help.

jmurret commented 2 years ago

Hi @jstaf, we have not had a response on this in a couple of months, so I'm going to close it for now. If you have any further details that we can use to recreate this, please reply and we will re-open the issue.