Hi @rrondeau, would it be possible to do a `kubectl describe pod`
on the two failing ingress gateway pods? Wondering if there are any hints there.
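For reference, using the namespace and one of the pod names that appear later in this thread, that would be something like:

```sh
# Describe one of the crashlooping ingress gateway pods
# (names from the status output below; both pods are in the consul namespace).
kubectl describe pod consul-back-ingress-gateway-675659dd45-chmzf -n consul
```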
```
consul-back-ingress-gateway-675659dd45-chmzf    1/2   CrashLoopBackOff   13   30m
consul-front-ingress-gateway-79995bb7b4-87kmv   1/2   CrashLoopBackOff   9    15m
```
Sorry for the delay :/
Here is a describe of one pod failing without my workaround:
```
Namespace: consul
Priority: 0
Node: gke-sue-gke-cluster0-sue-gke-cluster0-7e4ade2d-f4bz/10.100.100.61
Start Time: Wed, 02 Jun 2021 10:19:03 +0000
Labels: app=consul
app.kubernetes.io/managed-by=spinnaker
app.kubernetes.io/name=consul
chart=consul-helm
component=ingress-gateway
heritage=Helm
ingress-gateway-name=consul-back-ingress-gateway
pod-template-hash=89c4f8559
release=consul
Annotations: artifact.spinnaker.io/location: consul
artifact.spinnaker.io/name: consul-back-ingress-gateway
artifact.spinnaker.io/type: kubernetes/deployment
artifact.spinnaker.io/version:
cni.projectcalico.org/podIP: 10.110.3.40/32
consul.hashicorp.com/connect-inject: false
moniker.spinnaker.io/application: consul
moniker.spinnaker.io/cluster: deployment consul-back-ingress-gateway
Status: Running
IP: 10.110.3.40
IPs:
IP: 10.110.3.40
Controlled By: ReplicaSet/consul-back-ingress-gateway-89c4f8559
Init Containers:
copy-consul-bin:
Container ID: docker://091a9181705914a170a656aac0c5eedb9b6e65287f2bbe7a3771b3b9d68fb015
Image: docker.io/hashicorp/consul:1.9.5
Image ID: docker-pullable://hashicorp/consul@sha256:35f1bdb6c516a4fae6e4b056b0d4e9ddd0a3874efc43fc0dc8db49ef2b5d4442
Port: <none>
Host Port: <none>
Command:
cp
/bin/consul
/consul-bin/consul
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 02 Jun 2021 10:19:08 +0000
Finished: Wed, 02 Jun 2021 10:19:11 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 150Mi
Requests:
cpu: 50m
memory: 25Mi
Environment: <none>
Mounts:
/consul-bin from consul-bin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from consul-back-ingress-gateway-token-dkwqt (ro)
service-init:
Container ID: docker://f2b3ab36229231575cb41be2f37ed593c4db218bf56758d132017ffb3b233243
Image: hashicorp/consul-k8s:0.25.0
Image ID: docker-pullable://hashicorp/consul-k8s@sha256:66a1dfd964e9a8fe2477803462fd08cb83744a65f2b8083e1c51c580f6930c7d
Port: <none>
Host Port: <none>
Command:
/bin/sh
-ec
consul-k8s service-address \
-k8s-namespace=consul \
-name=consul-back-ingress-gateway \
-output-file=/tmp/address.txt
WAN_ADDR="$(cat /tmp/address.txt)"
WAN_PORT=8080
cat > /consul/service/service.hcl << EOF
service {
kind = "ingress-gateway"
name = "back-ingress-gateway"
id = "${POD_NAME}"
port = ${WAN_PORT}
address = "${WAN_ADDR}"
tagged_addresses {
lan {
address = "${POD_IP}"
port = 21000
}
wan {
address = "${WAN_ADDR}"
port = ${WAN_PORT}
}
}
proxy {
config {
envoy_gateway_no_default_bind = true
envoy_gateway_bind_addresses {
all-interfaces {
address = "0.0.0.0"
}
}
}
}
checks = [
{
name = "Ingress Gateway Listening"
interval = "10s"
tcp = "${POD_IP}:21000"
deregister_critical_service_after = "6h"
}
]
}
EOF
/consul-bin/consul services register \
/consul/service/service.hcl
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 02 Jun 2021 10:19:13 +0000
Finished: Wed, 02 Jun 2021 10:19:18 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 50Mi
Requests:
cpu: 50m
memory: 50Mi
Environment:
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
POD_NAME: consul-back-ingress-gateway-89c4f8559-9cxsq (v1:metadata.name)
CONSUL_HTTP_ADDR: http://$(HOST_IP):8500
Mounts:
/consul-bin from consul-bin (rw)
/consul/service from consul-service (rw)
/var/run/secrets/kubernetes.io/serviceaccount from consul-back-ingress-gateway-token-dkwqt (ro)
Containers:
ingress-gateway:
Container ID: docker://d856c727faa63bd588e91f161207d60f5a524eeba3df20b66bf0a24d6c373289
Image: envoyproxy/envoy-alpine:v1.16.3
Image ID: docker-pullable://envoyproxy/envoy-alpine@sha256:a11d7329678617c1b29ee28392c76c6ac00ecc55266b0866b7c99f6a7717f9a6
Ports: 21000/TCP, 8080/TCP, 8443/TCP, 9102/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/consul-bin/consul
connect
envoy
-gateway=ingress
-proxy-id=$(POD_NAME)
-address=$(POD_IP):21000
State: Running
Started: Wed, 02 Jun 2021 10:24:48 +0000
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 02 Jun 2021 10:23:57 +0000
Finished: Wed, 02 Jun 2021 10:23:58 +0000
Ready: False
Restart Count: 5
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: tcp-socket :21000 delay=30s timeout=5s period=10s #success=1 #failure=3
Readiness: tcp-socket :21000 delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
POD_NAME: consul-back-ingress-gateway-89c4f8559-9cxsq (v1:metadata.name)
CONSUL_HTTP_ADDR: http://$(HOST_IP):8500
CONSUL_GRPC_ADDR: $(HOST_IP):8502
Mounts:
/consul-bin from consul-bin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from consul-back-ingress-gateway-token-dkwqt (ro)
consul-sidecar:
Container ID: docker://617a2892d93fc1acdd963d4e01c3d8e58c145de92c773fc6d2f53051881caf6b
Image: hashicorp/consul-k8s:0.25.0
Image ID: docker-pullable://hashicorp/consul-k8s@sha256:66a1dfd964e9a8fe2477803462fd08cb83744a65f2b8083e1c51c580f6930c7d
Port: <none>
Host Port: <none>
Command:
consul-k8s
consul-sidecar
-service-config=/consul/service/service.hcl
-consul-binary=/consul-bin/consul
State: Running
Started: Wed, 02 Jun 2021 10:19:22 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 20m
memory: 50Mi
Requests:
cpu: 20m
memory: 25Mi
Environment:
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
CONSUL_HTTP_ADDR: http://$(HOST_IP):8500
Mounts:
/consul-bin from consul-bin (rw)
/consul/service from consul-service (ro)
/var/run/secrets/kubernetes.io/serviceaccount from consul-back-ingress-gateway-token-dkwqt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
consul-bin:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
consul-service:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
consul-back-ingress-gateway-token-dkwqt:
Type: Secret (a volume populated by a Secret)
SecretName: consul-back-ingress-gateway-token-dkwqt
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m27s default-scheduler Successfully assigned consul/consul-back-ingress-gateway-89c4f8559-9cxsq to gke-sue-gke-cluster0-sue-gke-cluster0-7e4ade2d-f4bz
Normal Pulled 6m23s kubelet Container image "docker.io/hashicorp/consul:1.9.5" already present on machine
Normal Created 6m23s kubelet Created container copy-consul-bin
Normal Started 6m22s kubelet Started container copy-consul-bin
Normal Pulling 6m19s kubelet Pulling image "hashicorp/consul-k8s:0.25.0"
Normal Pulled 6m17s kubelet Successfully pulled image "hashicorp/consul-k8s:0.25.0" in 1.871122892s
Normal Created 6m17s kubelet Created container service-init
Normal Started 6m17s kubelet Started container service-init
Normal Pulling 6m11s kubelet Pulling image "envoyproxy/envoy-alpine:v1.16.3"
Normal Pulled 6m10s kubelet Successfully pulled image "envoyproxy/envoy-alpine:v1.16.3" in 1.708508616s
Normal Created 6m9s kubelet Created container consul-sidecar
Normal Started 6m9s kubelet Started container ingress-gateway
Normal Pulled 6m9s kubelet Container image "hashicorp/consul-k8s:0.25.0" already present on machine
Normal Created 6m9s kubelet Created container ingress-gateway
Normal Started 6m8s kubelet Started container consul-sidecar
Warning Unhealthy 5m14s (x3 over 5m34s) kubelet Liveness probe failed: dial tcp 10.110.3.40:21000: connect: connection refused
Normal Killing 5m14s kubelet Container ingress-gateway failed liveness probe, will be restarted
Warning Unhealthy 5m9s (x6 over 5m59s) kubelet Readiness probe failed: dial tcp 10.110.3.40:21000: connect: connection refused
Normal Pulled 5m3s kubelet Container image "envoyproxy/envoy-alpine:v1.16.3" already present on machine
Warning Unhealthy 70s (x4 over 91s) kubelet Back-off restarting failed container
```
My issue is becoming strange. I left the broken deployment and it's up and running this morning. My Kubernetes node pool is destroyed and recreated every night/morning. I tried a rollout restart, and it is failing again with no logs.
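One way to capture output from the crashed attempts, assuming the pod and container names from the describe above, is to ask for the previous instance's logs:

```sh
# --previous fetches logs from the last terminated run of the container,
# i.e. the one that exited with code 1 before the restart.
kubectl logs consul-back-ingress-gateway-89c4f8559-9cxsq -n consul \
  -c ingress-gateway --previous
```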
Just saw this: https://github.com/hashicorp/consul/pull/10324. Waiting for the release to test if it solves my issue.
Thanks! The 1.9.6 release just landed: https://github.com/hashicorp/consul/releases/tag/v1.9.6. I wonder if this will solve the issue as well; I am a little at a loss as to what the issue might be here.
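Not captured in the thread, but with consul-helm the Consul image is controlled by the `global.image` value, so testing 1.9.6 should be a one-flag change (release name `consul` and repo alias `hashicorp` assumed):

```sh
# Bump only the Consul image to the release containing the fix;
# global.image is the consul-helm value that selects the image tag.
helm upgrade consul hashicorp/consul -n consul --reuse-values \
  --set global.image=hashicorp/consul:1.9.6
```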
Fixed by https://github.com/hashicorp/consul/pull/10324
Thanks!
Overview of the Issue
Consul ingress gateways do not start after a helm upgrade with the values below.
Reproduction Steps
Steps to reproduce this issue, e.g.:
values.yml:
```yaml
.... ....
connectInject:
  enabled: true
  healthChecks:
    enabled: false
  k8sAllowNamespaces: [........]
  centralConfig:
    enabled: true
  resources:
    requests:
      memory: "100Mi"
      cpu: "50m"
    limits:
      memory: "150Mi"
      cpu: "50m"
ingressGateways:
  enabled: true
  gateways:
```
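The exact helm invocation is not captured here; assuming the release is named `consul` and the HashiCorp repo is added as `hashicorp`, applying these values would look roughly like:

```sh
# Sketch of the upgrade that triggered the issue, pinned to the
# consul-helm version listed in the environment details below.
helm upgrade consul hashicorp/consul --version 0.31.1 -n consul -f values.yml
```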
```
consul-back-ingress-gateway-675659dd45-chmzf    1/2   CrashLoopBackOff   13   30m
consul-front-ingress-gateway-79995bb7b4-87kmv   1/2   CrashLoopBackOff   9    15m
```
No logs from the container :/
Environment details
If not already included, please provide the following:
- consul-k8s version: 0.25.0
- consul-helm version: 0.31.1
- envoy version: 1.16.3

Additionally, please provide details regarding the Kubernetes Infrastructure, if known:
Additional Context
After some heavy debugging, the start command seems to be stuck when consul creates a pipe to forward the bootstrap config to Envoy with:
`consul connect envoy ...`
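For context, `consul connect envoy` generates the Envoy bootstrap config and then execs Envoy with it; the documented `-bootstrap` flag makes it print the config and exit instead, which can help confirm whether the hang is in that generation/pipe step (flags mirrored from the pod spec above):

```sh
# If this returns promptly and prints valid JSON, config generation is
# fine and the hang is in the hand-off to Envoy.
/consul-bin/consul connect envoy -gateway=ingress \
  -proxy-id="$POD_NAME" -address="$POD_IP:21000" -bootstrap
```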
To fix the issue I found a workaround: I replaced the pod command from:
to:
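The actual before/after commands did not survive in this thread; as a rough sketch of the shape such a workaround could take, one could split bootstrap generation from the Envoy exec (hypothetical paths; only `-bootstrap` and the flags above are from the documented CLI):

```sh
# Hypothetical workaround shape: write the bootstrap config to a file,
# then start Envoy against it directly instead of via consul's pipe/exec.
/consul-bin/consul connect envoy -gateway=ingress \
  -proxy-id="$POD_NAME" -address="$POD_IP:21000" \
  -bootstrap > /tmp/envoy-bootstrap.json
envoy --config-path /tmp/envoy-bootstrap.json
```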
If you have any hints, I will be happy to test anything.