hashicorp / consul-helm

Helm chart to install Consul and other associated components.

Fail to start Ingress Gateway container with error: standard_init_linux.go:219: exec user process caused #796

Closed kw7oe closed 3 years ago

kw7oe commented 3 years ago

### Overview of the Issue

I was trying to install a Consul cluster with Ingress Gateway on a Raspberry Pi 4 running k3s as follows:

```
╰─➤  kubectl get nodes -A -o wide
NAME          STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
raspberrypi   Ready    control-plane,master   32m   v1.20.2+k3s1   192.168.0.171   <none>        Raspbian GNU/Linux 10 (buster)   5.4.83-v7l+      containerd://1.4.3-k3s1
```

But the `ingress-gateway` container in the ingress-gateway pod fails to start, with the following logs:

```
standard_init_linux.go:219: exec user process caused: no such file or directory
```

After also trying to install Grafana and Prometheus with the Envoy sidecar injected, I found that the root issue is in starting up the Envoy sidecar container.

My suspicion is that the Envoy image used is not built for the ARM architecture. If that's the case, would this be solved by changing the Helm `imageEnvoy` configuration? (I will try this out once I get the time.)
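As a sanity check, the node's CPU architecture can be read straight from the Kubernetes API (a sketch; the `status.nodeInfo.architecture` field is standard, and the expected outputs below are assumptions, not verified here):

```shell
# Print the CPU architecture the kubelet reports for each node
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
# Expected: "arm" on 32-bit Raspbian, "arm64" on a 64-bit ARM OS
```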

### Reproduction Steps

Steps to reproduce this issue:

1. Run `helm install -f values.yaml consul hashicorp/consul --version "0.27.0"` with the following `values.yaml`:

    ```yaml
    # name your datacenter
    global:
      name: consul
      datacenter: l7

    server:
      # use 1 server
      replicas: 1
      bootstrapExpect: 1
      disruptionBudget:
        enabled: true
        maxUnavailable: 0
      extraConfig: |
        {
          "telemetry": {
            "prometheus_retention_time": "10s"
          },
          "ui_config": {
            "enabled": true,
            "metrics_provider": "prometheus",
            "metrics_proxy": {
              "base_url": "http://prometheus-server"
            }
          }
        }

    client:
      enabled: true
      # enable grpc on your client to support Consul Connect
      grpc: true

    ui:
      enabled: true

    ingressGateways:
      enabled: true
      defaults:
        replicas: 1
        affinity: {}
      gateways:

    connectInject:
      enabled: true
      # inject an envoy sidecar into every new pod, except for those with
      # annotations that prevent injection
      default: true
      # these settings enable L7 metrics collection and are new in 1.5
      centralConfig:
        enabled: true
        # set the default protocol (can be overwritten with annotations)
        defaultProtocol: "http"
        # proxyDefaults is a raw json string that will be applied to all Connect
        # proxy sidecar pods that can include any valid configuration for the
        # configured proxy.
        proxyDefaults: |
          {
            "envoy_prometheus_bind_addr": "0.0.0.0:9102"
          }

    controller:
      enabled: true
    ```


2. View error

**Outputs of `kubectl get pods`:**

```
NAME                                                          READY   STATUS             RESTARTS   AGE
consul-webhook-cert-manager-5bc5bb4c86-7gtqc                  1/1     Running            0          16m
consul-controller-597c88b45d-kxzc6                            1/1     Running            0          16m
consul-connect-injector-webhook-deployment-644b69667c-g8gq2   1/1     Running            1          16m
consul-server-0                                               1/1     Running            0          16m
svclb-consul-ingress-gateway-q9stn                            2/2     Running            0          16m
consul-8t529                                                  1/1     Running            0          16m
consul-ingress-gateway-685756cffc-fvvvm                       1/2     CrashLoopBackOff   8          16m
```


### Logs

**Ingress Gateway Logs from `kubectl logs consul-ingress-gateway-685756cffc-fvvvm ingress-gateway`:**

```
standard_init_linux.go:219: exec user process caused: no such file or directory
```
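One way to check which platforms the default Envoy image is actually published for (a sketch, assuming Docker is available on a workstation; `docker manifest inspect` may require experimental CLI features on older Docker versions):

```shell
# Show the image manifest, including its platform/architecture descriptor
docker manifest inspect -v envoyproxy/envoy-alpine:v1.16.0
# If only "architecture": "amd64" appears, the image cannot run on an
# ARM node, which would match the failure above
```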


<details>
  <summary>Outputs of kubectl describe deploy/consul-ingress-gateway:</summary>

```
Name:                   consul-ingress-gateway
Namespace:              default
CreationTimestamp:      Fri, 29 Jan 2021 17:45:39 +0800
Labels:                 app=consul
                        app.kubernetes.io/managed-by=Helm
                        chart=consul-helm
                        component=ingress-gateway
                        heritage=Helm
                        ingress-gateway-name=consul-ingress-gateway
                        release=consul
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: consul
                        meta.helm.sh/release-namespace: default
Selector:               app=consul,chart=consul-helm,component=ingress-gateway,heritage=Helm,ingress-gateway-name=consul-ingress-gateway,release=consul
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=consul
                    chart=consul-helm
                    component=ingress-gateway
                    heritage=Helm
                    ingress-gateway-name=consul-ingress-gateway
                    release=consul
  Annotations:      consul.hashicorp.com/connect-inject: false
  Service Account:  consul-ingress-gateway
Init Containers:
  copy-consul-bin:
    Image:      hashicorp/consul:1.9.0
    Port:
    Host Port:
    Command:    cp /bin/consul /consul-bin/consul
    Limits:
      cpu:     50m
      memory:  150Mi
    Requests:
      cpu:     50m
      memory:  25Mi
    Environment:
    Mounts:
      /consul-bin from consul-bin (rw)
  service-init:
    Image:      hashicorp/consul-k8s:0.21.0
    Port:
    Host Port:
    Command:    /bin/sh -ec

  consul-k8s service-address \
    -k8s-namespace=default \
    -name=consul-ingress-gateway \
    -resolve-hostnames \
    -output-file=/tmp/address.txt
  WAN_ADDR="$(cat /tmp/address.txt)"
  WAN_PORT=8080

  cat > /consul/service/service.hcl << EOF
  service {
    kind = "ingress-gateway"
    name = "ingress-gateway"
    id = "${POD_NAME}"
    port = ${WAN_PORT}
    address = "${WAN_ADDR}"
    tagged_addresses {
      lan {
        address = "${POD_IP}"
        port = 21000
      }
      wan {
        address = "${WAN_ADDR}"
        port = ${WAN_PORT}
      }
    }
    proxy {
      config {
        envoy_gateway_no_default_bind = true
        envoy_gateway_bind_addresses {
          all-interfaces {
            address = "0.0.0.0"
          }
        }
      }
    }
    checks = [
      {
        name = "Ingress Gateway Listening"
        interval = "10s"
        tcp = "${POD_IP}:21000"
        deregister_critical_service_after = "6h"
      }
    ]
  }
  EOF

  /consul-bin/consul services register \
    /consul/service/service.hcl

Limits:
  cpu:     50m
  memory:  50Mi
Requests:
  cpu:     50m
  memory:  50Mi
Environment:
  HOST_IP:            (v1:status.hostIP)
  POD_IP:             (v1:status.podIP)
  POD_NAME:           (v1:metadata.name)
  CONSUL_HTTP_ADDR:  http://$(HOST_IP):8500
Mounts:
  /consul-bin from consul-bin (rw)
  /consul/service from consul-service (rw)

Containers:
  ingress-gateway:
    Image:       envoyproxy/envoy-alpine:v1.16.0
    Ports:       21000/TCP, 8080/TCP, 8443/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:     /consul-bin/consul connect envoy -gateway=ingress -proxy-id=$(POD_NAME) -address=$(POD_IP):21000
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     100m
      memory:  100Mi
    Liveness:   tcp-socket :21000 delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:  tcp-socket :21000 delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      HOST_IP:           (v1:status.hostIP)
      POD_IP:            (v1:status.podIP)
      POD_NAME:          (v1:metadata.name)
      CONSUL_HTTP_ADDR:  http://$(HOST_IP):8500
      CONSUL_GRPC_ADDR:  $(HOST_IP):8502
    Mounts:
      /consul-bin from consul-bin (rw)
  lifecycle-sidecar:
    Image:    hashicorp/consul-k8s:0.21.0
    Port:
    Host Port:
    Command:  consul-k8s lifecycle-sidecar -service-config=/consul/service/service.hcl -consul-binary=/consul-bin/consul
    Limits:
      cpu:     20m
      memory:  50Mi
    Requests:
      cpu:     20m
      memory:  25Mi
    Environment:
      HOST_IP:           (v1:status.hostIP)
      POD_IP:            (v1:status.podIP)
      CONSUL_HTTP_ADDR:  http://$(HOST_IP):8500
    Mounts:
      /consul-bin from consul-bin (rw)
      /consul/service from consul-service (ro)
Volumes:
  consul-bin:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  consul-service:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  consul-ingress-gateway-685756cffc (1/1 replicas created)
NewReplicaSet:
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  26m   deployment-controller  Scaled up replica set consul-ingress-gateway-685756cffc to 1
```

</details>

### Expected behavior

The `ingress-gateway` container should be started successfully.

### Environment details

- `consul-k8s` version: `0.21.0`
- `consul-helm` version: `0.27.0`
- `values.yaml` used to deploy the Helm chart: see the Reproduction Steps above.

### Additional Context

I have run a similar install on macOS with minikube and it works perfectly fine. Here is additional info about the OS of the Raspberry Pi 4:

```
pi@raspberrypi:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
```


While I did some digging into the log message, there is still no clear direction for me on how to resolve this. One of the notable results is this issue:

- [Docker run “someImage” returns: standard_init_linux.go190: exec user process caused “no such file or directory”](https://raspberrypi.stackexchange.com/questions/85658/docker-run-someimage-returns-standard-init-linux-go190-exec-user-process-cau)

where it is mentioned that it's due to a compilation issue. However, I don't have much expertise in this area to resolve it. Hopefully this additional context helps! And also thanks for the OSS work 💯

**Updates**

When attempting to install Grafana and Prometheus, similar issues occurred, and I managed to track the root cause down to the `envoy` sidecar.

Here's the `kubectl get pods`:

```
NAME                                                          READY   STATUS             RESTARTS   AGE
consul-webhook-cert-manager-5bc5bb4c86-7gtqc                  1/1     Running            0          61m
consul-controller-597c88b45d-kxzc6                            1/1     Running            0          61m
consul-server-0                                               1/1     Running            0          60m
svclb-consul-ingress-gateway-q9stn                            2/2     Running            0          61m
consul-8t529                                                  1/1     Running            0          61m
consul-connect-injector-webhook-deployment-644b69667c-g8gq2   1/1     Running            3          61m
consul-ingress-gateway-685756cffc-fvvvm                       1/2     CrashLoopBackOff   16         61m
prometheus-kube-state-metrics-95d956569-ntw4p                 1/3     CrashLoopBackOff   8          3m21s
prometheus-node-exporter-d7r48                                2/3     CrashLoopBackOff   4          3m21s
grafana-757b7d6b5-9t7tb                                       2/3     CrashLoopBackOff   3          2m40s
prometheus-pushgateway-bdf95597-shmcv                         2/3     CrashLoopBackOff   4          3m21s
prometheus-server-79bbc6b897-dnvkb                            3/4     CrashLoopBackOff   4          3m21s
```


Notice that all of the containers start successfully until it is the turn of the `consul-connect-envoy-sidecar`. On further inspection of the logs _(e.g. `kubectl logs prometheus-node-exporter-d7r48 consul-connect-envoy-sidecar`)_, a similar log message is returned:

```
╰─➤  kubectl logs prometheus-node-exporter-d7r48 consul-connect-envoy-sidecar
standard_init_linux.go:219: exec user process caused: exec format error
```

blake commented 3 years ago

@kw7oe Consul's Helm chart defaults to using `envoyproxy/envoy-alpine` as its Envoy image, which only supports the x86_64 architecture. You'll need to use `envoyproxy/envoy`, which supports ARM64.

You can configure the chart to use this alternate image using the `global.imageEnvoy` config option.

```yaml
# values.yaml
---
global:
  imageEnvoy: envoyproxy/envoy:v1.16.2
```
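Assuming the release name and chart version from the reproduction steps above, the new values could then be rolled out with something like:

```shell
# Re-render the release with the overridden Envoy image
# (release name "consul" and chart version taken from the steps above)
helm upgrade consul hashicorp/consul --version "0.27.0" -f values.yaml
```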
kw7oe commented 3 years ago

> @kw7oe Consul's Helm chart defaults to using `envoyproxy/envoy-alpine` as its Envoy image, which only supports the x86_64 architecture. You'll need to use `envoyproxy/envoy`, which supports ARM64.
>
> You can configure the chart to use this alternate image using the `global.imageEnvoy` config option.
>
> ```yaml
> # values.yaml
> ---
> global:
>   imageEnvoy: envoyproxy/envoy:v1.16.2
> ```

Thanks! Will try this out soon. Good to close this now 👍

kw7oe commented 3 years ago

Just want to add further on this, in case anyone comes across a similar issue in the future.

On top of specifying the image as `envoyproxy/envoy:v1.16.2`, if you are running a Raspberry Pi with Raspbian OS you might face an error pulling the Docker image, since Raspbian OS is actually armv7l, a 32-bit OS (this can be checked by running `uname -m`), and `envoyproxy/envoy` images are only built for arm64. You can refer to this StackOverflow question, which covers a similar issue.
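For example, this is what I would expect `uname -m` to report on each OS:

```shell
# Report the machine hardware architecture of the running kernel/userland
uname -m
# armv7l  -> 32-bit ARM (Raspbian); arm64-only images fail to run
# aarch64 -> 64-bit ARM (e.g. Ubuntu 20.04 arm64); arm64 images work
```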

To resolve this, I had to reformat my Raspberry Pi SD card to use Ubuntu 20.04 instead of Raspbian OS, which is an arm64 OS.

And if you are planning to run any Docker images that you built yourself on your k3s cluster, be sure that they are also built for the arm64 architecture, or you'll get a similar error to the one shown above.

I'm not an expert here, so some of the terminology might be wrong, but the core idea is there. Hope it helps.