hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

consul-connect-inject-init: "Unable to get Agent services, connect: no route to host" #1435

Closed MinHtetO closed 2 years ago

MinHtetO commented 2 years ago

Background of the issue

I'm trying to test Consul service mesh locally on my Mac with Docker Desktop Kubernetes. I used Helm to install Consul with a single server instance, since I am testing on my laptop (only one node). The Consul pods are running with no issues. But when I create my deployment, the pods never become ready and show an Init:Error status. When I check the logs of the consul-connect-inject-init container on those pods, it repeatedly logs an "Unable to get Agent services" error.

Steps to reproduce

  1. Install Consul with Helm:

     helm install consul hashicorp/consul --set global.name=consul --set server.replicas=1 --set server.bootstrapExpect=1 --set connectInject.enabled=true --set connectInject.default=true --create-namespace --namespace consul

  2. Create the deployment and service. It's a simple Kubernetes deployment with TCP probes; the services were already tested on Kubernetes without issues (without the Consul mesh).
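For reference, the --set flags used in step 1 are equivalent to the following Helm values file (a sketch based on the flags above; key paths follow the consul-helm chart's documented values):

```yaml
# Equivalent of the --set flags passed to `helm install` in step 1
global:
  name: consul
server:
  replicas: 1
  bootstrapExpect: 1
connectInject:
  enabled: true
  default: true
```

This could then be installed with: helm install consul hashicorp/consul -f values.yaml --create-namespace --namespace consul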

The logs

➜  utils git:(master) ✗ kubectl get pod
NAME                    READY   STATUS       RESTARTS   AGE
auth-85f6bbb7b9-fmlmr   0/2     Init:Error   1          13m
auth-85f6bbb7b9-jbt44   0/2     Init:Error   1          13m
auth-85f6bbb7b9-x2j89   0/2     Init:Error   1          13m
redisserver-master-0    1/1     Running      0          37h
➜  utils git:(master) ✗ kubectl logs auth-85f6bbb7b9-fmlmr consul-connect-inject-init
2022-08-19T06:11:56.561Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
2022-08-19T06:11:59.634Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
2022-08-19T06:12:02.705Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
2022-08-19T06:12:05.782Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
2022-08-19T06:12:08.850Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
2022-08-19T06:12:11.922Z [ERROR] Unable to get Agent services: error="Get "http://192.168.65.4:8500/v1/agent/services?filter=Meta%5B%22pod-name%22%5D+%3D%3D+%22auth-85f6bbb7b9-fmlmr%22+and+Meta%5B%22k8s-namespace%22%5D+%3D%3D+%22default%22+": dial tcp 192.168.65.4:8500: connect: no route to host"
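As the log URL shows, the init container is polling the Consul client agent's HTTP API on the node's host IP. The same request can be reproduced from a shell inside any pod on the node to check basic reachability (a hypothetical debugging step, not part of the original report; the IP and port are taken from the error message above):

```shell
# Reproduce the init container's call to the agent HTTP API on the host IP.
# "no route to host" here would confirm the network path is blocked,
# independent of the injected init container.
curl -sv "http://192.168.65.4:8500/v1/agent/services"
```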
➜  utils git:(master) ✗ kubectl get pod -n consul
NAME                                           READY   STATUS    RESTARTS   AGE
consul-client-5pf4h                            1/1     Running   0          26m
consul-connect-injector-5665554754-45srg       1/1     Running   0          26m
consul-connect-injector-5665554754-pvvql       1/1     Running   0          26m
consul-server-0                                1/1     Running   0          26m
consul-webhook-cert-manager-58fd94f949-tkq9n   1/1     Running   0          26m
➜  utils git:(master) ✗ kubectl describe pod consul-server-0 -n consul
Name:         consul-server-0
Namespace:    consul
Priority:     0
Node:         docker-desktop/192.168.65.4
Start Time:   Fri, 19 Aug 2022 12:20:32 +0630
Labels:       app=consul
              chart=consul-helm
              component=server
              controller-revision-hash=consul-server-ccc5b8b45
              hasDNS=true
              release=consul
              statefulset.kubernetes.io/pod-name=consul-server-0
Annotations:  consul.hashicorp.com/config-checksum: c689a26d3f47c98c7d8fa641d2bb8f6ce18409d1a0f55dde894639f7d1270ec8
              consul.hashicorp.com/connect-inject: false
Status:       Running
IP:           10.1.0.166
IPs:
  IP:           10.1.0.166
Controlled By:  StatefulSet/consul-server
Containers:
  consul:
    Container ID:  docker://b108a6017d67c1ba8925131f5b6bb161e541071a3f6099c24bc5a150a31673de
    Image:         hashicorp/consul:1.13.1
    Image ID:      docker-pullable://hashicorp/consul@sha256:c014bbf14bbd08bbcabe5386ab01aaedc385cd5c43c2024b340e2b5692111ce7
    Ports:         8500/TCP, 8503/TCP, 8301/TCP, 8301/UDP, 8302/TCP, 8302/UDP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/UDP, 0/TCP, 0/UDP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec

      cp /consul/config/extra-from-values.json /consul/extra-config/extra-from-values.json
      [ -n "${HOST_IP}" ] && sed -Ei "s|HOST_IP|${HOST_IP?}|g" /consul/extra-config/extra-from-values.json
      [ -n "${POD_IP}" ] && sed -Ei "s|POD_IP|${POD_IP?}|g" /consul/extra-config/extra-from-values.json
      [ -n "${HOSTNAME}" ] && sed -Ei "s|HOSTNAME|${HOSTNAME?}|g" /consul/extra-config/extra-from-values.json

      exec /usr/local/bin/docker-entrypoint.sh consul agent \
        -advertise="${ADVERTISE_IP}" \
        -config-dir=/consul/config \
        -config-file=/consul/extra-config/extra-from-values.json

    State:          Running
      Started:      Fri, 19 Aug 2022 12:20:34 +0630
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      ADVERTISE_IP:               (v1:status.podIP)
      HOST_IP:                    (v1:status.hostIP)
      POD_IP:                     (v1:status.podIP)
      CONSUL_DISABLE_PERM_MGMT:  true
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-consul (rw)
      /consul/extra-config from extra-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bwnl2 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data-consul:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-consul-consul-server-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      consul-server-config
    Optional:  false
  extra-config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-bwnl2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  27m   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         27m   default-scheduler  Successfully assigned consul/consul-server-0 to docker-desktop
  Normal   Pulled            27m   kubelet            Container image "hashicorp/consul:1.13.1" already present on machine
  Normal   Created           27m   kubelet            Created container consul
  Normal   Started           27m   kubelet            Started container consul
  Warning  Unhealthy         27m   kubelet            Readiness probe failed:
➜  utils git:(master) ✗ kubectl logs consul-server-0 -n consul
==> Starting Consul agent...
           Version: '1.13.1'
        Build Date: '2022-08-11 19:07:00 +0000 UTC'
           Node ID: '9b67f724-75a9-74bd-4e5f-cf78d9c8a328'
         Node name: 'consul-server-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8503, DNS: 8600)
      Cluster Addr: 10.1.0.166 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2022-08-19T05:50:36.369Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2022-08-19T05:50:36.369Z [WARN]  agent: bootstrap = true: do not enable unless necessary
2022-08-19T05:50:36.562Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2022-08-19T05:50:36.562Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
2022-08-19T05:50:36.585Z [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:9b67f724-75a9-74bd-4e5f-cf78d9c8a328 Address:10.1.0.166:8300}]"
2022-08-19T05:50:36.661Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.1.0.166:8300 [Follower]" leader-address= leader-id=
2022-08-19T05:50:36.661Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc1 10.1.0.166
2022-08-19T05:50:36.662Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-server-0 10.1.0.166
2022-08-19T05:50:36.662Z [INFO]  agent.router: Initializing LAN area manager
2022-08-19T05:50:36.662Z [INFO]  agent.server.autopilot: reconciliation now disabled
2022-08-19T05:50:36.662Z [INFO]  agent.server: Adding LAN server: server="consul-server-0 (Addr: tcp/10.1.0.166:8300) (DC: dc1)"
2022-08-19T05:50:36.662Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc1 area=wan
2022-08-19T05:50:36.664Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2022-08-19T05:50:36.664Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2022-08-19T05:50:36.664Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
2022-08-19T05:50:36.664Z [INFO]  agent: Started gRPC server: address=[::]:8503 network=tcp
2022-08-19T05:50:36.664Z [INFO]  agent: started state syncer
2022-08-19T05:50:36.664Z [INFO]  agent: Consul agent running!
2022-08-19T05:50:36.664Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2022-08-19T05:50:36.664Z [INFO]  agent: Joining cluster...: cluster=LAN
2022-08-19T05:50:36.664Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server.consul.svc:8301]
2022-08-19T05:50:36.675Z [INFO]  agent: (LAN) joined: number_of_nodes=1
2022-08-19T05:50:36.675Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2022-08-19T05:50:41.724Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-08-19T05:50:41.724Z [INFO]  agent.server.raft: entering candidate state: node="Node at 10.1.0.166:8300 [Candidate]" term=2
2022-08-19T05:50:41.768Z [INFO]  agent.server.raft: election won: tally=1
2022-08-19T05:50:41.768Z [INFO]  agent.server.raft: entering leader state: leader="Node at 10.1.0.166:8300 [Leader]"
2022-08-19T05:50:41.768Z [INFO]  agent.server: cluster leadership acquired
2022-08-19T05:50:41.768Z [INFO]  agent.server: New leader elected: payload=consul-server-0
2022-08-19T05:50:41.862Z [INFO]  agent.server.autopilot: reconciliation now enabled
2022-08-19T05:50:41.862Z [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
2022-08-19T05:50:41.862Z [INFO]  agent.leader: started routine: routine="federation state pruning"
2022-08-19T05:50:41.867Z [INFO]  agent: Synced node info
2022-08-19T05:50:41.872Z [INFO]  connect.ca: initialized primary datacenter CA with provider: provider=consul
2022-08-19T05:50:41.872Z [INFO]  agent.leader: started routine: routine="intermediate cert renew watch"
2022-08-19T05:50:41.872Z [INFO]  agent.leader: started routine: routine="CA root pruning"
2022-08-19T05:50:41.872Z [INFO]  agent.leader: started routine: routine="CA root expiration metric"
2022-08-19T05:50:41.872Z [INFO]  agent.leader: started routine: routine="CA signing expiration metric"
2022-08-19T05:50:41.872Z [INFO]  agent.leader: started routine: routine="virtual IP version check"
2022-08-19T05:50:41.874Z [INFO]  agent.server: member joined, marking health alive: member=consul-server-0 partition=default
2022-08-19T05:50:41.876Z [INFO]  agent.leader: stopping routine: routine="virtual IP version check"
2022-08-19T05:50:41.876Z [INFO]  agent.leader: stopped routine: routine="virtual IP version check"
2022-08-19T05:50:42.092Z [INFO]  agent.server: federation state anti-entropy synced
2022-08-19T05:51:03.953Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: docker-desktop 10.1.0.164
2022-08-19T05:51:03.953Z [INFO]  agent.server: member joined, marking health alive: member=docker-desktop partition=default

Environment

macOS Version 12.4

Helm Version

➜  utils git:(master) ✗ helm version
version.BuildInfo{Version:"v3.9.2", GitCommit:"1addefbfe665c350f4daf868a9adc5600cc064fd", GitTreeState:"clean", GoVersion:"go1.17.12"}

Docker Desktop Version 4.2.0

david-yu commented 2 years ago

Hi @MinHtetO, does installing with the values found here https://learn.hashicorp.com/tutorials/consul/kubernetes-kind?in=consul/kubernetes-deploy#create-a-values-file also produce the same error? I believe the bootstrapExpect setting may be what is causing the issue here.

MinHtetO commented 2 years ago

Hi @david-yu ,

I removed the bootstrapExpect param and the issue was solved. Thank you.
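For anyone hitting the same error: the working install command is the one from the reproduction steps with server.bootstrapExpect dropped (a sketch of the fix described above; all other flags unchanged):

```shell
helm install consul hashicorp/consul \
  --set global.name=consul \
  --set server.replicas=1 \
  --set connectInject.enabled=true \
  --set connectInject.default=true \
  --create-namespace --namespace consul
```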