hashicorp / consul-helm

Helm chart to install Consul and other associated components.
Mozilla Public License 2.0

Readiness Probe error in helm chart #563

Closed · robece closed this 4 years ago

robece commented 4 years ago

Hi everyone, I'm having the same issue. I'm doing something pretty simple, like this:

helm upgrade --install consul hashicorp/consul --set global.name=consul --set global.domain=consul --set ui.service.type=LoadBalancer --namespace consul

I'm getting a Readiness Probe error. I would like to know if there is a way to disable it or modify its values, since there is no parameter for it in the Helm chart, and I could not start the services.

I think there is a deadlock: the services are not ready because of the failing readiness probe, and the servers can't find candidates to promote as a leader.

Thanks, RC
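
The chart doesn't expose the server readiness probe as a Helm value, but as a stopgap the probe can be relaxed directly on the deployed StatefulSet. A rough sketch with kubectl patch, assuming the release above (StatefulSet consul-server in namespace consul) and that the probe sits on the first container:

$ kubectl patch statefulset consul-server -n consul --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/readinessProbe/failureThreshold", "value": 10}]'

Keep in mind this only masks the symptom: the probe fails here because the servers can't elect a leader, so patching it won't make the cluster healthy by itself.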

ishustava commented 4 years ago

Hey @robece

Thanks for creating an issue!

I've tried running the command you mentioned against my GKE cluster, and I'm seeing all pods come up and ready.

$ kubectl get pods -n consul
NAME              READY   STATUS    RESTARTS   AGE
consul-fszb5      1/1     Running   0          113s
consul-pb6dm      1/1     Running   0          113s
consul-server-0   1/1     Running   0          113s
consul-server-1   1/1     Running   0          112s
consul-server-2   1/1     Running   0          112s
consul-wfpkq      1/1     Running   0          113s

There could be something else going on. Which readiness probe is failing? Could you post the output of kubectl get pods and the logs from the failing pod(s)?
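
If it helps, the probe's failure message also shows up in the pod's events, so something like this (substituting the actual pod name and namespace) would tell us which probe is failing and why:

$ kubectl describe pod <failing-pod> -n <namespace>
$ kubectl logs <failing-pod> -n <namespace>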

manisha-tanwar commented 4 years ago

Hi @ishustava

I tried helm install (with default values) and I'm getting the same error. Here are the logs and errors I'm seeing:

$ kubectl get StatefulSet
NAME                                READY   AGE
quarrelsome-hamster-consul-server   0/3     8m36s

$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
quarrelsome-hamster-consul-5zgzv      0/1     Running   0          8m27s
quarrelsome-hamster-consul-server-0   0/1     Running   0          8m27s
quarrelsome-hamster-consul-server-1   0/1     Pending   0          8m27s
quarrelsome-hamster-consul-server-2   0/1     Pending   0          8m27s

$ kubectl get events  --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE      REASON                    OBJECT                                                                   MESSAGE
16m         Normal    Starting                  node/docker-desktop                                                      Starting kubelet.
16m         Normal    NodeAllocatableEnforced   node/docker-desktop                                                      Updated Node Allocatable limit across pods
16m         Normal    NodeHasSufficientPID      node/docker-desktop                                                      Node docker-desktop status is now: NodeHasSufficientPID
16m         Normal    NodeHasSufficientMemory   node/docker-desktop                                                      Node docker-desktop status is now: NodeHasSufficientMemory
16m         Normal    NodeHasNoDiskPressure     node/docker-desktop                                                      Node docker-desktop status is now: NodeHasNoDiskPressure
15m         Normal    RegisteredNode            node/docker-desktop                                                      Node docker-desktop event: Registered Node docker-desktop in Controller
15m         Normal    Starting                  node/docker-desktop                                                      Starting kube-proxy.
10m         Warning   FailedScheduling          pod/quarrelsome-hamster-consul-server-1                                  pod has unbound immediate PersistentVolumeClaims
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Claim data-default-quarrelsome-hamster-consul-server-2 Pod quarrelsome-hamster-consul-server-2 in StatefulSet quarrelsome-hamster-consul-server success
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Pod quarrelsome-hamster-consul-server-2 in StatefulSet quarrelsome-hamster-consul-server successful
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Pod quarrelsome-hamster-consul-server-1 in StatefulSet quarrelsome-hamster-consul-server successful
10m         Normal    ExternalProvisioning      persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-2   waiting for a volume to be created, either by external provisioner "docker.io/hostpath" or manually created by system administrator
10m         Normal    Provisioning              persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-1   External provisioner is provisioning volume for claim "default/data-default-quarrelsome-hamster-consul-server-1"
10m         Normal    ExternalProvisioning      persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-1   waiting for a volume to be created, either by external provisioner "docker.io/hostpath" or manually created by system administrator
10m         Normal    ProvisioningSucceeded     persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-0   Successfully provisioned volume pvc-f0690c05-4d16-4aa2-8c76-84bde133ffca
10m         Normal    Provisioning              persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-0   External provisioner is provisioning volume for claim "default/data-default-quarrelsome-hamster-consul-server-0"
10m         Normal    Scheduled                 pod/quarrelsome-hamster-consul-5zgzv                                     Successfully assigned default/quarrelsome-hamster-consul-5zgzv to docker-desktop
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Claim data-default-quarrelsome-hamster-consul-server-1 Pod quarrelsome-hamster-consul-server-1 in StatefulSet quarrelsome-hamster-consul-server success
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Pod quarrelsome-hamster-consul-server-0 in StatefulSet quarrelsome-hamster-consul-server successful
10m         Normal    SuccessfulCreate          statefulset/quarrelsome-hamster-consul-server                            create Claim data-default-quarrelsome-hamster-consul-server-0 Pod quarrelsome-hamster-consul-server-0 in StatefulSet quarrelsome-hamster-consul-server success
10m         Normal    ExternalProvisioning      persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-0   waiting for a volume to be created, either by external provisioner "docker.io/hostpath" or manually created by system administrator
10m         Normal    NoPods                    poddisruptionbudget/quarrelsome-hamster-consul-server                    No matching pods found
10m         Warning   FailedScheduling          pod/quarrelsome-hamster-consul-server-0                                  pod has unbound immediate PersistentVolumeClaims
10m         Warning   FailedScheduling          pod/quarrelsome-hamster-consul-server-2                                  pod has unbound immediate PersistentVolumeClaims
10m         Normal    SuccessfulCreate          daemonset/quarrelsome-hamster-consul                                     Created pod: quarrelsome-hamster-consul-5zgzv
10m         Normal    Provisioning              persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-2   External provisioner is provisioning volume for claim "default/data-default-quarrelsome-hamster-consul-server-2"
10m         Normal    ProvisioningSucceeded     persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-2   Successfully provisioned volume pvc-0dd87b0f-528b-45ce-9539-9198f3607bf4
10m         Normal    ProvisioningSucceeded     persistentvolumeclaim/data-default-quarrelsome-hamster-consul-server-1   Successfully provisioned volume pvc-00bc9d36-7cc6-4200-a33f-cb9409ec0c42
10m         Normal    Pulling                   pod/quarrelsome-hamster-consul-5zgzv                                     Pulling image "consul:1.8.1"
9s          Warning   FailedScheduling          pod/quarrelsome-hamster-consul-server-1                                  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
10m         Normal    Scheduled                 pod/quarrelsome-hamster-consul-server-0                                  Successfully assigned default/quarrelsome-hamster-consul-server-0 to docker-desktop
9s          Warning   FailedScheduling          pod/quarrelsome-hamster-consul-server-2                                  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
10m         Normal    Pulling                   pod/quarrelsome-hamster-consul-server-0                                  Pulling image "consul:1.8.1"
10m         Normal    Started                   pod/quarrelsome-hamster-consul-5zgzv                                     Started container consul
10m         Normal    Created                   pod/quarrelsome-hamster-consul-5zgzv                                     Created container consul
10m         Normal    Pulled                    pod/quarrelsome-hamster-consul-5zgzv                                     Successfully pulled image "consul:1.8.1"
10m         Normal    Pulled                    pod/quarrelsome-hamster-consul-server-0                                  Successfully pulled image "consul:1.8.1"
10m         Normal    Started                   pod/quarrelsome-hamster-consul-server-0                                  Started container consul
10m         Normal    Created                   pod/quarrelsome-hamster-consul-server-0                                  Created container consul
13s         Warning   Unhealthy                 pod/quarrelsome-hamster-consul-server-0                                  Readiness probe failed:
7s          Warning   Unhealthy                 pod/quarrelsome-hamster-consul-5zgzv                                     Readiness probe failed:

$ kubectl logs -f StatefulSet/quarrelsome-hamster-consul-server --all-containers
Found 3 pods, using pod/quarrelsome-hamster-consul-server-0
bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
           Version: '1.8.1'
           Node ID: '9ff2e63b-3be3-5feb-bee6-c0c1cb5f4f3f'
         Node name: 'quarrelsome-hamster-consul-server-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.1.1.10 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-05T23:23:44.500Z [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
    2020-08-05T23:23:44.595Z [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
    2020-08-05T23:23:44.596Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.1.1.10:8300 [Follower]" leader=
    2020-08-05T23:23:44.597Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: quarrelsome-hamster-consul-server-0.dc1 10.1.1.10
    2020-08-05T23:23:44.598Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: quarrelsome-hamster-consul-server-0 10.1.1.10
    2020-08-05T23:23:44.689Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2020-08-05T23:23:44.786Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-08-05T23:23:44.689Z [INFO]  agent.server: Handled event for server in area: event=member-join server=quarrelsome-hamster-consul-server-0.dc1 area=wan
    2020-08-05T23:23:44.689Z [INFO]  agent.server: Adding LAN server: server="quarrelsome-hamster-consul-server-0 (Addr: tcp/10.1.1.10:8300) (DC: dc1)"
    2020-08-05T23:23:44.788Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-08-05T23:23:44.789Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-05T23:23:44.790Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-05T23:23:44.791Z [INFO]  agent: (LAN) joining: lan_addresses=[quarrelsome-hamster-consul-server-0.quarrelsome-hamster-consul-server.default.svc, quarrelsome-hamster-consul-server-1.quarrelsome-hamster-consul-server.default.svc, quarrelsome-hamster-consul-server-2.quarrelsome-hamster-consul-server.default.svc]
    2020-08-05T23:23:44.790Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-05T23:23:45.286Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve quarrelsome-hamster-consul-server-1.quarrelsome-hamster-consul-server.default.svc: lookup quarrelsome-hamster-consul-server-1.quarrelsome-hamster-consul-server.default.svc on 10.96.0.10:53: no such host
    2020-08-05T23:23:45.387Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve quarrelsome-hamster-consul-server-2.quarrelsome-hamster-consul-server.default.svc: lookup quarrelsome-hamster-consul-server-2.quarrelsome-hamster-consul-server.default.svc on 10.96.0.10:53: no such host
    2020-08-05T23:23:45.387Z [INFO]  agent: (LAN) joined: number_of_nodes=1
    2020-08-05T23:23:45.387Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
    2020-08-05T23:23:51.840Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-08-05T23:23:53.960Z [WARN]  agent.server.raft: no known peers, aborting election
    2020-08-05T23:24:13.759Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: docker-desktop 10.1.1.9
    2020-08-05T23:24:18.961Z [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-08-05T23:24:26.366Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

Any help would be appreciated!

manisha-tanwar commented 4 years ago

Never mind, https://github.com/hashicorp/consul-helm/issues/13 helped me fix the issue.
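
For reference, the workaround discussed in that issue boils down to running a single server on a single-node cluster; roughly along these lines, assuming the chart's server.replicas and server.bootstrapExpect values:

$ helm install hashicorp/consul \
    --set server.replicas=1 \
    --set server.bootstrapExpect=1

With one server and bootstrap_expect=1, the lone server can elect itself leader, so the readiness probe (which checks /v1/status/leader, as the logs above show) starts passing.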

ishustava commented 4 years ago

Hey @manisha-tanwar, yeah, if you're running on a single-node cluster (minikube, Docker Desktop, etc.) you need to make sure to scale down your servers. Since there's only one node, the anti-affinity rules won't let Kubernetes schedule the remaining Consul server pods, and Consul will never be able to elect a leader.

We also have a guide that describes the values you'd need to install this Helm chart on minikube. @robece, if you're running on minikube, please use that guide for reference.
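
An alternative to scaling down is to drop the server anti-affinity rule so all the servers can land on the same node (fine for dev clusters, not for production). A sketch, assuming the chart's server.affinity value is what carries that rule:

# values.yaml
server:
  affinity: null

$ helm upgrade --install consul hashicorp/consul -f values.yaml --namespace consul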

robece commented 4 years ago

Thanks @ishustava, sorry for the delay. I'm running the deployment on AKS.

After executing:

helm upgrade --install consul hashicorp/consul --set global.name=consul --set global.domain=consul --set ui.service.type=LoadBalancer --namespace consul

I got:

PS C:\Users\robece> kubectl get pods -n consul
NAME              READY   STATUS    RESTARTS   AGE
consul-4bhft      0/1     Running   0          3m40s
consul-mrmsn      0/1     Running   0          3m40s
consul-server-0   0/1     Running   0          3m40s
consul-server-1   0/1     Running   0          3m40s
consul-server-2   0/1     Pending   0          3m40s
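
Since consul-server-2 is stuck in Pending, its scheduling events should say why; given that the logs below only ever mention two nodes (vmss000000 and vmss000001), it's likely the same anti-affinity constraint colliding with the three-server default. For example:

$ kubectl describe pod consul-server-2 -n consul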

Log for consul-4bhft:

==> Starting Consul agent...
           Version: '1.8.1'
           Node ID: '22c19d50-3f68-9793-a935-12fb4e7e1e09'
         Node name: 'aks-agentpool-32368277-vmss000000'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 10.244.1.170 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-07T13:21:19.108Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000000 10.244.1.170
    2020-08-07T13:21:19.110Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2020-08-07T13:21:19.111Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-08-07T13:21:19.111Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-08-07T13:21:19.112Z [INFO]  agent: Started gRPC server: address=[::]:8502 network=tcp
    2020-08-07T13:21:19.113Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-07T13:21:19.113Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-07T13:21:19.113Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-07T13:21:19.113Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:21:19.113Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:19.113Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:21:19.205Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.453Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.684Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.684Z [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
        * Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host

"
    2020-08-07T13:21:19.688Z [WARN]  agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=<nil>
    2020-08-07T13:21:25.507Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:25.508Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:42814 error="No known Consul servers"
    2020-08-07T13:21:35.504Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:35.504Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:42898 error="No known Consul servers"
    2020-08-07T13:21:45.508Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:45.508Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:42982 error="No known Consul servers"
    2020-08-07T13:21:48.526Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:48.526Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:21:49.688Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:21:49.848Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:50.075Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:50.261Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:50.261Z [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
        * Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host

"
    2020-08-07T13:21:50.261Z [WARN]  agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=<nil>
    2020-08-07T13:21:55.503Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:55.503Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:43082 error="No known Consul servers"
    2020-08-07T13:22:05.453Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:05.453Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:43176 error="No known Consul servers"
    2020-08-07T13:22:06.153Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:06.153Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:22:15.462Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:15.462Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:43280 error="No known Consul servers"
    2020-08-07T13:22:20.263Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:22:20.487Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: consul-server-1 10.244.0.95
    2020-08-07T13:22:20.487Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: consul-server-0 10.244.1.171
    2020-08-07T13:22:20.487Z [INFO]  agent.client: adding server: server="consul-server-1 (Addr: tcp/10.244.0.95:8300) (DC: dc1)"
    2020-08-07T13:22:20.487Z [INFO]  agent.client: adding server: server="consul-server-0 (Addr: tcp/10.244.1.171:8300) (DC: dc1)"
    2020-08-07T13:22:20.611Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000001 10.244.0.94
    2020-08-07T13:22:20.769Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:20.769Z [INFO]  agent: (LAN) joined: number_of_nodes=2
    2020-08-07T13:22:20.769Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=2
    2020-08-07T13:22:28.846Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:28.846Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:22:28.884Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:28.884Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:22:38.117Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:38.117Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:04.345Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:04.345Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:11.868Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:11.868Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:31.423Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:31.423Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:45.424Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:45.424Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:01.512Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:01.512Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:14.655Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:14.655Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:37.733Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:37.733Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:40.657Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:40.657Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:04.513Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:04.513Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:09.000Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:09.000Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:40.851Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:40.851Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:43.628Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:43.628Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:16.603Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:16.603Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:16.866Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:16.866Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:52.028Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:52.029Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:52.752Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:52.752Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:25.439Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:25.439Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:25.649Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:25.649Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:51.770Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:51.770Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:57.467Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:57.467Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:28.093Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:28.093Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:28.442Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:28.442Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:53.185Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:53.185Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:57.828Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:57.829Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"

Log for consul-mrmsn:

==> Starting Consul agent...
           Version: '1.8.1'
           Node ID: 'c7b09676-dbef-fc06-d46d-70131c2a1336'
         Node name: 'aks-agentpool-32368277-vmss000001'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 10.244.0.94 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-07T13:21:18.871Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000001 10.244.0.94
    2020-08-07T13:21:18.875Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-08-07T13:21:18.875Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2020-08-07T13:21:18.876Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-08-07T13:21:18.876Z [INFO]  agent: Started gRPC server: address=[::]:8502 network=tcp
    2020-08-07T13:21:18.877Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-07T13:21:18.877Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-07T13:21:18.877Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:21:18.963Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:18.963Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:21:18.963Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-07T13:21:19.169Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.454Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.659Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:19.659Z [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
        * Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host

"
    2020-08-07T13:21:19.659Z [WARN]  agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=<nil>
    2020-08-07T13:21:19.767Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:19.767Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44324 error="No known Consul servers"
    2020-08-07T13:21:29.736Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:29.736Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44438 error="No known Consul servers"
    2020-08-07T13:21:34.490Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:34.490Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:21:39.741Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:39.741Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44528 error="No known Consul servers"
    2020-08-07T13:21:49.659Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:21:49.763Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:49.763Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44630 error="No known Consul servers"
    2020-08-07T13:21:49.818Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:49.987Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:49.987Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:21:50.127Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:50.366Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:21:50.366Z [WARN]  agent: (LAN) couldn't join: number_of_nodes=0 error="3 errors occurred:
        * Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-1.consul-server.consul.svc: lookup consul-server-1.consul-server.consul.svc on 10.0.0.10:53: no such host
        * Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host

"
    2020-08-07T13:21:50.366Z [WARN]  agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error=<nil>
    2020-08-07T13:21:59.763Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:21:59.763Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44726 error="No known Consul servers"
    2020-08-07T13:22:07.388Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:07.388Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    2020-08-07T13:22:09.735Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:09.735Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44830 error="No known Consul servers"
    2020-08-07T13:22:19.732Z [WARN]  agent.client.manager: No servers available
    2020-08-07T13:22:19.732Z [ERROR] agent.http: Request error: method=GET url=/v1/status/leader from=127.0.0.1:44948 error="No known Consul servers"
    2020-08-07T13:22:20.367Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:22:20.490Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000000 10.244.1.170
    2020-08-07T13:22:20.490Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: consul-server-0 10.244.1.171
    2020-08-07T13:22:20.490Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: consul-server-1 10.244.0.95
    2020-08-07T13:22:20.490Z [INFO]  agent.client: adding server: server="consul-server-0 (Addr: tcp/10.244.1.171:8300) (DC: dc1)"
    2020-08-07T13:22:20.490Z [INFO]  agent.client: adding server: server="consul-server-1 (Addr: tcp/10.244.0.95:8300) (DC: dc1)"
    2020-08-07T13:22:20.898Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:20.899Z [INFO]  agent: (LAN) joined: number_of_nodes=2
    2020-08-07T13:22:20.899Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=2
    2020-08-07T13:22:29.613Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:29.613Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:22:39.214Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:39.214Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:22:40.305Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:22:40.305Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:07.501Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:07.502Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:17.227Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:17.227Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:32.025Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:32.025Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:23:41.058Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:23:41.058Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:04.166Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:04.166Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:16.737Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:16.737Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:26.831Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:26.831Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:45.511Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:45.511Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:24:55.414Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:24:55.414Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:09.378Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:09.378Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:22.522Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:22.522Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:41.050Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:41.050Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:25:54.435Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:25:54.435Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:04.467Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:04.467Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:26.906Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:26.906Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:34.617Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:34.617Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:26:57.362Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:26:57.363Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:02.787Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:02.787Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:29.698Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:29.698Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:27:32.824Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:27:32.824Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:01.154Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:01.154Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:06.275Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:06.275Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:32.239Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:32.239Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:28:36.101Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:28:36.101Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:29:01.465Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:29:01.465Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:29:10.331Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:29:10.332Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:29:31.496Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:29:31.496Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:29:40.240Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.0.95:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:29:40.240Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
    2020-08-07T13:30:07.170Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:30:07.170Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
    2020-08-07T13:30:08.711Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.244.1.171:8300 error="rpc error making call: No cluster leader"
    2020-08-07T13:30:08.711Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"

Log for consul-server-0:

bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
           Version: '1.8.1'
           Node ID: '26c87030-23c9-ab22-eab5-6cd00559007c'
         Node name: 'consul-server-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.244.1.171 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-07T13:22:00.709Z [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
    2020-08-07T13:22:00.808Z [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
    2020-08-07T13:22:00.808Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.244.1.171:8300 [Follower]" leader=
    2020-08-07T13:22:00.809Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc1 10.244.1.171
    2020-08-07T13:22:00.810Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-server-0 10.244.1.171
    2020-08-07T13:22:00.810Z [INFO]  agent.server: Adding LAN server: server="consul-server-0 (Addr: tcp/10.244.1.171:8300) (DC: dc1)"
    2020-08-07T13:22:00.810Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2020-08-07T13:22:00.811Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-08-07T13:22:00.810Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc1 area=wan
    2020-08-07T13:22:00.811Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-08-07T13:22:00.812Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-07T13:22:00.812Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-07T13:22:00.812Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:22:00.812Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-07T13:22:00.930Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:00.936Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-server-1 10.244.0.95
    2020-08-07T13:22:00.936Z [INFO]  agent.server: Adding LAN server: server="consul-server-1 (Addr: tcp/10.244.0.95:8300) (DC: dc1)"
    2020-08-07T13:22:00.938Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.dc1 10.244.0.95
    2020-08-07T13:22:00.938Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.dc1 area=wan
    2020-08-07T13:22:01.010Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:01.010Z [INFO]  agent: (LAN) joined: number_of_nodes=1
    2020-08-07T13:22:01.010Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
    2020-08-07T13:22:06.683Z [WARN]  agent.server.raft: no known peers, aborting election
    2020-08-07T13:22:07.815Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-08-07T13:22:20.486Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000000 10.244.1.170
    2020-08-07T13:22:20.490Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000001 10.244.0.94
    2020-08-07T13:22:23.394Z [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-08-07T13:22:43.953Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    [the two errors above keep repeating, alternating roughly every 20–40 seconds, until 2020-08-07T13:30:45.623Z]

Log for consul-server-1:

bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
           Version: '1.8.1'
           Node ID: 'a409de23-be8b-8466-79d5-54ad5f9dc221'
         Node name: 'consul-server-1'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.244.0.95 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2020-08-07T13:22:00.070Z [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
    2020-08-07T13:22:00.187Z [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
    2020-08-07T13:22:00.187Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.244.0.95:8300 [Follower]" leader=
    2020-08-07T13:22:00.188Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-1.dc1 10.244.0.95
    2020-08-07T13:22:00.188Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-server-1 10.244.0.95
    2020-08-07T13:22:00.188Z [INFO]  agent.server: Adding LAN server: server="consul-server-1 (Addr: tcp/10.244.0.95:8300) (DC: dc1)"
    2020-08-07T13:22:00.189Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2020-08-07T13:22:00.189Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-1.dc1 area=wan
    2020-08-07T13:22:00.189Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-08-07T13:22:00.189Z [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-08-07T13:22:00.263Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2020-08-07T13:22:00.263Z [INFO]  agent: Joining cluster...: cluster=LAN
    2020-08-07T13:22:00.263Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.consul.svc, consul-server-1.consul-server.consul.svc, consul-server-2.consul-server.consul.svc]
    2020-08-07T13:22:00.263Z [INFO]  agent: started state syncer
==> Consul agent running!
    2020-08-07T13:22:00.295Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-0.consul-server.consul.svc: lookup consul-server-0.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:00.616Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-2.consul-server.consul.svc: lookup consul-server-2.consul-server.consul.svc on 10.0.0.10:53: no such host
    2020-08-07T13:22:00.616Z [INFO]  agent: (LAN) joined: number_of_nodes=1
    2020-08-07T13:22:00.616Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
    2020-08-07T13:22:00.936Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-server-0 10.244.1.171
    2020-08-07T13:22:00.936Z [INFO]  agent.server: Adding LAN server: server="consul-server-0 (Addr: tcp/10.244.1.171:8300) (DC: dc1)"
    2020-08-07T13:22:00.938Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-server-0.dc1 10.244.1.171
    2020-08-07T13:22:00.938Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-server-0.dc1 area=wan
    2020-08-07T13:22:07.328Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-08-07T13:22:08.684Z [WARN]  agent.server.raft: no known peers, aborting election
    2020-08-07T13:22:20.509Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000000 10.244.1.170
    2020-08-07T13:22:20.611Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: aks-agentpool-32368277-vmss000001 10.244.0.94
    2020-08-07T13:22:34.459Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-08-07T13:22:35.220Z [ERROR] agent: Coordinate update error: error="No cluster leader"
    [the same two errors keep repeating, alternating roughly every 25–40 seconds, until 2020-08-07T13:31:54.463Z]

Log for consul-server-2:

Nothing to display.

Thank you, RC

ishustava commented 4 years ago

Thanks for this info @robece ! Could you also include the output of kubectl describe po/consul-server-2? The Pending state suggests Kubernetes couldn't schedule the third Consul server pod, so I'm wondering if there is a resource constraint issue.
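
For reference, that would be something like the following (adjust the namespace to wherever you installed the chart):

$ kubectl describe pod consul-server-2 -n consul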

robece commented 4 years ago

Hope this helps. Please let me know if there is anything else to share:

Name:           consul-server-2
Namespace:      consul
Priority:       0
Node:           <none>
Labels:         app=consul
                chart=consul-helm
                component=server
                controller-revision-hash=consul-server-5b89696b84
                hasDNS=true
                release=consul
                statefulset.kubernetes.io/pod-name=consul-server-2
Annotations:    consul.hashicorp.com/config-checksum: ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356
                consul.hashicorp.com/connect-inject: false
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/consul-server
Containers:
  consul:
    Image:       consul:1.8.1
    Ports:       8500/TCP, 8301/TCP, 8302/TCP, 8300/TCP, 8600/TCP, 8600/UDP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/UDP
    Command:
      /bin/sh
      -ec
      CONSUL_FULLNAME="consul"

      exec /bin/consul agent \
        -advertise="${POD_IP}" \
        -bind=0.0.0.0 \
        -bootstrap-expect=3 \
        -client=0.0.0.0 \
        -config-dir=/consul/config \
        -datacenter=dc1 \
        -data-dir=/consul/data \
        -domain=consul \
        -hcl="connect { enabled = true }" \
        -ui \
        -retry-join=${CONSUL_FULLNAME}-server-0.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-1.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -retry-join=${CONSUL_FULLNAME}-server-2.${CONSUL_FULLNAME}-server.${NAMESPACE}.svc \
        -server

    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:      100m
      memory:   100Mi
    Readiness:  exec [/bin/sh -ec curl http://127.0.0.1:8500/v1/status/leader \
2>/dev/null | grep -E '".+"'
] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      POD_IP:      (v1:status.podIP)
      NAMESPACE:  consul (v1:metadata.namespace)
    Mounts:
      /consul/config from config (rw)
      /consul/data from data-consul (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from consul-server-token-6mpdx (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data-consul:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-consul-consul-server-2
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      consul-server-config
    Optional:  false
  consul-server-token-6mpdx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  consul-server-token-6mpdx
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  49s (x4 over 2m6s)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match pod affinity/anti-affinity.

Thank you, RC

ishustava commented 4 years ago

I see what's going on! Looks like you have only two nodes in your AKS cluster, but the Consul servers are deployed with 3 replicas by default and have anti-affinity set to make sure no two servers run on the same Kubernetes node.
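
This also explains the readiness probe failures from the original report: the probe greps the /v1/status/leader endpoint for a quoted address, and Consul returns an empty string while there is no leader, so the servers never report ready. You can see what the probe sees from any server pod (a quick sketch; the pod name and namespace assume the install from this thread):

$ kubectl exec consul-server-0 -n consul -- curl -s http://127.0.0.1:8500/v1/status/leader
""

Once a leader is elected, the same call returns an address like "10.244.1.171:8300" and the probe starts passing.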

There are a few ways to fix it; you can do one of the following:

  1. Remove the affinity setting from the Consul servers by passing --set server.affinity="" to your helm upgrade command (see the sketch after this list). This is fine for a test/sandbox environment, but I would recommend option 3 for production.
  2. Scale the servers down to one by passing --set server.replicas=1 --set server.bootstrapExpect=1 to your helm upgrade command (also shown below). This is similarly OK while testing, but in production you don't want only one server running.
  3. Scale up your Kubernetes node pool by at least one more node, so that the servers' anti-affinity constraints can be satisfied.
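
In isolation, options 1 and 2 would look roughly like this (a sketch based on the command from the original report; adjust the flags, release name, and namespace to match your install):

# Option 1: drop the server anti-affinity (test/sandbox only)
$ helm upgrade --install consul hashicorp/consul --set global.name=consul \
    --set server.affinity="" --namespace consul

# Option 2: run a single server (test/sandbox only)
$ helm upgrade --install consul hashicorp/consul --set global.name=consul \
    --set server.replicas=1 --set server.bootstrapExpect=1 --namespace consul

For option 3 on AKS, scaling the node pool is something along the lines of az aks scale --resource-group <rg> --name <cluster> --node-count 3 (check the exact flags against your Azure CLI version).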

Hope this helps!

robece commented 4 years ago

Thank you so much, I'm executing this and it seems to be working:

helm upgrade --install consul hashicorp/consul --set global.name=consul --set global.domain=consul --set ui.service.type=LoadBalancer --set server.replicas=1 --set server.bootstrapExpect=1 --set server.affinity="" --namespace consul

Last question, just out of curiosity: do you have any reference or example for deploying a production Consul cluster via Helm, so I have a sample to run? In my scenario I want to scale to more than one server, but I don't know why the basic Helm installation is failing.

Thank you, RC

ishustava commented 4 years ago

In your scenario, the reason it was failing is that the Helm chart deploys 3 Consul servers by default, but you only have two Kubernetes nodes. We also set anti-affinity rules so that no two Consul servers can run on the same Kubernetes node, which leaves no place for the third server.

In your case, to use the default Helm chart installation, you need to increase your Kubernetes node pool size to at least 3.
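
If you're curious, the rule itself lives on the server StatefulSet, and something like this should show it (output trimmed; the exact shape may vary between chart versions):

$ kubectl get statefulset consul-server -n consul -o yaml | grep -A 8 podAntiAffinity
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: consul
              component: server
              release: consul
          topologyKey: kubernetes.io/hostname

The topologyKey of kubernetes.io/hostname is what forces each server onto a distinct node.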

For production, I would also recommend securing your installation by following this guide, which walks you through it.
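
As a rough sketch of the kind of settings that guide covers (value names as documented in the consul-helm chart; double-check them against your chart version), a production-oriented install typically enables gossip encryption, TLS, and ACLs:

$ helm upgrade --install consul hashicorp/consul --set global.name=consul \
    --set global.gossipEncryption.secretName=consul-gossip-key \
    --set global.gossipEncryption.secretKey=key \
    --set global.tls.enabled=true \
    --set global.acls.manageSystemACLs=true \
    --namespace consul

Here consul-gossip-key is just an example name for a Kubernetes secret you'd create beforehand, for instance with kubectl create secret generic consul-gossip-key --from-literal=key=$(consul keygen).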

robece commented 4 years ago

Thank you, this will help!! 🥇

ishustava commented 4 years ago

Great! I'll close this issue for now, but let us know if you have more feedback or run into more issues!