kelseyhightower / consul-on-kubernetes

Running HashiCorp's Consul on Kubernetes
Apache License 2.0

[ERR] agent: Coordinate update error: No cluster leader #36

Open duke-lv opened 6 years ago

duke-lv commented 6 years ago

I have deployed the latest version of Consul on Kubernetes v1.10.0, but the Consul pod's log shows these error messages:

2018/07/20 11:26:11 [WARN] agent: Check "service:ribbon-consumer" HTTP request failed: Get http://DESKTOP-MCQSJ49:8504/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/07/20 11:26:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/07/20 11:26:16 [ERR] agent: Coordinate update error: No cluster leader

The cluster doesn't work correctly.

gabrielfsousa commented 6 years ago

It's because one of the Consul replicas must boot with the -bootstrap option. Since this is a single-file StatefulSet, add the option -bootstrap-expect=3 instead.

That value assumes 3 Consul replicas; if you run a different number, change it to the number of replicas you are using.
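
As a rough illustration of why the flag value should match the replica count: Consul servers elect a leader with Raft, which needs a majority of the servers, and with -bootstrap-expect=N the servers wait until N of them are available before bootstrapping. The sketch below is plain arithmetic, not Consul code, and the function names are made up for illustration.

```python
def quorum(servers):
    """Majority of voters needed to elect a Raft leader."""
    if servers < 1:
        raise ValueError("need at least one server")
    return servers // 2 + 1


def tolerated_failures(servers):
    """How many servers can be lost while a majority still remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} servers: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

If the flag is set higher than the number of servers that can actually come up and find each other, bootstrapping never starts, which would match the "No cluster leader" messages above.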

karthikeayan commented 5 years ago

Getting the same error:

2018/11/15 10:29:08 [INFO] agent: Discovered LAN servers:
2018/11/15 10:29:08 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s
2018/11/15 10:29:15 [WARN] raft: no known peers, aborting election
2018/11/15 10:29:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/11/15 10:29:23 [ERR] http: Request GET /v1/kv/config/gateway-prod/?recurse&token=<hidden>, error: No cluster leader from=10.233.68.72:32798

==> Newer Consul version available: 1.4.0 (currently running: 1.4.0)

micksear commented 5 years ago

I also have this error. I have everything running in a namespace. Would that affect the label-based discovery, perhaps? I can see pods are running if I select with labels:

kubectl -n consul get po -l app=consul,component=server
NAME       READY   STATUS    RESTARTS   AGE
consul-0   1/1     Running   0          6m
consul-1   1/1     Running   0          7m
consul-2   1/1     Running   0          7m

I've updated to 1.4.2 of consul, and I'm running on GKE: 1.11.6-gke.3

My consul logs indicate no discovered servers:

2019/02/01 16:58:45 [ERR] agent: Coordinate update error: No cluster leader
2019/02/01 16:58:48 [ERR] agent: failed to sync remote state: No cluster leader
2019/02/01 16:58:49 [INFO] agent: Discovered LAN servers:
2019/02/01 16:58:49 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s

I'm not sure what to check at this point. I have the -bootstrap-expect=3 enabled, but I wouldn't expect that to trigger anything if no other servers can be discovered...

goughlee commented 5 years ago

I had the same error with a Docker-hosted Consul cluster (not on Kubernetes, though), and it turned out all of my instances had auto-generated the same node IDs. As soon as I manually set a different node ID on each instance (using the -node-id argument), all was fine. Perhaps something to try.
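
One possible way to get distinct but stable IDs for the -node-id argument is sketched below. It assumes every instance has a unique hostname and derives a UUID from it; `node_id_for` and `duplicated_node_ids` are hypothetical helpers, not part of Consul, and the hostnames are examples rather than names from this thread.

```python
import uuid
from collections import defaultdict


def node_id_for(hostname):
    """Derive a stable UUID string from a hostname (hypothetical helper)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, hostname))


def duplicated_node_ids(ids_by_instance):
    """Return {node_id: [instances]} for each ID shared by several instances."""
    instances_by_id = defaultdict(list)
    for instance, node_id in ids_by_instance.items():
        instances_by_id[node_id].append(instance)
    return {nid: names for nid, names in instances_by_id.items()
            if len(names) > 1}


if __name__ == "__main__":
    # Example hostnames only; substitute the real instance names.
    for host in ("consul-a", "consul-b", "consul-c"):
        print(host, node_id_for(host))
```

The second helper can be fed the IDs each instance currently reports to confirm whether a collision is really the problem before changing anything.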

e100 commented 5 years ago

@micksear

I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:

  "retry_join": [
    "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
  ]
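
For anyone templating this per namespace, here is a small sketch that builds the same fragment and lets `json.dumps` handle the escaped quotes around the label selector. The helper names are invented for illustration; the only assumption about Consul is the `provider=k8s namespace=... label_selector=...` string shown above.

```python
import json


def k8s_retry_join(namespace, label_selector):
    """Build the auto-join string in the form used in server.json above."""
    return (f"provider=k8s namespace={namespace} "
            f'label_selector="{label_selector}"')


def server_config_fragment(namespace,
                           label_selector="app=consul,component=server"):
    """Return the retry_join part of the config as a Python dict."""
    return {"retry_join": [k8s_retry_join(namespace, label_selector)]}


if __name__ == "__main__":
    # json.dumps escapes the inner double quotes as \" automatically.
    print(json.dumps(server_config_fragment("customnamespace"), indent=2))
```

Generating the fragment this way avoids hand-escaping mistakes when the namespace or labels differ between environments.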

itsecforu commented 4 years ago

Got the same error with -bootstrap-expect=3 in my consul.yaml. All pods are in the same namespace.

itsecforu commented 4 years ago

Did somebody solve it?

Batirchik commented 4 years ago

Bumped into this issue today. The issue is caused by the affinity settings. By default, there are 3 replicas, and if you have fewer than 3 nodes (e.g. 2), one pod won't come up and you will get the mentioned error. Thus, make sure that you have at least as many nodes as replicas.
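
The reasoning above can be sketched as a quick sanity check. It assumes the affinity in question is a required one-server-per-node anti-affinity rule (the manifest is not shown in this comment), and both function names are hypothetical.

```python
def schedulable_servers(replicas, nodes):
    """With one server pod allowed per node, at most `nodes` pods can start."""
    return min(replicas, nodes)


def can_bootstrap(replicas, nodes, bootstrap_expect):
    """True if enough server pods can be scheduled to reach bootstrap_expect."""
    return schedulable_servers(replicas, nodes) >= bootstrap_expect


if __name__ == "__main__":
    # The case described above: 3 replicas, only 2 nodes.
    print(can_bootstrap(replicas=3, nodes=2, bootstrap_expect=3))
    print(can_bootstrap(replicas=3, nodes=3, bootstrap_expect=3))
```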

gkannan66235 commented 4 years ago

Error from Consul:

2020-05-29T04:19:22.499Z [INFO] agent: Joining cluster...: cluster=LAN
2020-05-29T04:19:22.499Z [INFO] agent: (LAN) joining: lan_addresses=[consul-server-0.consul-sever.n1.svc, consul-server-1.consul-server.n1.svc, consul-server-2.consul-server.n1.svc[]
2020-05-29T04:19:22.543Z [WARN] agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.n1.svc: lookup consul-server-1.consul-server.n1.svc on 10.0.0.10:53: no such host
2020-05-29T04:25:01.506Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-05-29T04:25:06.768Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-05-29T04:25:29.271Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

But all Consul pods are in Running status, and if we run consul join manually, it works.

NAME              READY   STATUS    RESTARTS   AGE
consul-server-0   1/2     Running   0          13m
consul-server-1   1/2     Running   0          13m
consul-server-2   1/2     Running   0          13m
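
Since the warning above is a DNS failure ("no such host"), one thing that might help narrow it down is checking, from inside a pod, which of the join addresses actually resolve. A minimal sketch follows; `unresolved_hosts` is a hypothetical helper, and the resolver is a parameter only so the function can be exercised without a real DNS server.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the hostnames from `hosts` that fail to resolve."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Calling it with the hostnames copied exactly from the lan_addresses log line would show whether every peer name fails or only some of them, which helps separate a naming mistake from a cluster DNS problem.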

gupf0719 commented 3 years ago

This bug has not been resolved in the current version, 1.9.1.

deeco commented 3 years ago

@micksear

I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:

  "retry_join": [
    "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
  ]

This resolved my issue for a cluster deployed into the consul namespace. I updated the server.json in the ConfigMap manifest to include the following, as per @e100:

  "retry_join": [
    "provider=k8s namespace=consul label_selector=\"app=consul,component=server\""
  ]

Carmezim commented 2 years ago

I've seen this issue occurring for multiple people several times.

If you are on k8s, besides setting -bootstrap-expect to the number of servers you're running (e.g. 3-5 pods), deleting all PVCs and volumes after uninstalling Consul completely was the only solution that worked for me.

No matter what I tried, including repeated (Helm-based) reinstalls and uninstalls, Consul was unable to properly bootstrap and elect a leader until not only all components but also the PVCs and volumes were removed from the (k8s) cluster.

This note should be in the k8s section btw.

cc @gupf0719