dukelyuu opened this issue 6 years ago
It's because one of the Consul replicas must boot with the -bootstrap option. Since this is a single-file StatefulSet manifest, add the option -bootstrap-expect=3 if you are running 3 Consul replicas; otherwise change it to however many replicas you are using.
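For reference, a rough sketch of where that flag lives in the StatefulSet (the names, image tag and join addresses here are illustrative, not the exact manifest from this repo):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: consul
spec:
  serviceName: consul
  replicas: 3                          # number of Consul server pods
  selector:
    matchLabels:
      app: consul
      component: server
  template:
    metadata:
      labels:
        app: consul
        component: server
    spec:
      containers:
        - name: consul
          image: consul:1.4.0          # version assumed for illustration
          args:
            - "agent"
            - "-server"
            - "-bootstrap-expect=3"    # must match spec.replicas above
            - "-retry-join=consul-0.consul.default.svc.cluster.local"
            - "-retry-join=consul-1.consul.default.svc.cluster.local"
            - "-retry-join=consul-2.consul.default.svc.cluster.local"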
Getting the same error:
2018/11/15 10:29:08 [INFO] agent: Discovered LAN servers:
2018/11/15 10:29:08 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s
2018/11/15 10:29:15 [WARN] raft: no known peers, aborting election
2018/11/15 10:29:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/11/15 10:29:23 [ERR] http: Request GET /v1/kv/config/gateway-prod/?recurse&token=<hidden>, error: No cluster leader from=10.233.68.72:32798
==> Newer Consul version available: 1.4.0 (currently running: 1.4.0)
I also have this error. I have everything running in a namespace. Would that affect the label-based discovery, perhaps? I can see pods are running if I select with labels:
kubectl -n consul get po -l app=consul,component=server
NAME READY STATUS RESTARTS AGE
consul-0 1/1 Running 0 6m
consul-1 1/1 Running 0 7m
consul-2 1/1 Running 0 7m
I've updated to Consul 1.4.2, and I'm running on GKE 1.11.6-gke.3.
My consul logs indicate no discovered servers:
2019/02/01 16:58:45 [ERR] agent: Coordinate update error: No cluster leader
2019/02/01 16:58:48 [ERR] agent: failed to sync remote state: No cluster leader
2019/02/01 16:58:49 [INFO] agent: Discovered LAN servers:
2019/02/01 16:58:49 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s
I'm not sure what to check at this point. I have -bootstrap-expect=3 set, but I wouldn't expect that to trigger an election if no other servers can be discovered...
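In case it's useful to anyone hitting the same wall, these are the kinds of checks I'm running from one of the server pods (namespace and pod names are from my setup above):

# What does this server actually know about its peers?
kubectl -n consul exec consul-0 -- consul members
# Agent and raft status (look at num_peers and whether a leader address is set).
kubectl -n consul exec consul-0 -- consul info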
I had the same error with a Docker-hosted Consul cluster (not on Kubernetes though), and it turned out all of my instances had auto-generated the same node IDs. As soon as I manually set the node ID differently on each instance (using the -node-id argument), all was fine. Perhaps something to try.
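A rough sketch of what I mean, assuming plain consul agent invocations outside Kubernetes (addresses and UUIDs are placeholders):

# Pin a distinct node ID per instance so cloned data dirs or images cannot
# end up sharing the same auto-generated ID.
consul agent -server -bootstrap-expect=3 -data-dir=/consul/data \
  -node-id=adce2a1e-3a2c-4e61-9a0b-000000000001 \
  -retry-join=10.0.0.2 -retry-join=10.0.0.3
# Repeat on the other servers with a different -node-id (...0002, ...0003)
# and the remaining peers in -retry-join.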
@micksear
I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:
"retry_join": [
"provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
]
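For context, here is roughly the whole server.json that snippet sits in on my side (the other fields are just typical values, adjust to your setup):

{
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/consul/data",
  "retry_join": [
    "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
  ]
}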
Got the same error with bootstrap-expect=3 in my consul.yaml.
All pods are in the same namespace.
Has anybody solved it?
Bumped into this issue today. The issue is caused by the affinity settings. By default there are 3 replicas, and if you have fewer than 3 nodes (e.g. 2), one pod won't come up and you will get the mentioned error. So make sure you have the corresponding number of nodes.
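Concretely, the server pods usually carry a required pod anti-affinity rule along these lines (a sketch, not the exact manifest), which is why each server pod needs its own node:

# With requiredDuringScheduling..., two server pods can never share a node,
# so 3 replicas need at least 3 schedulable nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: consul
            component: server
        topologyKey: kubernetes.io/hostname

Either add nodes, lower the replica count (and -bootstrap-expect) to match, or relax this to preferredDuringSchedulingIgnoredDuringExecution.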
Error from consul:
2020-05-29T04:19:22.499Z [INFO] agent: Joining cluster...: cluster=LAN
2020-05-29T04:19:22.499Z [INFO] agent: (LAN) joining: lan_addresses=[consul-server-0.consul-sever.n1.svc, consul-server-1.consul-server.n1.svc, consul-server-2.consul-server.n1.svc]
2020-05-29T04:19:22.543Z [WARN] agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.n1.svc: lookup consul-server-1.consul-server.n1.svc on 10.0.0.10:53: no such host
2020-05-29T04:25:01.506Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-05-29T04:25:06.768Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-05-29T04:25:29.271Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
but all Consul pods are in Running status, and if we run consul join manually it works (see the sketch after the pod listing below).
NAME              READY   STATUS    RESTARTS   AGE
consul-server-0   1/2     Running   0          13m
consul-server-1   1/2     Running   0          13m
consul-server-2   1/2     Running   0          13m
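By "run consul join manually" I mean something like this from inside one of the server pods (pod, container and service names are from our deployment and may differ in yours):

# Join server-0 to the other two servers by their headless-service DNS names.
kubectl -n n1 exec consul-server-0 -c consul -- consul join \
  consul-server-1.consul-server.n1.svc \
  consul-server-2.consul-server.n1.svc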
This bug has not been resolved in the current version 1.9.1
@micksear
I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:
"retry_join": [ "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\"" ]
This resolved my issue for a cluster deployed into the consul namespace. I updated the server JSON in the ConfigMap manifest to include the below, as per @e100:
"retry_join": [
"provider=k8s namespace=consul label_selector=\"app=consul,component=server\""
]
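For anyone else applying this, the ConfigMap entry I ended up with looks roughly like the following (ConfigMap and key names are from my manifests, not anything canonical):

# server.json is mounted into the server pods as part of their config directory.
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-server-config
  namespace: consul
data:
  server.json: |
    {
      "server": true,
      "bootstrap_expect": 3,
      "retry_join": [
        "provider=k8s namespace=consul label_selector=\"app=consul,component=server\""
      ]
    }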
I've seen this issue occurring for multiple people several times.
On k8s, besides setting -bootstrap-expect to the number of servers you're running (e.g. 3-5 pods), deleting all PVCs and volumes after uninstalling Consul completely was the only solution that worked for me (commands sketched below).
No matter what else I tried across (Helm-based) reinstalls and uninstalls, Consul would be unable to properly bootstrap and elect a leader until not only all components were removed from the (k8s) cluster but also the PVCs and volumes.
This note should be in the k8s section btw.
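Roughly what I ran after tearing things down (release, namespace and label names are from my install; double-check the selector before deleting, this wipes Consul's data for good):

# Helm 3 style uninstall, then drop the server PVCs that keep the stale raft state.
helm uninstall consul -n consul
kubectl -n consul delete pvc -l app=consul,component=server
# If the PersistentVolumes were retained, confirm and remove those too.
kubectl get pv | grep consul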
cc @gupf0719
I have deployed the latest version of Consul on Kubernetes v1.10.0, but the Consul pods' logs show these error messages:
2018/07/20 11:26:11 [WARN] agent: Check "service:ribbon-consumer" HTTP request failed: Get http://DESKTOP-MCQSJ49:8504/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/07/20 11:26:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/07/20 11:26:16 [ERR] agent: Coordinate update error: No cluster leader
The cluster doesn't work correctly.