I can confirm @bojanv55's issue:
kubectl get pods|grep leader-ele
leader-elector-78dc57584b-6sltq 1/1 Running 0 11m
leader-elector-78dc57584b-qc8z4 1/1 Running 0 4m
leader-elector-78dc57584b-xrdpb 1/1 Running 0 13m
kubectl get endpoints example -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"leader-elector-78dc57584b-6sltq","leaseDurationSeconds":10,"acquireTime":"2018-07-28T13:37:38Z","renewTime":"2018-07-28T13:40:46Z","leaderTransitions":0}'
  creationTimestamp: 2018-07-28T13:28:11Z
  name: example
  namespace: staging
  resourceVersion: "18868468"
  selfLink: /api/v1/namespaces/staging/endpoints/example
  uid: 12d99a29-926a-11e8-8f24-0a96b526afb2
subsets: []
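For reference, the current lock holder can also be read straight off that annotation, which is handy for cross-checking what the :4040 endpoints claim (a sketch, assuming the same staging namespace and example election as above):

kubectl -n staging get endpoints example \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'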
When curling instances other than the real leader:
curl http://localhost:8001/api/v1/proxy/namespaces/staging/pods/leader-elector-78dc57584b-qc8z4:4040/
{"name":"leader-elector-78dc57584b-z68dz"}%
If I ask the real leader:
curl http://localhost:8001/api/v1/proxy/namespaces/staging/pods/leader-elector-78dc57584b-6sltq:4040/
{"name":"leader-elector-78dc57584b-6sltq"}
It is the only one aware of itself, while the others report the name of a wrong/dead/killed pod as the leader.
P.S. This demo was started with:
kubectl run leader-elector --image=k8s.gcr.io/leader-elector:0.5 --replicas=3 --serviceaccount=leader-elector-example -- --election=example --election-namespace=staging --http=0.0.0.0:4040
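For readers following along, that kubectl run invocation is roughly equivalent to applying a Deployment like the sketch below (the run label and leader-elector container name are what kubectl run generated by default at the time, so treat them as assumptions; the generator in that era used extensions/v1beta1, shown here against apps/v1):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: leader-elector
  namespace: staging
spec:
  replicas: 3
  selector:
    matchLabels:
      run: leader-elector
  template:
    metadata:
      labels:
        run: leader-elector
    spec:
      serviceAccountName: leader-elector-example
      containers:
      - name: leader-elector
        image: k8s.gcr.io/leader-elector:0.5
        args:
        - --election=example
        - --election-namespace=staging
        - --http=0.0.0.0:4040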
while the service account was bound to this Role (see the RoleBinding sketch after it):
kubectl get role leader-elector -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2018-07-28T13:03:56Z
  name: leader-elector
  namespace: staging
  resourceVersion: "18860555"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/staging/roles/leader-elector
  uid: af4e73b7-9266-11e8-8f24-0a96b526afb2
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
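The binding itself isn't shown above; presumably it looks something like this sketch (the RoleBinding name is an assumption, the rest follows from the commands above):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: leader-elector
  namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: leader-elector
subjects:
- kind: ServiceAccount
  name: leader-elector-example
  namespace: staging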
on this k8s cluster:
kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Hi. Do you guys have a workaround/fix for this issue? I believe we are running into the same issue here:
On one of the replicas (one that is not the master), xxx-5785cf44db-49xtc, the elector container logs (inside our pod, which runs another service) show the switchover to the new master happening as expected:
I0802 17:48:39.524226 8 leaderelection.go:259] lock is held by xxx-68b7fb4974-zmvjz and has not yet expired
I0802 17:48:44.852689 8 leaderelection.go:259] lock is held by xxx-5785cf44db-j6fmq and has not yet expired
But the localhost:4040 endpoint still shows the killed pod as the master: {"name": "xxx-68b7fb4974-zmvjz"}
The :4040 endpoint on the master, though, has the correct information about who the leader is.
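In case it helps with debugging: as the logs above show, the elector container itself knows the true lock holder even when its HTTP response is stale, so tailing its logs is a quick cross-check (the placeholders below are hypothetical; substitute your own namespace, pod, and elector container names):

kubectl -n <namespace> logs <pod-name> -c <elector-container> --tail=20

The "lock is held by ..." lines name the current holder.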
I also ran into this and reported it here: https://github.com/kubernetes/contrib/issues/2933
I'll try to mark mine as a duplicate.
I pulled the latest contrib code and rebuilt the container manually... I cannot reproduce anymore, so my guess is that this has been fixed, but no new container was pushed.
It's available here: https://hub.docker.com/r/fredrikjanssonse/leader-elector/tags/
The tutorial uses the gcr.io/google_containers/leader-elector:0.4 Docker image, which may be outdated.
I had the same error, but then I switched to fredrikjanssonse/leader-elector:0.6 and the HTTP responses are correct.
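If you created the deployment with kubectl run as shown earlier, swapping in the fixed image is a one-liner along these lines (a sketch; it assumes the container is named leader-elector, which is what kubectl run defaults to):

kubectl -n staging set image deployment/leader-elector \
  leader-elector=fredrikjanssonse/leader-elector:0.6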
Thanks @fredrik-jansson-se!
This is still an issue with the gcr image. fredrik's image works fine. Any chance of getting a new official image?
@brendandburns @mikedanese Can one of you please make this image from @fredrik-jansson-se official? The current image gcr.io/google_containers/leader-elector:0.5 has a problem where pods that are not the leader return the old leader's name in response to HTTP requests asking who the leader is (as described in this issue).
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
The problem is still here, as this thread points out. Can you please solve this at the root by publishing a fixed Docker image?
That would unblock the many people who ended up either using a "non-official" Docker image (thanks, @fredrik-jansson-se, of course!) or building their own from your repo (my case).
I installed curl in the elector image and am running 3 replicas.
The master recorded in the endpoint annotation is not the same as the one reported locally by the elector.
The update happens correctly when the master is deleted, and the new master reports its own name, but the master name returned by the standby nodes is not correct.
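For anyone reproducing this, here is a quick consistency check in the spirit of the commands earlier in the thread (a sketch; it assumes the staging/example demo above, pods labeled run=leader-elector, and kubectl proxy listening on localhost:8001):

# What the lock annotation says the leader is:
NS=staging
kubectl -n "$NS" get endpoints example \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'; echo
# What each replica's :4040 sidecar claims, via the apiserver proxy:
for POD in $(kubectl -n "$NS" get pods -l run=leader-elector \
    -o jsonpath='{.items[*].metadata.name}'); do
  printf '%s reports: ' "$POD"
  curl -s "http://localhost:8001/api/v1/proxy/namespaces/$NS/pods/$POD:4040/"; echo
done

If the bug is present, every standby prints a stale holderIdentity while the annotation and the leader's own response agree.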