kubernetes-retired / contrib

[EOL] This is a place for various components in the Kubernetes ecosystem that aren't part of the Kubernetes core.
Apache License 2.0

How is this possible? #2930

Closed bojanv55 closed 5 years ago

bojanv55 commented 6 years ago

I installed curl in the elector image. Running 3 replicas.

# curl -sSk -H "Authorization: Bearer $TOKEN" \
      https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/default/endpoints/example
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "example",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/endpoints/example",
    "uid": "7dfccbfa-85b6-11e8-b2ab-08002702cbca",
    "resourceVersion": "3202186",
    "creationTimestamp": "2018-07-12T09:32:27Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"aspectcaffleader-84d5769dbd-wq9cf\",\"leaseDurationSeconds\":10,\"acquireTime\":\"2018-07-12T09:34:59Z\",\"renewTime\":\"2018-07-12T10:13:39Z\",\"leaderTransitions\":0}"
    }
  }
}
# curl localhost:4040
{"name":"aspectcaffleader-59df7bb4fc-8slv8"}

Why is the master recorded in the endpoint annotation not the same as the one reported by the elector locally?

When the master is deleted, the endpoint annotation is updated correctly and the new master reports its own name correctly, but the master name returned by the standby nodes is not updated.
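For anyone trying to reproduce this, a quick way to compare the leader recorded in the endpoint annotation with what each sidecar reports locally is sketched below. It assumes jq is available, that curl is installed in the elector image (as above), and that the pods carry the run=aspectcaffleader label that kubectl run would normally set; adjust the namespace and selector for your setup.

# Leader according to the endpoints annotation
kubectl -n default get endpoints example -o json \
  | jq -r '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson | .holderIdentity'

# Leader according to each elector's local :4040 endpoint
for pod in $(kubectl -n default get pods -l run=aspectcaffleader -o jsonpath='{.items[*].metadata.name}'); do
  echo -n "$pod: "
  kubectl -n default exec "$pod" -- curl -s localhost:4040
  echo
done

If the bug is present, the annotation and the leader's own pod agree, while the standby pods print a stale name.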

trajakovic commented 6 years ago

I can confirm @bojanv55's issue:

kubectl get pods|grep leader-ele                                                                                                                     
leader-elector-78dc57584b-6sltq                               1/1       Running   0          11m
leader-elector-78dc57584b-qc8z4                               1/1       Running   0          4m
leader-elector-78dc57584b-xrdpb                               1/1       Running   0          13m
kubectl get endpoints example -o yaml                                                                                                                
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"leader-elector-78dc57584b-6sltq","leaseDurationSeconds":10,"acquireTime":"2018-07-28T13:37:38Z","renewTime":"2018-07-28T13:40:46Z","leaderTransitions":0}'
  creationTimestamp: 2018-07-28T13:28:11Z
  name: example
  namespace: staging
  resourceVersion: "18868468"
  selfLink: /api/v1/namespaces/staging/endpoints/example
  uid: 12d99a29-926a-11e8-8f24-0a96b526afb2
subsets: []

When curling instances other than the real leader:

curl http://localhost:8001/api/v1/proxy/namespaces/staging/pods/leader-elector-78dc57584b-qc8z4:4040/                                               
{"name":"leader-elector-78dc57584b-z68dz"}%

If I ask the real leader:

curl http://localhost:8001/api/v1/proxy/namespaces/staging/pods/leader-elector-78dc57584b-6sltq:4040/                                              
{"name":"leader-elector-78dc57584b-6sltq"}

The real leader is the only one that knows it is the leader; the other replicas report the name of a wrong/dead/killed pod as the leader.
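The same check can be run across all replicas in one go through the API server proxy, as a rough sketch; it assumes kubectl proxy is running on localhost:8001 (as in the commands above) and that the pods carry the run=leader-elector label that kubectl run sets by default.

# Ask every replica's :4040 endpoint who it thinks the leader is
for pod in $(kubectl -n staging get pods -l run=leader-elector -o jsonpath='{.items[*].metadata.name}'); do
  echo -n "$pod -> "
  curl -s "http://localhost:8001/api/v1/proxy/namespaces/staging/pods/${pod}:4040/"
  echo
done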


P.S. This demo was started with:

kubectl run leader-elector --image=k8s.gcr.io/leader-elector:0.5 --replicas=3 --serviceaccount=leader-elector-example -- --election=example --election-namespace=staging --http=0.0.0.0:4040

while serviceaccount was bound to:

kubectl get role leader-elector -o yaml                                                                                                            
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2018-07-28T13:03:56Z
  name: leader-elector
  namespace: staging
  resourceVersion: "18860555"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/staging/roles/leader-elector
  uid: af4e73b7-9266-11e8-8f24-0a96b526afb2
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
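The RoleBinding itself isn't shown above; assuming it was created with kubectl, it would have looked roughly like this (the binding name here is assumed, the Role and ServiceAccount names are taken from the commands above):

# Hypothetical: bind the Role above to the leader-elector-example ServiceAccount
kubectl -n staging create rolebinding leader-elector \
  --role=leader-elector \
  --serviceaccount=staging:leader-elector-example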

on k8s cluster:

kubectl version                                                                                                                                      
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
anushjay commented 6 years ago

Hi. Do you have a workaround/fix for this issue? I believe we are running into the same issue here:

On one of the replicas (one that is not the master), xxx-5785cf44db-49xtc, the elector container logs (inside our pod, which runs another service) show the switchover to the new master as expected:

I0802 17:48:39.524226 8 leaderelection.go:259] lock is held by xxx-68b7fb4974-zmvjz and has not yet expired
I0802 17:48:44.852689 8 leaderelection.go:259] lock is held by xxx-5785cf44db-j6fmq and has not yet expired

But the localhost:4040 endpoint still shows the killed pod as the master: {"name": "xxx-68b7fb4974-zmvjz"}

The :4040 endpoint on the master, though, has the correct information about who the leader is.

fredrik-jansson-se commented 6 years ago

I also ran into this and reported it here: https://github.com/kubernetes/contrib/issues/2933

I'll try to mark mine as a duplicate.

fredrik-jansson-se commented 6 years ago

I pulled the latest contrib code and rebuilt the container manually... I cannot reproduce the issue anymore, so my guess is that this has been fixed, but no new container has been pushed.

Available here: https://hub.docker.com/r/fredrikjanssonse/leader-elector/tags/

ghost commented 5 years ago

The tutorial uses the gcr.io/google_containers/leader-elector:0.4 Docker image, which may be outdated. I had the same error, but after I switched to fredrikjanssonse/leader-elector:0.6 the HTTP responses are correct. Thanks @fredrik-jansson-se!
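For reference, re-running the demo against the rebuilt image only requires swapping the image reference. A sketch based on @trajakovic's earlier kubectl run command (namespace, service account, and ports taken from that example):

kubectl run leader-elector --image=fredrikjanssonse/leader-elector:0.6 --replicas=3 \
  --serviceaccount=leader-elector-example -- --election=example --election-namespace=staging --http=0.0.0.0:4040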

u05rav commented 5 years ago

This is still an issue with the gcr image. fredrik's image works fine. Any chance of getting a new official image?

piushs commented 5 years ago

@brendandburns @mikedanese Can one of you please make the image from @fredrik-jansson-se official? The current image gcr.io/google_containers/leader-elector:0.5 has a problem: when we send HTTP requests to non-leader pods to ask who the leader is, it returns the old leader (as described in this issue).

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

ghost commented 5 years ago

/remove-lifecycle stale

The problem is still here, as this thread points out. Can you please solve this at the root by publishing a fixed Docker image?

This would unblock many people who ended up either using a "non-official" Docker image (thanks @fredrik-jansson-se, of course!) or building their own from this repo (as in my case).