Closed: @jbeda closed this issue 7 years ago
/cc @lukemarsden @luxas @mikedanese
If we revert #43474 completely, we are in a situation again where we break 0.2.0 CNI plugins (see https://github.com/kubernetes/kubernetes/issues/43014)
Should we consider doing something like https://github.com/kubernetes/kubernetes/pull/43284?
Also /cc @thockin
/cc @kubernetes/sig-network-bugs
@jbeda can I get some kubelet logs with --loglevel=5?
@yujuhong -- you mention that you think that this is working as intended. Regardless, kubeadm was depending on this behavior. We introduced a breaking change with #43474. We can talk about the right way to fix this for 1.7 but, for now, we need to get kubeadm working again.
Slack discussion ongoing now -- https://kubernetes.slack.com/archives/C09QYUH5W/p1490803144368246
It looks like DaemonSets will still get scheduled even if the node is not ready. In this case, kubeadm is really just being a little too paranoid.

The current plan that we are going to test out is to have kubeadm no longer wait for the master node to be ready, but instead just have it be registered. This should be good enough to let a CNI DaemonSet be scheduled to set up CNI.
@kensimon is testing this out.
@jbeda yeah, looks like the DaemonSet controller will still enqueue them mainly because it's completely ignorant of network-iness. We should really fix this more generally. Is there anything immediate to do in kube or is it all in kubeadm for now?
I'm trying to install kubernetes with kubeadm on Ubuntu 16.04. Is there a quick fix for this?
@jbeda if you have a patched version happy to test it..
I have kubeadm getting past the node's NotReady status, but the dummy deployment it creates isn't working due to the node.alpha.kubernetes.io/notReady taint preventing it from running. Adding tolerations doesn't seem to help; I'm not exactly sure how to proceed at this point. Can anybody shed some light on how to deploy a pod that tolerates the notReady taint?
I'm exploring some other options like not marking the node as notReady, but it's not clear that's what we want to do.
We worked around it by removing KUBELET_NETWORK_ARGS from the kubelet command line. After that, kubeadm init worked fine and we were able to install the Canal CNI plugin.
@sbezverk would you please describe how to do that?
Can confirm @sbezverk's findings (good find :)): adjusting /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and removing KUBELET_NETWORK_ARGS makes it run on CentOS. Tested with Weave.
@overip you need to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS

Remove $KUBELET_NETWORK_ARGS from that line, then restart kubelet. After that, kubeadm init should work.
This is what I did:

kubeadm reset
(edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, removing $KUBELET_NETWORK_ARGS)
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
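The steps above can be sketched as a small script. The paths match the stock 1.6-era kubeadm packages; the file-existence guard is only there so the sketch is safe to run anywhere, and the kubeadm commands are left commented since they are destructive:

```shell
# Workaround sketch: strip $KUBELET_NETWORK_ARGS from the kubelet drop-in,
# reload systemd, and restart kubelet. Run as root on the master.
CONF=${CONF:-/etc/systemd/system/kubelet.service.d/10-kubeadm.conf}

if [ -f "$CONF" ]; then
  # Keep a .bak backup; delete only the variable expansion, not the whole line.
  sed -i.bak 's/ \$KUBELET_NETWORK_ARGS//' "$CONF"
  systemctl daemon-reload
  systemctl restart kubelet.service
fi
# kubeadm reset   # only if a previous init is wedged
# kubeadm init
```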
All correct, and while we're at it:

If you see this: kubelet: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

you have to edit your /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, add the flag --cgroup-driver="systemd", and do as above:

kubeadm reset
systemctl daemon-reload
systemctl restart kubelet.service
kubeadm init
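A sketch of that edit, assuming the drop-in uses the stock Environment="KUBELET_KUBECONFIG_ARGS=..." line to carry kubelet flags (verify the sed pattern against your own file before running):

```shell
# Align kubelet's cgroup driver with docker's ("systemd" here) by appending
# --cgroup-driver=systemd to the kubelet args in the drop-in. Run as root.
CONF=${CONF:-/etc/systemd/system/kubelet.service.d/10-kubeadm.conf}

if [ -f "$CONF" ] && ! grep -q -- '--cgroup-driver' "$CONF"; then
  sed -i.bak 's/\(KUBELET_KUBECONFIG_ARGS=[^"]*\)/\1 --cgroup-driver=systemd/' "$CONF"
  systemctl daemon-reload
  systemctl restart kubelet.service
fi
```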
I'd be careful removing --network-plugin=cni from the kubelet CLI flags; this causes kubelet to use the no_op plugin by default. I would be surprised if common plugins like Calico/Weave would even work in this case (but then again my understanding of how these plugins operate underneath is a bit limited).
@kensimon hm, have not seen any issues on my setup, I deployed canal cni plugin and it worked fine..
@sbezverk Is cross host networking also working well?
@resouer cannot confirm, I have 1.6.0 only as All-In-One.
@resouer @sbezverk I successfully joined a machine.
[root@deploy-01 x86_64]# kubectl get nodes
NAME        STATUS   AGE   VERSION
deploy-01   Ready    51m   v1.6.0
master-01   Ready    4m    v1.6.0

NAME                                    READY   STATUS    RESTARTS   AGE
etcd-deploy-01                          1/1     Running   0          50m
kube-apiserver-deploy-01                1/1     Running   0          51m
kube-controller-manager-deploy-01       1/1     Running   0          50m
kube-dns-3913472980-6plgh               3/3     Running   0          51m
kube-proxy-mbvdh                        1/1     Running   0          4m
kube-proxy-rmp36                        1/1     Running   0          51m
kube-scheduler-deploy-01                1/1     Running   0          50m
kubernetes-dashboard-2396447444-fm8cz   1/1     Running   0          24m
weave-net-3t487                         2/2     Running   0          44m
weave-net-hhcqp                         2/2     Running   0          4m
workaround works but can't get flannel going...
@stevenbower worst case scenario, you can put back this setting and restart kubelet when you are done with kubeadm business..
I got a three-node cluster with Weave working. Not sure how stable this might be with this hack, but thanks anyway! :smiley:

On a side note, you can put back the $KUBELET_NETWORK_ARGS after the init on the master passes. I actually did not remove it on the machine I joined; there I only changed the cgroup-driver, otherwise kubelet and docker won't work together.
But you don't have to kubeadm reset, just change /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and do the systemctl dance:
systemctl daemon-reload systemctl restart kubelet.service
For me, I have 2 clusters up: one where I applied the patch from https://github.com/kubernetes/kubernetes/pull/43824 and let kubeadm proceed normally on initialization, and one with KUBELET_NETWORK_ARGS deleted. On the cluster with KUBELET_NETWORK_ARGS removed, any traffic between pods fails.
On a cluster with KUBELET_NETWORK_ARGS removed:
$ kubectl run nginx --image=nginx
deployment "nginx" created
$ kubectl expose deployment nginx --port 80
service "nginx" exposed
$ kubectl run --rm -i -t ephemeral --image=busybox -- /bin/sh -l
If you don't see a command prompt, try pressing enter.
/ # wget nginx.default.svc.cluster.local
wget: bad address 'nginx.default.svc.cluster.local'
On a cluster with normal KUBELET_NETWORK_ARGS but with a patched kubeadm:
$ kubectl run nginx --image=nginx
deployment "nginx" created
$ kubectl expose deployment nginx --port 80
service "nginx" exposed
$ kubectl run --rm -i -t ephemeral --image=busybox -- /bin/sh -l
If you don't see a command prompt, try pressing enter.
/ # wget nginx.default.svc.cluster.local
Connecting to nginx.default.svc.cluster.local (10.109.159.41:80)
index.html 100% |********************************************************************************************| 612 0:00:00 ETA
If you're one of those who disabled KUBELET_NETWORK_ARGS, check if the above works for you.
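The check above, bundled as a script. It assumes kubectl is on PATH and pointed at your cluster, and no-ops otherwise; a successful wget requires both working kube-dns resolution and working pod-to-pod traffic:

```shell
# Smoke test for in-cluster DNS and pod networking: run an nginx deployment,
# expose it, then resolve and fetch it from a throwaway busybox pod.
SVC=nginx
FQDN="$SVC.default.svc.cluster.local"   # name kube-dns should resolve

if command -v kubectl >/dev/null 2>&1; then
  kubectl run "$SVC" --image=nginx
  kubectl expose deployment "$SVC" --port 80
  # Prints the page only if both DNS and the pod network are healthy.
  kubectl run --rm -i ephemeral --image=busybox --restart=Never -- wget -q -O- "$FQDN"
fi
```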
I suggest that we drop both the node ready and the dummy deployment check altogether for 1.6 and move them to a validation phase for 1.7.
Anyone else running Ubuntu 16.04? I've removed the KUBELET_NETWORK_ARGS from the systemd service and reloaded the systemd daemon. I can turn up a master node but cannot join a node. It fails with the error: The requested resource is unavailable.
"KUBELET_NETWORK_ARGS removed, any traffic between pods fails."
I wouldn't be surprised since KUBELET_NETWORK_ARGS tells kubelet what plugin to use, where to look for the configuration and binaries. You NEED them.
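For reference, in the 1.6-era kubeadm packages that variable expands to roughly the following; check your own 10-kubeadm.conf, and treat the exact paths here as an assumption:

```shell
# What KUBELET_NETWORK_ARGS typically carries: the plugin selection plus where
# kubelet looks for CNI configs and binaries. Removing it silently falls back
# to the no-op network plugin.
NETWORK_ARGS='--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin'
echo "$NETWORK_ARGS"
```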
I recommend #43835 (and the 1.6 cherry pick #43837) as the fix we make for 1.6. I tested both and they work. I've assigned @jbeda and @luxas to review when they wake up.
Both of those PRs look reasonable. But I think we should look at going with https://github.com/kubernetes/kubernetes/pull/43824 instead. While it is a bit more complicated, it preserves that code path so that users who preconfigure CNI outside of using a DaemonSet (I do this in https://github.com/jbeda/kubeadm-gce-tf, though I haven't updated it to 1.6) still wait for nodes to be ready.
As a bonus, this is @kensimon's first PR to Kubernetes, and he has pulled out all the stops to test this stuff out. But, to be honest, they are both workable and I really want to see it fixed. :)
Sorry I missed https://github.com/kubernetes/kubernetes/pull/43824. I'm also happy with either if they both work.
I'm also happy with either, if they both work, too.
@kensimon It works for me when I only disable KUBELET_NETWORK_ARGS during kubeadm init. Thanks to your instructions I could verify that.
Confirming @webwurst: it works when you only disable KUBELET_NETWORK_ARGS during kubeadm init. I had to restart kube-dns though for it to pick it up. The check from @kensimon works; DNS resolves.

Although I agree that this is a terrible hack, and too horrible for most people to follow, looking at the Slack channels. A better solution is presented by the patches from @kensimon or @mikedanese.
@coeki how exactly did you restart kube-dns? I tried kubectl delete pod kube-dns-3913472980-l771x --namespace=kube-system and now kube-dns stays on Pending:

kube-system   kube-dns-3913472980-7jrwm   0/3   Pending   0   4m
I did exactly as described: remove KUBELET_NETWORK_ARGS, sudo systemctl daemon-reload && sudo systemctl restart kubelet.service, kubeadm init, add KUBELET_NETWORK_ARGS back, again sudo systemctl daemon-reload && sudo systemctl restart kubelet.service.

But then my master stays in NotReady. In kubectl describe I get:
Conditions:
  Type    Status  LastHeartbeatTime  LastTransitionTime  Reason           Message
  ----    ------  -----------------  ------------------  ------           -------
  ...
  Ready   False   ...                ...                 KubeletNotReady  runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I tried the kube-dns restart as described above, but no success. Any idea? I'm dying on this, trying to get our cluster running again after a failed 1.6.0 upgrade this morning :(
@patte So I just kubectl delete pod kube-dns-3913472980-3ljfx -n kube-system and then kube-dns comes up again.

Did you, after kubeadm init, adding KUBELET_NETWORK_ARGS back, and again sudo systemctl daemon-reload && sudo systemctl restart kubelet.service, install a pod network like Weave or Calico? Add that first; you should be able to get it to work.

I tried and tested on CentOS 7 and just did it on ubuntu/xenial, so it should work.
To recap what I did:

remove KUBELET_NETWORK_ARGS
sudo systemctl daemon-reload && sudo systemctl restart kubelet.service
kubeadm init --token=$TOKEN --apiserver-advertise-address=$(your apiserver ip address)
add KUBELET_NETWORK_ARGS back
again sudo systemctl daemon-reload && sudo systemctl restart kubelet.service
kubectl apply -f https://git.io/weave-kube-1.6

Joined a machine with the $TOKEN; I typically have to add a static route to 10.96.0.0 (the cluster IP range) as I am on vagrant, but this step is extra.

Then:

kubectl delete pod kube-dns-3913472980-3ljfx -n kube-system

Wait for it, then:
kubectl run nginx --image=nginx
kubectl expose deployment nginx --port 80
kubectl run --rm -i -t ephemeral --image=busybox -- /bin/sh -l
/ # wget nginx.default.svc.cluster.local
Connecting to nginx.default.svc.cluster.local (10.101.169.36:80)
index.html 100% |***********************************************************************| 612 0:00:00 ETA
Works for me, although it's a horrible hack ;)
I am really surprised that the Kubernetes development community has not provided any ETA for an official fix. I mean, this is a horrible bug which should have easily been caught during code testing. Since it was not, at the very least 1.6.1 should be pushed ASAP with the fix, so people would stop hacking their clusters and start doing productive things ;). Am I wrong here?
Hey all,
I was a little distracted this week, and this is a long thread full of kubeadm stuff I don't know that well. Can someone summarize for me? I think I get the gist of the bug, but what are the proposed solutions and what makes them horrible?
A change in kubelet (#43474) caused kubelet to start correctly reporting network not ready before the cni plugin was initialized. This broke some ordering that we were depending on and caused a deadlock in kubeadm master initialization. We didn't catch it because kubeadm e2e tests had been broken for a few days leading up to this change.
Current proposed fixes are #43824 and #43835.
OK, that was what I understood. The interlock between the network plugin coming up and node readiness is a little awful right now.
I still prefer #43835. It's a simpler change, I don't think the e2e checks should be done where they are, and there are reports of #43824 not working still. I'm going to push to get this resolved today.
+1 to resolve it today, as a lot of effort is being wasted dealing with collateral from the workaround.
I can't believe nobody really tried kubeadm 1.6.0 before 1.6.0 was released....

And kubelet 1.5.6 + kubeadm 1.5.6 are also broken: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf references /etc/kubernetes/pki/ca.crt, but kubeadm doesn't generate ca.crt; there is a ca.pem, though.

Currently 1.6.0 and 1.5.6 are the only releases left in the k8s apt repository...
"broken out of the box", words learned today.
"I still prefer #43835. It's a simpler change, I don't think the e2e checks should be done where they are, and there are reports of #43824 not working still. I'm going to push to get this resolved today."
Agree with Mike on this one. #43835 is the simpler change, and validation (if needed) can be done in a separate phase.
@thockin we really need finer-grained conditions and status for networking, especially with hostNetwork:true. Nodes can be ready for some pods, but not others.

We can't use nodeNetworkUnavailable, because that's specific to cloud providers. We probably need another condition, or a way for the scheduler to allow hostNetwork:true pods on nodes with NetworkReady:false, or to make the taints work for unready nodes. And working kubeadm e2e tests :(
Agree. I have been delaying the problem because I had no great answers, but we need to get this in 1.7
Initial report in https://github.com/kubernetes/kubeadm/issues/212.
I suspect that this was introduced in https://github.com/kubernetes/kubernetes/pull/43474.
What is going on (all on single master):
In the conditions list for the node:
Previous behavior was for the kubelet to join the cluster even with unconfigured CNI. The user will then typically run a DaemonSet with host networking to bootstrap CNI on all nodes. The fact that the node never joins means that, fundamentally, DaemonSets cannot be used to bootstrap CNI.
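The bootstrap pattern described here is a hostNetwork DaemonSet. A minimal, hypothetical sketch of its shape follows; the names and image are placeholders (not a real plugin), the API group is the 1.6-era one, and note that commenters above found a notReady toleration alone was not always sufficient:

```shell
# Write a hypothetical CNI-bootstrap DaemonSet manifest: hostNetwork (it must
# not depend on the pod network it is installing) plus a toleration for the
# notReady taint. Apply it yourself once you've swapped in a real installer.
cat > cni-bootstrap.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: cni-bootstrap
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: cni-bootstrap
    spec:
      hostNetwork: true
      tolerations:
      - key: node.alpha.kubernetes.io/notReady
        operator: Exists
        effect: NoExecute
      containers:
      - name: install-cni
        image: example/cni-installer:latest
EOF
# kubectl apply -f cni-bootstrap.yaml
```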
Edit by @mikedanese: please test patched debian amd64 kubeadm https://github.com/kubernetes/kubernetes/issues/43815#issuecomment-290616036 with fix