projectcalico / canal

Policy based networking for cloud native applications

Kubernetes master node is 'untainted' when applying Canal CNI #77

Closed RichWellum closed 7 years ago

RichWellum commented 7 years ago

Expected Behavior

I'm running Kubernetes on a CentOS 7.x VM. The purpose is to create an all-in-one (AIO) node to run kolla OpenStack images on. Typically, after I apply the canal.yaml CNI manifest, I have to mark the master node as schedulable by 'untainting' it. This lets you use Kubernetes as an AIO, creating OpenStack services on the one node. The command is:

    kubectl taint nodes --all=true node-role.kubernetes.io/master:NoSchedule-

Current Behavior

In the last few days I have seen that the taint is removed after applying canal. Here's my log:

    [rwellum@kolla-k8s k8s]$ kubectl get nodes
    NAME        STATUS     AGE   VERSION
    kolla-k8s   NotReady   33s   v1.6.4

    # Taint is there on initial bring-up:
    [rwellum@kolla-k8s k8s]$ kubectl describe node kolla-k8s | grep -i taint
    Taints: node-role.kubernetes.io/master:NoSchedule

    [rwellum@kolla-k8s k8s]$ # Now I will apply canal.yaml and check the node again:
    [rwellum@kolla-k8s k8s]$ kubectl describe node kolla-k8s | grep -i taint
    Taints:
    [rwellum@kolla-k8s k8s]$ # Weirdly the Taint is gone...

    [rwellum@kolla-k8s k8s]$ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
    kube-system   canal-jwnp8                         3/3     Running   0          28s
    kube-system   etcd-kolla-k8s                      1/1     Running   0          1m
    kube-system   kube-apiserver-kolla-k8s            1/1     Running   0          1m
    kube-system   kube-controller-manager-kolla-k8s   1/1     Running   0          1m
    kube-system   kube-dns-3913472980-6lpm5           0/3     Pending   0          1m
    kube-system   kube-proxy-w1kf3                    1/1     Running   0          1m
    kube-system   kube-scheduler-kolla-k8s            1/1     Running   0          1m
    [rwellum@kolla-k8s k8s]$

Possible Solution

Steps to Reproduce (for bugs)

  1. I am following this deployment guide: https://docs.openstack.org/developer/kolla-kubernetes/deployment-guide.html
  2. Before applying canal (kubectl apply -f canal.yaml), check the taint, as my log above shows.
  3. Apply canal, then check the taint again. It will be gone.
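
The before/after check in the steps above can be sketched as a small shell helper. This is only an illustration, not part of the original report: the function name `master_taint_state` is hypothetical, and it assumes the single-line `Taints:` format that `kubectl describe node` prints in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report).
# Reads `kubectl describe node <name>` output on stdin and prints
# "present" if the master NoSchedule taint appears on the Taints: line,
# otherwise "absent".
master_taint_state() {
  if grep -qi '^Taints:.*node-role\.kubernetes\.io/master:NoSchedule'; then
    echo "present"
  else
    echo "absent"
  fi
}

# Intended use around the manifest apply (requires a live cluster):
#   before=$(kubectl describe node kolla-k8s | master_taint_state)
#   kubectl apply -f canal.yaml
#   after=$(kubectl describe node kolla-k8s | master_taint_state)
#   echo "taint before: $before, after: $after"
```

With the behavior reported here, the expectation would be "present" before the apply and "absent" after it.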

Context

Because I am running an AIO, the only effect for me is a warning in my logs that the taint doesn't exist. For a multi-node or production deployment, however, this would be an issue.
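
One way to avoid the "taint doesn't exist" warning in an AIO setup is to remove the taint only when it is actually listed. The following is a minimal sketch under assumptions: the function name `untaint_master_if_present` is hypothetical, it targets a single named node rather than `--all`, and it relies on the same `Taints:` line format shown in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report).
# Removes the master NoSchedule taint from one node, but only when
# `kubectl describe node` actually lists it.
untaint_master_if_present() {
  local node="$1"
  local taint="node-role.kubernetes.io/master:NoSchedule"
  local taints_line

  # Capture the Taints: line; tolerate "no match" from grep.
  taints_line=$(kubectl describe node "$node" | grep -i '^Taints:' || true)

  case "$taints_line" in
    *"$taint"*)
      # The trailing "-" asks kubectl to remove the taint.
      kubectl taint nodes "$node" "${taint}-"
      ;;
    *)
      echo "taint ${taint} not present on ${node}; nothing to do"
      ;;
  esac
}

# Example (requires a live cluster):
#   untaint_master_if_present kolla-k8s
```

This only hides the warning. It does not address why the taint disappears after canal is applied.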

Your Environment

treacher commented 7 years ago

I'm also seeing this. It seems to be an issue with Calico; I don't have this issue when running flannel on its own.

Flannel fixed this with the following: https://github.com/coreos/flannel/issues/667

Edit: Calico on its own seems to work, so it seems to be just canal :\

RichWellum commented 7 years ago

Thanks for replying and pointing out the flannel fix. Hoping someone will look at this!

caseydavenport commented 7 years ago

Thanks for raising this. I think this might be a bug in the Calico Kubernetes datastore driver.

@heschlie could you try to repro and determine where the problem is?

heschlie commented 7 years ago

Taking a look

heschlie commented 7 years ago

This looks to be fixed by updating calico/node from v1.2.1 to v1.3.0. I'll need to dig further to see what changed to resolve this, so we can ensure it doesn't break again.

I'll open a PR to update the manifests to the latest versions of Calico. In the meantime, can you update your manifest locally and verify that it fixes it for you as well? You'll probably want to update calico/cni to v1.9.1 while you are there.
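
A local manifest update along these lines could be sketched with `sed`. This is an assumption-laden illustration rather than the maintainers' procedure: the function name `bump_calico_images` is hypothetical, and it assumes the manifest references the images as `quay.io/calico/node:<tag>` and `quay.io/calico/cni:<tag>` with tags of the form `vX.Y.Z`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original thread).
# Reads a manifest on stdin and writes it to stdout with the
# calico/node and calico/cni image tags set to the versions
# suggested above (v1.3.0 and v1.9.1).
bump_calico_images() {
  sed -E \
    -e 's#(quay\.io/calico/node:)v[0-9.]+#\1v1.3.0#' \
    -e 's#(quay\.io/calico/cni:)v[0-9.]+#\1v1.9.1#'
}

# Example (writes a new file and leaves the original untouched):
#   bump_calico_images < canal.yaml > canal-updated.yaml
#   kubectl apply -f canal-updated.yaml
```

Writing to a separate file keeps the downloaded manifest available for comparison if the update does not help.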

caseydavenport commented 7 years ago

Going to close this for now, but please shout if upgrading to the latest manifests does not fix this issue. Thanks!

RichWellum commented 7 years ago

I think I am already using these versions as I download from canal/master:

    $ curl -O https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/1.6/canal.yaml
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  6474  100  6474    0     0  19755      0 --:--:-- --:--:-- --:--:-- 19798
    $ cat canal.yaml | grep v1
    apiVersion: v1
    apiVersion: extensions/v1beta1
    image: quay.io/calico/node:v1.3.0
    image: quay.io/calico/cni:v1.9.1
    apiVersion: v1

I just checked and as you said it appears to be working now. Thanks!
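
The `grep v1` check above also matches the `apiVersion` lines. A narrower check that prints only the image references could look like the sketch below. The function name `list_manifest_images` is hypothetical, and it assumes each image appears on its own `image:` line, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original thread).
# Reads a manifest on stdin and prints only the values of its
# "image:" lines, one per line.
list_manifest_images() {
  sed -nE 's/^[[:space:]]*(- )?image:[[:space:]]*//p'
}

# Example:
#   list_manifest_images < canal.yaml
```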

heschlie commented 7 years ago

@RichWellum I merged #88 a few hours ago, so you're now getting the newer versions.

RichWellum commented 7 years ago

@heschlie - many thanks for the speedy work. By the way, I had this issue open for a while. For my education, is there anything I should have done to raise attention to it?

tmjd commented 7 years ago

@RichWellum I think you did the right thing; I believe we simply missed this issue. Always feel free to "bump" an issue if it isn't getting the action you think it should, or come bug us at calicousers.slack.com. Actually, this may be a gap in our documentation: we could include some notes about submitting issues and the expected process. I just added #89 to address that.

RichWellum commented 7 years ago

Thank you very much. I should have thought about the slack channel - too used to OpenStack and IRC... :)