kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0

fresh deploy failing due to taints/tolerations #568

Closed mrhillsman closed 5 years ago

mrhillsman commented 6 years ago

I am using the latest version available via releases, v1.3.0-rc.3, and the CoreDNS pods are not coming online due to taints:

http://paste.mrhillsman.com/paste/2w4IIuGPmTw4p97HQeqe1w

root@master:~# kubectl describe pod/coredns-dcb4c7ddd-28zvd -n kube-system
...
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

root@master:~# kubectl describe pod/coredns-dcb4c7ddd-2pvc2 -n kube-system
...
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

root@master:~# kubectl describe node/master -n kube-system
...
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

root@master:~# kubectl describe node/node00 -n kube-system
...
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

root@master:~# kubectl describe node/node01 -n kube-system
...
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

I am sure this was an issue with the latest 1.2 branch as well, but I have not confirmed it, since I installed v1.3.0-rc.3 just before deploying again.
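
The mismatch in the output above can be sketched in a few lines of Python: a pod can only be scheduled onto a node if every NoSchedule/NoExecute taint on that node is matched by one of the pod's tolerations. This is a simplified model of the matching rule, not the scheduler's actual code. The class and function names are hypothetical, and the tolerations are assumed to use the `Exists` operator, since `kubectl describe` prints them without a value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only meaningful with Exists (matches every key).
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def untolerated(taints, tolerations):
    """List the taints that would keep the pod off the node."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Tolerations copied from the CoreDNS pods above (Exists operator assumed).
COREDNS_TOLERATIONS = [
    Toleration(key="CriticalAddonsOnly", operator="Exists"),
    Toleration(key="node-role.kubernetes.io/master", operator="Exists",
               effect="NoSchedule"),
    Toleration(key="node.kubernetes.io/not-ready", operator="Exists",
               effect="NoExecute"),
    Toleration(key="node.kubernetes.io/unreachable", operator="Exists",
               effect="NoExecute"),
]

UNINITIALIZED = Taint(key="node.cloudprovider.kubernetes.io/uninitialized",
                      value="true", effect="NoSchedule")

# Taints copied from the node descriptions above.
NODE_TAINTS = {
    "master": [Taint(key="node-role.kubernetes.io/master",
                     effect="NoSchedule"), UNINITIALIZED],
    "node00": [UNINITIALIZED],
    "node01": [UNINITIALIZED],
}

if __name__ == "__main__":
    for node, taints in NODE_TAINTS.items():
        blocking = [t.key for t in untolerated(taints, COREDNS_TOLERATIONS)]
        print(node, "blocked by:", blocking)
```

Under these assumptions, the `uninitialized` taint is the only one left untolerated on every node, which matches the pods staying unscheduled.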

mrhillsman commented 6 years ago

Running the following resulted in the pods running:

root@master:~# kubectl -n kube-system taint nodes node00 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
root@master:~# kubectl -n kube-system taint nodes node01 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-

I waited a significant amount of time (more than 2 hours) before running the above commands.

jakolehm commented 6 years ago

@mrhillsman by default, DNS is deployed to non-tainted nodes (workers). Did you have any worker nodes?

mrhillsman commented 6 years ago

@jakolehm I do have worker nodes: node00 and node01. Here is my cluster.yml:

hosts:
  - address: "10.145.81.191"
    private_interface: ens3
    user: ubuntu
    ssh_key_path: ~/.ssh/id_rsa
    role: master
  - address: "10.145.81.192"
    private_interface: ens3
    user: ubuntu
    ssh_key_path: ~/.ssh/id_rsa
    role: worker
  - address: "10.145.81.193"
    private_interface: ens3
    user: ubuntu
    ssh_key_path: ~/.ssh/id_rsa
    role: worker
# network:
#  provider: weave
#  weave:
#    trusted_subnets:
#      - "172.31.0.0/16"
network:
  provider: calico
  service_cidr: 172.31.0.0/16
  pod_network_cidr: 172.32.0.0/16
cloud:
  provider: external
  config: ./cloud-config
addons:
  kubernetes-dashboard:
    enabled: true

jakolehm commented 6 years ago

I cannot reproduce this. @mrhillsman did you figure out why the nodes have the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint? (It sounds like a CNI/network problem.)

mrhillsman commented 5 years ago

@jakolehm unfortunately not, and I had to move on to another deployment tool, though I was really liking Pharos.
