kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io
Creative Commons Attribution 4.0 International

Issue with k8s.io/docs/setup/independent/create-cluster-kubeadm/ Stacked control plane nodes #10450

Closed: eturpin closed this issue 6 years ago

eturpin commented 6 years ago

This is a...

Problem: Cannot create HA cluster by following steps provided under "Stacked control plane nodes."

Proposed Solution: I am in the process of learning Kubernetes, don't know exactly what is wrong, and don't know how to fix it.

Page to Update: https://kubernetes.io/...

Kubernetes Version: 1.12.0

First attempt: Failed setting up the second control plane at this step:

`kubeadm alpha phase kubelet write-env-file --config kubeadm-config.yaml`

Output: `didn't recognize types with GroupVersionKind: [kubeadm.k8s.io/v1alpha3, Kind=ClusterConfiguration]`

As a workaround, I replaced `kubeadm.k8s.io/v1alpha3` with `kubeadm.k8s.io/v1alpha2` and `ClusterConfiguration` with `MasterConfiguration` in kubeadm-config.yaml.
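For reference, a minimal sketch of that change (only the two header lines of the config are swapped; every other field stays as in the docs' stacked control plane example):

```bash
# Sketch only: rewrite kubeadm-config.yaml with the older API group/kind names
# that kubeadm 1.12 did not complain about in this case.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1alpha2   # was: kubeadm.k8s.io/v1alpha3
kind: MasterConfiguration             # was: ClusterConfiguration
# ... kubernetesVersion, apiServerCertSANs, etcd, etc. unchanged from the docs ...
EOF
```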

Second attempt: `kubeadm alpha phase mark-master --config kubeadm-config.yaml` times out. I tried this several times while trying to debug, and noticed that everything seems to break after running:

`kubectl exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://${CP0_IP}:2379 member add ${CP2_HOSTNAME} https://${CP2_IP}:2380`

By running `docker ps` on the first control plane, I noticed that new containers keep getting created (I assume they keep crashing).
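For anyone trying to reproduce this, a hedged sketch of checks that can show whether the member add went through and why the containers keep restarting (variable names as in the docs; these are not the exact commands from this report):

```bash
# Sketch: list the etcd members from cp0 to confirm the new member was registered.
kubectl exec -n kube-system etcd-${CP0_HOSTNAME} -- etcdctl \
  --ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/peer.crt \
  --key-file /etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://${CP0_IP}:2379 member list

# Sketch: see which control plane containers are being recreated and read their logs.
docker ps --filter name=k8s_kube-apiserver
docker logs "$(docker ps -a -q --filter name=k8s_kube-apiserver | head -n 1)"
```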

Third attempt: Tried the proposed solution in #9526. This time there are no errors or timeouts from the commands, but the kube-controller-manager and kube-scheduler pods on the second and third control planes appear to be crashing:

`kubectl get pods --all-namespaces`

    NAMESPACE     NAME                               READY   STATUS             RESTARTS   AGE
    kube-system   calico-node-279cq                  2/2     Running            0          22m
    kube-system   calico-node-4ggxh                  2/2     Running            0          22m
    kube-system   calico-node-pzz6x                  2/2     Running            0          22m
    kube-system   coredns-576cbf47c7-jf9v8           1/1     Running            0          48m
    kube-system   coredns-576cbf47c7-t6x8j           1/1     Running            0          48m
    kube-system   etcd-REDACTED                      1/1     Running            1          47m
    kube-system   etcd-REDACTED                      1/1     Running            0          41m
    kube-system   etcd-REDACTED                      1/1     Running            0          25m
    kube-system   kube-apiserver-REDACTED            1/1     Running            0          47m
    kube-system   kube-apiserver-REDACTED            1/1     Running            0          41m
    kube-system   kube-apiserver-REDACTED            1/1     Running            0          25m
    kube-system   kube-controller-manager-REDACTED   1/1     Running            1          47m
    kube-system   kube-controller-manager-REDACTED   0/1     CrashLoopBackOff   13         42m
    kube-system   kube-controller-manager-REDACTED   0/1     CrashLoopBackOff   9          26m
    kube-system   kube-proxy-5z77x                   1/1     Running            0          48m
    kube-system   kube-proxy-fljtd                   1/1     Running            0          26m
    kube-system   kube-proxy-tlc2s                   1/1     Running            0          42m
    kube-system   kube-scheduler-REDACTED            1/1     Running            1          47m
    kube-system   kube-scheduler-REDACTED            0/1     CrashLoopBackOff   13         42m
    kube-system   kube-scheduler-REDACTED            0/1     CrashLoopBackOff   10         26m
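A hedged note for digging into pods stuck in CrashLoopBackOff like the ones above (REDACTED stands in for the node name):

```bash
# Sketch: read the logs of the previous (crashed) container and the pod's events.
kubectl -n kube-system logs kube-controller-manager-REDACTED --previous
kubectl -n kube-system describe pod kube-scheduler-REDACTED
```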

Looks like they don't have a configuration:

`kubectl --namespace=kube-system logs kube-controller-manager-REDACTED`

    Flag --address has been deprecated, see --bind-address instead.
    I1001 18:39:51.847438       1 serving.go:293] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
    invalid configuration: no configuration has been provided
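That "no configuration has been provided" message suggests the kubeconfig files the control plane components expect may be missing on the affected nodes; a quick, hedged way to compare the second and third control planes against cp0:

```bash
# Sketch: run on each control plane host and compare; these are the standard
# kubeadm locations for kubeconfigs and static pod manifests.
ls -l /etc/kubernetes/*.conf        # admin.conf, controller-manager.conf, scheduler.conf, ...
ls -l /etc/kubernetes/manifests/    # kube-apiserver.yaml, kube-controller-manager.yaml, ...
```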

Anyone know any quick fixes for my issues? I'd be happy to provide more information, but I'm not sure where to look.

neolit123 commented 6 years ago

/kind bug

neolit123 commented 6 years ago

@kubernetes/sig-cluster-lifecycle /sig cluster-lifecycle

gclyatt commented 6 years ago

I ran into the same issue using Debian 9, Docker 17.3.3, and flannel. I looked at these docs to see if I could figure out more about the file. I ultimately skipped that step, as I wasn't able to see any kubelet config files written to cp0 that weren't also on the other two hosts. I was able to get through the rest of it with an almost-working cluster: the control plane and etcd were working, and cp0 ran the local manifests but stayed in the "NotReady" state because the CNI config was not present.

Tried again on CoreOS 1855.4.0 and had a similar experience. It seems like there is a circular dependency between the first host and the CNI plugin. It works if you add the file /etc/cni/net.d/10-flannel.conflist to cp0 manually.
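In case it helps, a sketch of that manual step, copying the conflist from a node where flannel has already written it (cp1 is a placeholder hostname, not necessarily the one from my setup):

```bash
# Sketch only: copy the flannel CNI config from a node that already has it onto cp0.
sudo mkdir -p /etc/cni/net.d
scp cp1:/etc/cni/net.d/10-flannel.conflist /tmp/10-flannel.conflist
sudo mv /tmp/10-flannel.conflist /etc/cni/net.d/10-flannel.conflist
```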

ztec commented 6 years ago

I had the same issue:

My context:

As a workaround, I did that step (i.e. `kubeadm alpha phase kubelet write-env-file --config kubeadm-config.yaml`) manually by reusing the file generated on cp0: instead of running the step, copy /var/lib/kubelet/kubeadm-flags.env from cp0 to cp1 at the same location.
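A rough sketch of that copy, run from cp1 (cp0/cp1 are the hostnames used in the docs; adjust the user and paths to your setup):

```bash
# Sketch: pull the env file kubeadm generated on cp0 and put it at the same path on cp1.
scp root@cp0:/var/lib/kubelet/kubeadm-flags.env /tmp/kubeadm-flags.env
sudo mv /tmp/kubeadm-flags.env /var/lib/kubelet/kubeadm-flags.env
```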

For the record, my file looks like this:

/var/lib/kubelet/kubeadm-flags.env:

    KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni --resolv-conf=/run/systemd/resolve/resolv.conf

detiber commented 6 years ago

@eturpin @ztec @gclyatt: Just to clarify, you are seeing this when using v1.12.0 of kubeadm?

ztec commented 6 years ago

@detiber for me, yes.

eturpin commented 6 years ago

@detiber Yes. Using the package from http://apt.kubernetes.io/ kubernetes-xenial main.

OS: Ubuntu 16.04.5
Arch: amd64
Kubernetes version: 1.12.0

gclyatt commented 6 years ago

@detiber I saw this on v1.12.0 of kubeadm and just tried again with v1.12.1; I had the same result with the write-env-file step.

jjgraham commented 6 years ago

I have the same issue on CentOS 7.5 with v1.12.1. Can I just copy the file from CP0 to CP1 and move on? I am getting a timeout switching CP1 to master if I do.

nardusg commented 6 years ago

Also on CentOS 7.5 and v1.12.1, same issue as mentioned above.

Created the file /var/lib/kubelet/kubeadm-flags.env from cp0, and now I can continue.

    NAME                    STATUS     ROLES    AGE     VERSION
    lt1-k8s-04.blah.co.za   Ready      master   37m     v1.12.1
    lt1-k8s-05.blah.co.za   NotReady   master   5m25s   v1.12.1

detiber commented 6 years ago

/assign

selmison commented 6 years ago

I had the same issue:

OS: Ubuntu 16.04.4 LTS
Arch: amd64
Kubernetes version: 1.12.1
Issue on the second master

jbiel commented 6 years ago

I'm experiencing the same problem with Kubernetes 1.12.1 and Ubuntu 18.04.

pankajpandey9 commented 6 years ago

Facing the same issue with CentOS and Kubernetes 1.12.1.

Please advise.

jbiel commented 6 years ago

@pankajpandey9, the workaround outlined by @ztec a few posts up works.

billimek commented 6 years ago

For me, the workaround (manually creating the /var/lib/kubelet/kubeadm-flags.env file on the other nodes) allowed the rest of the commands in the documentation to complete.

However, the resulting cluster was still (mostly) broken. kube-controller-manager and kube-scheduler on the other nodes were in a continual crashing loop.

neolit123 commented 6 years ago

closing in favor of: https://github.com/kubernetes/kubeadm/issues/1171

^ has the actual cause + solution defined too.

this is a kubeadm issue and we shouldn't track it in the website repo. /close

k8s-ci-robot commented 6 years ago

@neolit123: Closing this issue.

In response to [this](https://github.com/kubernetes/website/issues/10450#issuecomment-429819827):

> closing in favor of: https://github.com/kubernetes/kubeadm/issues/1171
>
> ^ has the actual cause + solution defined too.
>
> this is a kubeadm issue and we shouldn't track it in the website repo.
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.