kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

coredns pod stuck in pending #1939

Closed dkypuros closed 4 years ago

dkypuros commented 4 years ago

These two CoreDNS pods in the kube-system namespace are stuck in Pending: coredns-5644d7b6d9-jkt62 and coredns-5644d7b6d9-kd4kw (0/1 ready, 0 restarts, age 4h32m).

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-5644d7b6d9-jkt62        0/1     Pending   0          4h32m
kube-system   coredns-5644d7b6d9-kd4kw        0/1     Pending   0          4h32m
kube-system   etcd-01cpn                      1/1     Running   0          4h32m
kube-system   kube-apiserver-01cpn            1/1     Running   0          4h31m
kube-system   kube-controller-manager-01cpn   1/1     Running   0          4h31m
kube-system   kube-flannel-ds-amd64-zmwv4     1/1     Running   0          4h26m
kube-system   kube-proxy-zlqxl                1/1     Running   0          4h32m
kube-system   kube-scheduler-01cpn            1/1     Running   0          4h31m
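A small sketch for pulling the Pending pods out of a listing like the one above. It assumes the default column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...), and the helper name pending_pods is hypothetical. kubectl can also filter on the server side with --field-selector=status.phase=Pending. Either way, running kubectl describe pod on each result shows its Events, where the scheduler normally records why the pod has not been placed.

```shell
# pending_pods (hypothetical helper): read `kubectl get pods --all-namespaces`
# output on stdin and print namespace/name for every pod whose STATUS is Pending.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
pending_pods() {
  # Skip the header row, then match the STATUS column exactly.
  awk 'NR > 1 && $4 == "Pending" { print $1 "/" $2 }'
}

# Usage on a live cluster:
#   kubectl get pods --all-namespaces | pending_pods
# Then read the Events section of each result, e.g.:
#   kubectl -n kube-system describe pod coredns-5644d7b6d9-jkt62
```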

kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:20:25Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Hardware configuration: libvirt, KVM, Virtual Machine Manager 2.2.1 (Fedora)

NAME=Fedora
VERSION="28 (Server Edition)"
ID=fedora
VERSION_ID=28
VERSION_CODENAME=""
PLATFORM_ID="platform:f28"
PRETTY_NAME="Fedora 28 (Server Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:28"
HOME_URL="https://fedoraproject.org/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=28
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=28
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server

Linux 01cpn 4.16.3-301.fc28.x86_64 #1 SMP Mon Apr 23 21:59:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

What happened?

I followed these instructions:

  1. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#what-s-next
  2. https://docs.docker.com/install/linux/docker-ce/fedora/
  3. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
  4. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/

What you expected to happen?

After the flannel pod network add-on was loaded and running, I expected the coredns pods to move to the Running state, as the directions describe.

How to reproduce it (as minimally and precisely as possible)?

Install Fedora 28 on libvirt (it has Docker 18, which is the recommended version), then run:
sudo -i
dnf update -y
swapon -s   # check that swap is off
update-alternatives --set iptables /usr/sbin/iptables-legacy
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=10251/tcp
firewall-cmd --permanent --add-port=10252/tcp
firewall-cmd --permanent --add-port=8472/udp
systemctl restart firewalld
dnf config-manager \
      --add-repo \
      https://download.docker.com/linux/fedora/docker-ce.repo
dnf install docker-ce docker-ce-cli containerd.io
mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
kubeadm init --pod-network-cidr=10.244.0.0/16
exit   # leave the root shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
kubectl get pods --all-namespaces
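The swapon -s check and the sysctl settings in the steps above can be verified with a short script instead of by eye. This is a sketch under stated assumptions: a Linux node with /proc mounted, hypothetical helper names, and file paths passed as parameters so the checks can also run against sample files. Note that /proc/sys/net/bridge/ only exists once the br_netfilter kernel module is loaded.

```shell
# Hypothetical preflight helpers for the steps above.

# Succeeds when no swap is active: /proc/swaps always has one header line,
# so any further line is an active swap device.
swap_is_off() {
  local swaps_file="${1:-/proc/swaps}"
  [ "$(wc -l < "$swaps_file")" -le 1 ]
}

# Succeeds when a /proc/sys file (or a copy of one) contains exactly "1".
sysctl_is_one() {
  [ -r "$1" ] && [ "$(tr -d '[:space:]' < "$1")" = "1" ]
}

# Usage on the node:
#   swap_is_off || echo "swap is still on (try: swapoff -a)"
#   sysctl_is_one /proc/sys/net/bridge/bridge-nf-call-iptables \
#     || echo "net.bridge.bridge-nf-call-iptables is not 1"
```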

Anything else we need to know?

I moved to Fedora 28 to get the proper version of Docker (18), and to avoid cgroups v2 issues in Fedora 31.

neolit123 commented 4 years ago

After the flannel pod network add-on was loaded and running, I expected the coredns pods to move to the Running state, as the directions describe.

flannel has had bugs (or rather, issues) recently. please try Calico or Weave Net instead, from this guide: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network

dkypuros commented 4 years ago

Thanks for the tip. I'll try and follow the guide mentioned!

dkypuros commented 4 years ago

I performed the same steps as above, with these additions for Weave Net. Notice that the coredns pods are still stuck in Pending status.

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.apps/weave-net created
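The Weave Net URL above packs the cluster version into its query string. A sketch of how that URL is assembled, assuming only what the command itself shows (base64 of the kubectl version output, with line breaks removed); weave_manifest_url is a hypothetical name.

```shell
# weave_manifest_url (hypothetical helper): rebuild the URL used above. The
# k8s-version query parameter is the base64 encoding of the `kubectl version`
# output, with base64's line wrapping removed by tr.
weave_manifest_url() {
  # printf '%s\n' restores the trailing newline that $(kubectl version) strips.
  printf 'https://cloud.weave.works/k8s/net?k8s-version=%s\n' \
    "$(printf '%s\n' "$1" | base64 | tr -d '\n')"
}

# Usage, which should be equivalent to the command above:
#   kubectl apply -f "$(weave_manifest_url "$(kubectl version)")"
```

Passing the version text in as an argument keeps the helper testable without a cluster.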

kubectl get pods --all-namespaces

NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-5644d7b6d9-jrmn9        0/1     Pending   0          4m36s
kube-system   coredns-5644d7b6d9-p5jjw        0/1     Pending   0          4m36s
kube-system   etcd-01cpn                      1/1     Running   0          3m34s
kube-system   kube-apiserver-01cpn            1/1     Running   0          3m36s
kube-system   kube-controller-manager-01cpn   1/1     Running   0          3m54s
kube-system   kube-proxy-s9nht                1/1     Running   0          4m36s
kube-system   kube-scheduler-01cpn            1/1     Running   0          3m50s
kube-system   weave-net-84xr2                 2/2     Running   0          54s

dkypuros commented 4 years ago

More errors from the kubelet service:

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2019-11-25 20:04:58 CST; 5min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 789 (kubelet)
    Tasks: 17 (limit: 2353)
   Memory: 131.5M
   CGroup: /system.slice/kubelet.service
           └─789 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-cont>

Nov 25 20:10:46 01cpn kubelet[789]: E1125 20:10:46.095519 789 summary_sys_containers.go:47] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": faile>
Nov 25 20:10:46 01cpn kubelet[789]: W1125 20:10:46.825130 789 cni.go:202] Error validating CNI config &{weave 0.3.0 false [0xc000303f00 0xc000254200] [123 10 32 32 32 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51 4>
Nov 25 20:10:46 01cpn kubelet[789]: W1125 20:10:46.825807 789 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d
Nov 25 20:10:50 01cpn kubelet[789]: E1125 20:10:50.201634 789 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 25 20:10:51 01cpn kubelet[789]: W1125 20:10:51.863009 789 cni.go:202] Error validating CNI config &{weave 0.3.0 false [0xc0012d1920 0xc0012d19a0] [123 10 32 32 32 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51 4>
Nov 25 20:10:51 01cpn kubelet[789]: W1125 20:10:51.863112 789 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d
Nov 25 20:10:55 01cpn kubelet[789]: E1125 20:10:55.206535 789 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 25 20:10:56 01cpn kubelet[789]: E1125 20:10:56.134730 789 summary_sys_containers.go:47] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": faile>
Nov 25 20:10:56 01cpn kubelet[789]: W1125 20:10:56.881415 789 cni.go:202] Error validating CNI config &{weave 0.3.0 false [0xc0005a13e0 0xc0005a1460] [123 10 32 32 32 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51 4>
Nov 25 20:10:56 01cpn kubelet[789]: W1125 20:10:56.881485 789 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d
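The long integer lists in the "Error validating CNI config" lines appear to be the raw bytes of the config the kubelet rejected, printed in decimal. Decoding them shows what content it was looking at. A POSIX shell sketch (decode_bytes is a hypothetical helper), fed with the bytes from the first such line up to where the journal truncated it:

```shell
# decode_bytes (hypothetical helper): turn decimal byte values, as printed in
# the kubelet's "Error validating CNI config" lines, back into text.
decode_bytes() {
  local b
  for b in "$@"; do
    # Convert each value to a 3-digit octal escape and let printf emit that byte.
    printf "\\$(printf '%03o' "$b")"
  done
}

# Bytes from the first cni.go:202 line above, up to the truncation point.
decode_bytes 123 10 32 32 32 32 34 99 110 105 86 101 114 115 105 111 110 34 58 32 34 48 46 51
echo
```

The output is the opening of a JSON document: a { followed by a line beginning with "cniVersion": "0.3. So at that point the kubelet was reading a Weave config and rejecting it during validation, as the cni.go:202 message says.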

dkypuros commented 4 years ago

Maybe I'll try changing the container runtime to CRI-O.

neolit123 commented 4 years ago

Unable to update cni config: no valid networks found in /etc/cni/net.d

this tells me that the CNI plugin is not working.

have you installed the kubernetes-cni package? currently it is a dependency of the kubeadm package.

you can also download the contents of the package from here and install them manually: https://github.com/containernetworking/plugins/releases (0.8.3 should be fine)
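For the manual route, the release tarball can be fetched and unpacked into the kubelet's default CNI binary directory, /opt/cni/bin. A sketch, assuming the cni-plugins-linux-<arch>-<version>.tgz asset naming that the kubeadm install docs use for v0.8.x releases. cni_plugins_url is a hypothetical helper, and the asset name should be checked against the release page before use.

```shell
# cni_plugins_url (hypothetical helper): release tarball URL for the reference
# CNI plugins, assuming the cni-plugins-linux-<arch>-<version>.tgz naming.
cni_plugins_url() {
  local version="$1"   # e.g. v0.8.3
  local arch="$2"      # e.g. amd64
  printf 'https://github.com/containernetworking/plugins/releases/download/%s/cni-plugins-linux-%s-%s.tgz\n' \
    "$version" "$arch" "$version"
}

# Manual install into the kubelet's default CNI binary directory (as root):
#   mkdir -p /opt/cni/bin
#   curl -L "$(cni_plugins_url v0.8.3 amd64)" | tar -C /opt/cni/bin -xz
```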

dkypuros commented 4 years ago

I can, but the documentation here (CentOS/RHEL tab), https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl, says to install these packages (which I did):

yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

...which resulted in this full list of packages (including dependencies):
kubeadm.x86_64 1.16.3-0
kubectl.x86_64 1.16.3-0
kubelet.x86_64 1.16.3-0
conntrack-tools.x86_64 1.4.4-7.fc28
containernetworking-plugins.x86_64 0.7.4-2.fc28
cri-tools.x86_64 1.13.0-0
libnetfilter_cthelper.x86_64 1.0.0-13.fc28
libnetfilter_cttimeout.x86_64 1.0.0-11.fc28
libnetfilter_queue.x86_64 1.0.2-11.fc28
socat.x86_64 1.7.3.2-6.fc28

neolit123 commented 4 years ago

containernetworking-plugins.x86_64 0.7.4-2.fc28

this is the package that has the plugins, and I'm going to assume they are installed. but it's not the package distributed by k8s; it's the one that comes from your distro.

something is not right with your node's (OS) setup WRT the CNI plugin installation.

try comparing the files in this release, https://github.com/containernetworking/plugins/releases, with the files installed on your system.
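One way to make that comparison is to check which plugin binaries are actually present where the kubelet looks for them. A sketch: missing_cni_plugins is a hypothetical helper, the plugin names are a sample from the reference release, and /opt/cni/bin (the kubelet's default --cni-bin-dir) is assumed to be the directory in use. A distro package may install its binaries elsewhere, for example under /usr/libexec/cni, which rpm -ql will show.

```shell
# missing_cni_plugins (hypothetical helper): print each named plugin that has
# no executable file in the given directory.
missing_cni_plugins() {
  local dir="$1" p
  shift
  for p in "$@"; do
    [ -x "$dir/$p" ] || echo "$p"
  done
}

# Usage: check the kubelet's default binary directory for a few plugins from
# the release tarball, then list where the distro package put its files.
#   missing_cni_plugins /opt/cni/bin bridge host-local loopback portmap
#   rpm -ql containernetworking-plugins
```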

also try another CNI plugin like Calico - we have e2e tests for it and it currently works with 1.16.3.

dkypuros commented 4 years ago

I'll try Calico.

neolit123 commented 4 years ago

what are the contents of your /etc/cni/net.d folder ATM?

kubeadm reset does not clean it.

so if you have multiple plugins in there, try cleaning it up (as long as you are cleaning CNI plugin files only), call kubeadm init ... from scratch again and install the CNI.

the folder should have only one CNI config, otherwise it can cause failures.

Unable to update cni config: no valid networks found in /etc/cni/net.d

this, however, tells me that it has no files at all.
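A quick way to answer the "what is in /etc/cni/net.d" question, and to enforce the one-config rule, is sketched below. It assumes the kubelet only considers files ending in .conf, .conflist or .json in that directory (our understanding of the kubelet's CNI code for this release); the helper names are hypothetical.

```shell
# cni_config_files (hypothetical helper): list files with the extensions the
# kubelet is assumed to read, sorted by name. Prints nothing if the directory
# does not exist.
cni_config_files() {
  local dir="${1:-/etc/cni/net.d}"
  [ -d "$dir" ] || return 0
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) | sort
}

# check_single_cni_config (hypothetical helper): succeed only when exactly one
# such file is present.
check_single_cni_config() {
  local dir="${1:-/etc/cni/net.d}" n
  n=$(cni_config_files "$dir" | wc -l)
  if [ "$n" -eq 0 ]; then
    echo "no CNI config found in $dir"
    return 1
  elif [ "$n" -gt 1 ]; then
    echo "$n CNI configs found in $dir; expected exactly one:"
    cni_config_files "$dir"
    return 1
  fi
  echo "ok: $(cni_config_files "$dir")"
}

# Usage on the node:
#   check_single_cni_config /etc/cni/net.d
```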

dkypuros commented 4 years ago

Does this mean I need to set the cgroup driver to systemd for Fedora? Or, since I'm using Docker, will it be set automatically?

From https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl: "... If you are using a different CRI, you have to modify the file /etc/default/kubelet (/etc/sysconfig/kubelet for CentOS, RHEL, Fedora) with your cgroup-driver value, like so: KUBELET_EXTRA_ARGS=--cgroup-driver=systemd ..."

dkypuros commented 4 years ago

what are the contents of your /etc/cni/net.d folder ATM?

kubeadm reset does not clean it.

so if you have multiple plugins in there, try cleaning it up (as long as you are cleaning CNI plugin files only), call kubeadm init ... from scratch again and install the CNI.

the folder should have only one CNI config, otherwise it can cause failures.

Unable to update cni config: no valid networks found in /etc/cni/net.d

this, however, tells me that it has no files at all.

Let me check after I finish with Calico. I start from scratch every time I build my systems.

neolit123 commented 4 years ago

Does this mean I need to set the cgroup driver to systemd for Fedora? Or, since I'm using Docker, will it be set automatically?

kubeadm auto-passes a flag to the kubelet after it detects the cgroup driver, but only for docker. systemd is the recommended driver; cgroupfs is the default driver for the kubelet.
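To see which driver the kubelet actually ended up with, the flag kubeadm passes can be read back from /var/lib/kubelet/kubeadm-flags.env. The kubeadm init output further down shows kubeadm writing that file, and the kubelet command line in the systemctl status output above includes --cgroup-driver=systemd. A sketch with a hypothetical helper name, assuming the flag appears in --cgroup-driver=value form:

```shell
# kubelet_cgroup_driver (hypothetical helper): print the --cgroup-driver value
# from a kubelet flags file such as the one kubeadm writes.
kubelet_cgroup_driver() {
  local flags_file="${1:-/var/lib/kubelet/kubeadm-flags.env}"
  # Capture the value up to the next space or double quote.
  sed -n 's/.*--cgroup-driver=\([^" ]*\).*/\1/p' "$flags_file" 2>/dev/null | head -n 1
}

# Usage: compare it with the driver Docker reports.
#   [ "$(kubelet_cgroup_driver)" = "$(docker info --format '{{.CgroupDriver}}')" ] \
#     || echo "kubelet and docker cgroup drivers differ"
```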

dkypuros commented 4 years ago

[root@01cpn ~]# cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=systemd

[marz@01cpn ~]$ mkdir -p $HOME/.kube
[marz@01cpn ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[marz@01cpn ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
[marz@01cpn ~]$ sudo -i
[root@01cpn ~]# kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
unable to recognize "https://docs.projectcalico.org/v3.8/manifests/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
[... same error repeated many more times ...]
[root@01cpn ~]# kubectl apply -f https://docs.projectcalico.org/v3.10/manifests/calico.yaml
unable to recognize "https://docs.projectcalico.org/v3.10/manifests/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
[... same error repeated many more times ...]
[root@01cpn ~]# ping google.com
PING google.com (216.58.194.142) 56(84) bytes of data.
64 bytes from dfw06s49-in-f142.1e100.net (216.58.194.142): icmp_seq=1 ttl=51 time=93.6 ms

Installed:
kubeadm.x86_64 1.16.3-0
kubectl.x86_64 1.16.3-0
kubelet.x86_64 1.16.3-0
conntrack-tools.x86_64 1.4.4-7.fc28
containernetworking-plugins.x86_64 0.7.4-2.fc28
cri-tools.x86_64 1.13.0-0
libnetfilter_cthelper.x86_64 1.0.0-13.fc28
libnetfilter_cttimeout.x86_64 1.0.0-11.fc28
libnetfilter_queue.x86_64 1.0.2-11.fc28
socat.x86_64 1.7.3.2-6.fc28

[root@01cpn ~]# ls -la /etc/cni/net.d
ls: cannot access '/etc/cni/net.d': No such file or directory

[root@01cpn ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:53:29:c8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.170/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 2582sec preferred_lft 2582sec
    inet6 fe80::106:70b2:bcfb:ea01/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:1d:a8:40:1d brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

neolit123 commented 4 years ago

KUBELET_EXTRA_ARGS=systemd

this is not valid. it should be --cgroup-driver=systemd, but try removing it instead.
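A lightweight check that would have caught the KUBELET_EXTRA_ARGS=systemd line reports any word in that variable that does not start with --. This is a sketch with a hypothetical helper name, assuming a sysconfig-style file with the variable on one line.

```shell
# bad_kubelet_extra_args (hypothetical helper): print every word in
# KUBELET_EXTRA_ARGS that does not start with "--". Only the --flag=value form
# is accepted; a flag followed by a separate value word is reported too.
bad_kubelet_extra_args() {
  local file="${1:-/etc/sysconfig/kubelet}"
  # Take the last assignment, drop surrounding quotes, then check each word.
  sed -n 's/^KUBELET_EXTRA_ARGS=//p' "$file" 2>/dev/null | tail -n 1 | tr -d "\"'" |
    awk '{ for (i = 1; i <= NF; i++) if ($i !~ /^--/) print $i }'
}

# Usage on the node (no output means every word looks like a flag):
#   bad_kubelet_extra_args /etc/sysconfig/kubelet
```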

unable to recognize "https://docs.projectcalico.org/v3.8/manifests/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused

the api server is not running.

dkypuros commented 4 years ago

[root@01cpn ~]# cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=

kubeadm creates the "kube-apiserver" pod with no errors. I started over again and noticed that kubeadm did its job:

[root@01cpn ~]# kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.16.3
[preflight] Running pre-flight checks
	[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [01cpn kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.122.170]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [01cpn localhost] and IPs [192.168.122.170 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [01cpn localhost] and IPs [192.168.122.170 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 35.501947 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.16" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node 01cpn as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node 01cpn as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: xghqeg.3jhf0kl3aa2g7zza
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.122.170:6443 --token xghqeg.3jhf0kl3aa2g7zza \
    --discovery-token-ca-cert-hash sha256:966fe249838aaea99c152df8ddb48844f45bdd302686b9600c42aa1d48c3f60d
[root@01cpn ~]# exit
logout
[marz@01cpn ~]$ mkdir -p $HOME/.kube
[marz@01cpn ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[marz@01cpn ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
[marz@01cpn ~]$ sudo -i
[root@01cpn ~]# kubectl apply -f https://docs.projectcalico.org/v3.10/manifests/calico.yaml
unable to recognize "https://docs.projectcalico.org/v3.10/manifests/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
(the same error line is repeated many more times)
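In the log above, the kubeconfig is copied into marz's home directory, but `kubectl apply` is then run in a root shell opened with `sudo -i`, where `$HOME` is `/root`. When kubectl finds no kubeconfig it falls back to `http://localhost:8080`, which matches the "connection refused" errors, so the root shell most likely had no config. The sketch below does the same copy step for an explicitly chosen home directory. `install_kubeconfig` is a hypothetical helper, not part of kubeadm or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm or kubectl): copy the admin
# kubeconfig into <home>/.kube/config for the user who will run kubectl.
install_kubeconfig() {
  src="$1"       # kubeconfig written by kubeadm init, e.g. /etc/kubernetes/admin.conf
  home_dir="$2"  # home directory of the user who will actually run kubectl
  mkdir -p "$home_dir/.kube" || return 1
  cp "$src" "$home_dir/.kube/config" || return 1
  # The file holds cluster-admin credentials, so keep it private to its owner.
  chmod 600 "$home_dir/.kube/config"
}

# Example for a root shell opened with `sudo -i` (run as root):
#   install_kubeconfig /etc/kubernetes/admin.conf /root
```

For a non-root user the ownership step (`chown`) from the log is still needed; it is left out here because only root can change a file's owner.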

dkypuros commented 4 years ago

My experience with kubeadm is pretty rough so far. Maybe, I'll try a different strategy. I've practically memorized the documentation for kubeadm. Uugh.

dkypuros commented 4 years ago

[Screenshot from 2019-11-25 22-35-15]

neolit123 commented 4 years ago

> My experience with kubeadm is pretty rough so far. Maybe, I'll try a different strategy. I've practically memorized the documentation for kubeadm. Uugh.

kubernetes is hard and kubeadm tries to fix that as much as possible without completely hiding all the internals.

> [root@01cpn ~]# kubectl apply -f https://docs.projectcalico.org/v3.10/manifests/calico.yaml
> unable to recognize "https://docs.projectcalico.org/v3.10/manifests/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused

this can also be a kubeconfig issue if you have set the KUBECONFIG env. variable. it has precedence over the file you copied here:

> [marz@01cpn ~]$ mkdir -p $HOME/.kube
> [marz@01cpn ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
> [marz@01cpn ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

try this instead: sudo kubectl apply --kubeconfig /etc/kubernetes/admin.conf -f .....
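The precedence described in this comment can be sketched as a small shell function. `which_kubeconfig` is hypothetical, not a kubectl command; it only mirrors kubectl's lookup order (the `--kubeconfig` flag, then the `KUBECONFIG` environment variable, then `$HOME/.kube/config`) and reports which source would win. It does not read or merge the files.

```shell
#!/bin/sh
# Hypothetical helper (not a kubectl command): report which kubeconfig
# source kubectl would consult, following kubectl's lookup order:
#   1. the --kubeconfig flag (passed here as $1, may be empty)
#   2. the KUBECONFIG environment variable (a colon-separated list of files)
#   3. $HOME/.kube/config
# It only reports the source; it does not read or merge the files.
which_kubeconfig() {
  flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    echo "flag:$flag_value"
  elif [ -n "${KUBECONFIG:-}" ]; then
    echo "env:$KUBECONFIG"
  elif [ -f "$HOME/.kube/config" ]; then
    echo "home:$HOME/.kube/config"
  else
    # No kubeconfig found: kubectl falls back to http://localhost:8080.
    echo "none"
  fi
}
```

Calling it with `/etc/kubernetes/admin.conf` as the argument corresponds to the suggested `--kubeconfig` command, which bypasses both `KUBECONFIG` and `$HOME`. On a real node, `kubectl config view` shows the configuration kubectl actually ends up with.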

neolit123 commented 4 years ago

> My experience with kubeadm is pretty rough so far. Maybe, I'll try a different strategy. I've practically memorized the documentation for kubeadm. Uugh.

starting on a fresh VM also helps.

dkypuros commented 4 years ago

Thanks for your help.

neolit123 commented 4 years ago

did you manage to solve your problems, @dkypuros?

dkypuros commented 4 years ago

No, unfortunately I wasn't able to. My goal was to do an install using only the documentation. I'm taking a break and looking for a different approach.

dkypuros commented 4 years ago

> containernetworking-plugins.x86_64 0.7.4-2.fc28
>
> this is the package that has the plugins and i'm going to assume they are installed. but it's not the k8s distributed package and instead it's the one that comes from your distro.
>
> something is not right with your node's (OS) setup WRT the CNI plugin installation.
>
> try verifying what files this has https://github.com/containernetworking/plugins/releases and what files are installed on your system.
>
> also try another CNI plugin like Calico - we have e2e tests for it and it currently works with 1.16.3.
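One way to do the comparison suggested above is to check that the plugin binaries the network add-on needs are present and executable in the directory kubelet searches (`/opt/cni/bin` by default, set by kubelet's `--cni-bin-dir` flag in 1.16). `missing_cni_plugins` is a hypothetical helper. The directory is a parameter because a distro package may install the binaries somewhere other than kubelet's default; that is an assumption worth checking with your package manager. The plugin names in the example are also an assumption, based on flannel's default configuration (the `flannel` plugin delegating to `bridge` with `host-local` IPAM, plus `portmap`), with `loopback` added because it is commonly required as well.

```shell
#!/bin/sh
# Hypothetical helper: print each named CNI plugin binary that is missing
# (or not executable) in the given directory. Returns 1 if any is missing.
missing_cni_plugins() {
  dir="$1"   # directory kubelet searches for CNI binaries, e.g. /opt/cni/bin
  shift
  status=0
  for plugin in "$@"; do
    if [ ! -x "$dir/$plugin" ]; then
      echo "missing: $dir/$plugin"
      status=1
    fi
  done
  return "$status"
}

# Example (assumed plugin list for a default flannel setup):
#   missing_cni_plugins /opt/cni/bin flannel bridge host-local portmap loopback
```

Any name it prints can then be compared with the contents of the release tarball from the plugins releases page.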

You know what would help is if I could find a way to replicate your e2e tests. Do you have the data posted? What OS, what versions of everything, etc. (maybe even the commands you used to stand up and test)? The documentation doesn't really allow me to follow your team's e2e setup.

neolit123 commented 4 years ago

For e2e we use a docker-in-docker setup, so it's slightly different. If you have problems on your nodes / OS, our e2e tests would not catch them, and it's really hard to say which steps were problematic. I'm going to close this issue as it is not a kubeadm bug per se, but please comment on your findings.

/close

k8s-ci-robot commented 4 years ago

@neolit123: Closing this issue.

In response to [this](https://github.com/kubernetes/kubeadm/issues/1939#issuecomment-559156880).

mateusable33 commented 2 years ago

@dkypuros try this: https://github.com/coredns/coredns/issues/3300

wssrronak commented 1 year ago

Has anyone been able to resolve the issue of the CoreDNS pod stuck in Pending status?