Hi Ruben,
I am seeing some issues when attempting to deploy k8s using `kubeadm` during the preflight checks, namely:
[preflight] Some fatal errors occurred:
[ERROR FileExisting-crictl]: crictl not found in system path
You have to install the cri-tools from https://github.com/kubernetes-sigs/cri-tools/releases
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
modprobe br_netfilter
Can be persisted with
[root@k8s-cluster2-controller crio-lxc-build]# cat /etc/modules-load.d/kubelet.conf
br-netfilter
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
echo 1 > /proc/sys/net/ipv4/ip_forward
can be persisted with
[root@k8s-cluster2-controller crio-lxc-build]# cat /etc/sysctl.d/99-kubelet.conf
net.ipv4.ip_forward=1
[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.20.1" Control plane version: "1.19.6"
* The first one is probably a `PATH`-related error or similar
* The 2nd and 3rd are likely some settings, I am looking at instructions like https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#letting-iptables-see-bridged-traffic to fix, any input welcome
* it's the last one which I am not sure about, given that the `INSTALL.md` instructions are supposed to pick up the latest of k8s binaries
* on this front, it looks like the `CHECKSUM` value in this block is wrong as it fails the check a few lines below:
ARCH="linux-amd64"
RELEASE="1.20.1"
ARCHIVE=kubernetes-server-$ARCH.tar.gz
CHECKSUM="fb56486a55dbf7dbacb53b1aaa690bae18d33d244c72a1e2dc95fb0fcce45108c44ba79f8fa04f12383801c46813dc33d2d0eb2203035cdce1078871595e446e"
DESTDIR="/usr/local/bin"
Full log is below, let me know if you need more information. I am running with `sudo kubeadm init --config cluster-init.yaml -v 5 2>&1 | tee kubeadm.log` [kubeadm.log](https://github.com/Drachenfels-GmbH/crio-lxc/files/5866776/kubeadm.log)
I'll take a look at it.
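(For reference, installing crictl from the cri-tools releases looks roughly like this; the version below is an assumption, pick the one matching your kubelet:)
VERSION="v1.20.0"   # assumption: use the release matching your kubelet
curl -LO https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar -C /usr/local/bin -xzf crictl-$VERSION-linux-amd64.tar.gz
crictl --version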
> it's the last one which I am not sure about, given that the `INSTALL.md` instructions are supposed to pick up the latest of k8s binaries
> - on this front, it looks like the `CHECKSUM` value in this block is wrong as it fails the check a few lines below:
> ARCH="linux-amd64" RELEASE="1.20.1" ARCHIVE=kubernetes-server-$ARCH.tar.gz CHECKSUM="fb56486a55dbf7dbacb53b1aaa690bae18d33d244c72a1e2dc95fb0fcce45108c44ba79f8fa04f12383801c46813dc33d2d0eb2203035cdce1078871595e446e" DESTDIR="/usr/local/bin"
> Full log is below, let me know if you need more information. I am running with `sudo kubeadm init --config cluster-init.yaml -v 5 2>&1 | tee kubeadm.log`
> kubeadm.log
Yes, the checksum is still the checksum for v1.20. I saw that 1.20.2 is out and will update the docs in a minute.
Thanks for reporting!
Please report if this works for you now. Thanks!
Hi @r10r and thanks for fixing the instructions this fast!
I have followed the updated instructions and confirm the preflight errors are gone now, good job!
However, it fails with a new error about a missing `/etc/containers/policy.json`:
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0127 22:23:00.430975 5584 checks.go:845] pulling k8s.gcr.io/kube-apiserver:v1.20.2
I0127 22:23:08.753710 5584 checks.go:845] pulling k8s.gcr.io/kube-controller-manager:v1.20.2
I0127 22:23:15.234637 5584 checks.go:845] pulling k8s.gcr.io/kube-scheduler:v1.20.2
I0127 22:23:23.203065 5584 checks.go:845] pulling k8s.gcr.io/kube-proxy:v1.20.2
I0127 22:23:30.849866 5584 checks.go:845] pulling k8s.gcr.io/pause:3.2
I0127 22:23:41.805311 5584 checks.go:845] pulling k8s.gcr.io/etcd:3.4.13-0
I0127 22:23:48.087088 5584 checks.go:845] pulling k8s.gcr.io/coredns:1.7.0
[preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.20.2: output: time="2021-01-27T22:23:08Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.20.2: output: time="2021-01-27T22:23:15Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.20.2: output: time="2021-01-27T22:23:23Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.20.2: output: time="2021-01-27T22:23:30Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.2: output: time="2021-01-27T22:23:41Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.13-0: output: time="2021-01-27T22:23:48Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.7.0: output: time="2021-01-27T22:23:55Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory", error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
I am working around the issue by setting the following in `/etc/containers/policy.json`:
{
"default": [{"type": "insecureAcceptAnything"}]
}
The deployment then proceeds much further but fails later with some kubelet-related errors mentioning that the kubelet is non-healthy: kubelet.log
Below is my cluster configuration YAML file too. I am seeing many network-related errors (including missing CNI plugins), so I probably made a mistake in the configuration, but I am not sure where for now ...
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.2.15
  bindPort: 6443
nodeRegistration:
  name: vagrant-k8s
  criSocket: unix://var/run/crio/crio.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
#  kubeletExtraArgs:
#    v: "5"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
apiVersion: kubeadm.k8s.io/v1beta2
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.66.0.0/16
scheduler: {}
controlPlaneEndpoint: "10.0.2.15:6443"
> I am working around the issue by setting the following in `/etc/containers/policy.json`:
> { "default": [{"type": "insecureAcceptAnything"}] }
> as per https://github.com/containers/image/blob/master/docs/containers-policy.json.5.md#completely-disable-security-allow-all-images-do-not-trust-any-signatures
Yes, that's fine for now.
> The deployment then proceeds much further but fails later with some kubelet-related errors mentioning that the kubelet is non-healthy: kubelet.log
> Below is my cluster configuration YAML file too. I am seeing many network-related errors (including missing CNI plugins), so I probably made a mistake in the configuration, but I am not sure where for now ...
For the network configuration you have to create at least the loopback device:
[root@k8s-cluster2-controller k8s-tools]# cat /etc/cni/net.d/200-loopback.conf
{
"cniVersion": "0.3.1",
"type": "loopback",
"name": "lo"
}
And you might want to try cilium as the CNI plugin: https://docs.cilium.io/en/v1.9/gettingstarted/k8s-install-default/#install-cilium I tried calico too but switched to cilium, which works without hassle for now.
Jan 27 22:30:05 archlinux kubelet[6059]: F0127 22:30:05.980166 6059 kubelet.go:1350] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 24 in cached partitions map
Did you configure the storage driver in `/etc/containers/storage.conf`?
You might have hit a btrfs related issue https://github.com/kubernetes/kubernetes/issues/94335
I'm using `overlay2` as storage driver, so you might want to try this configuration:
[root@k8s-cluster2-controller k8s-tools]# cat /etc/containers/storage.conf
# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md
[storage]
driver = "overlay"
[storage.options.overlay]
# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay`
# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off
# NOTE: metacopy can only be enabled when redirect_dir is enabled
# NOTE: storage driver name must be set or mountopt are not evaluated,
# even when the driver is the default driver --> BUG ?
mountopt = "nodev,redirect_dir=off,metacopy=off"
And I'm missing some information here. Please attach at least the output of `journalctl -u crio -a` and `mount -a`.
I have a log gathering script, I use it for collecting all kind of information after sonobuoy test runs. You might want to try it: https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-gather-logs-sh
Please look carefully at the generated output, since it might contain sensitive information.
Thanks for the very quick feedback and help, I will look into all this
One more question though: it does not seem possible to rerun `kubeadm` when it failed, as it will complain that some certs have already been generated. Is there an easy way to rerun it w/o having to wipe the full setup? (this might be written in the kubeadm docs though, I have not checked, feel free to RTFM me :)
> Thanks for the very quick feedback and help, I will look into all this
Good luck!
> One more question though: it does not seem possible to rerun `kubeadm` when it failed, as it will complain that some certs have already been generated. Is there an easy way to rerun it w/o having to wipe the full setup? (this might be written in the kubeadm docs though, I have not checked, feel free to RTFM me :)
In short: `kubeadm reset --help` is your friend ;)
You can also remove `/etc/kubernetes/pki` yourself after stopping the kubelet.
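(A minimal sketch of that sequence, for reference:)
sudo kubeadm reset
# or, by hand:
sudo systemctl stop kubelet
sudo rm -rf /etc/kubernetes/pki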
You might have to clean up a bit more to get a clean cluster state. I've uploaded the scripts I use for purging the cluster state during development. You might want to take a look at them: https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a
https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-clear-logs-sh https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-k8s-reset-sh
Hi @r10r ,
I have taken a look at your scripts, those are great for gathering all the necessary information!
Please find attached the logs I have gathered so far using your scripts: 01.29_22.04.29.zip
As for your questions above, yes I did configure the storage driver as per your instructions:
[vagrant@archlinux 01.29_22.04.29]$ cat /etc/containers/storage.conf
# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md
[storage]
driver = "overlay"
[storage.options.overlay]
# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay`
# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off
# NOTE: metacopy can only be enabled when redirect_dir is enabled
# NOTE: storage driver name must be set or mountopt are not evaluated,
# even when the driver is the default driver --> BUG ?
mountopt = "nodev,redirect_dir=off,metacopy=off"
I don't have anything CNI-related though, I will create `/etc/cni/net.d/200-loopback.conf` + install Cilium and report back.
Edit: I have created `/etc/cni/net.d/200-loopback.conf` as per your suggestion but this does not appear to change things here. The behaviour differs depending on whether `/sys/fs/cgroup` is mounted or not:
- if `sudo mount -t cgroup2 none /sys/fs/cgroup` has been done, `kubeadm` fails early w/ missing capabilities associated w/ cgroups (looks like the `lxc-checkconfig` check failing). See kubeadm-cgroupsv2.txt
- if `/sys/fs/cgroups` is not mounted like the above, `kubeadm` fails much later on, with the error I mentioned above. See kubeadm-no-cgroupsv2.txt
Thanks for your continued help so far!
> Hi @r10r,
> I have taken a look at your scripts, those are great for gathering all the necessary information!
> Please find attached the logs I have gathered so far using your scripts: 01.29_22.04.29.zip
That's good. I do see this in the mounts:
/dev/sda2 /var/lib/containers/storage/overlay btrfs rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0
Please disable this mountpoint, reboot and try again. Again - this might be related to https://github.com/kubernetes/kubernetes/issues/94335
> - if `sudo mount -t cgroup2 none /sys/fs/cgroup` has been done, `kubeadm` fails early w/ missing capabilities associated w/ cgroups (looks like the `lxc-checkconfig` check failing). See kubeadm-cgroupsv2.txt
Did you enable cgroups2 permanently as suggested in https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/INSTALL.md#cgroups? It should not be necessary to mount cgroups2 manually. All cgroup controllers should be enabled by default.
This is what it should look like (at least after a clean boot)
[root@k8s-cluster2-controller ~]# cat /proc/cmdline
BOOT_IMAGE=../vmlinuz-linux root=/dev/xvda1 rw systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all initrd=../initramfs-linux.img
[root@k8s-cluster2-controller ~]# stat -f /sys/fs/cgroup/
File: "/sys/fs/cgroup/"
ID: 0 Namelen: 255 Type: cgroup2fs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 0 Free: 0 Available: 0
Inodes: Total: 0 Free: 0
[root@k8s-cluster2-controller ~]# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
[root@k8s-cluster2-controller ~]# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory hugetlb pids rdma
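(For reference, a hedged sketch of making the cgroup2-only setup permanent on a GRUB-based Arch install; the file paths are assumptions, the INSTALL.md section linked above is authoritative:)
# append to the existing GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub:
#   systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
sudo grub-mkconfig -o /boot/grub/grub.cfg
sudo reboot
# verify after the reboot
cat /proc/cmdline
stat -f -c %T /sys/fs/cgroup   # should print cgroup2fs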
> [WARNING FileExisting-ethtool]: ethtool not found in system path
> I0201 22:40:09.907155 1245 checks.go:376] validating the presence of executable socat
> [WARNING FileExisting-socat]: socat not found in system path
You should install `socat` and `ethtool`, although this is not the root cause.
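(On Arch that would be, assuming the stock package names:)
sudo pacman -S socat ethtool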
Hi @r10r ,
Thanks again for your continued support !
> That's good. I do see this in the mounts:
> /dev/sda2 /var/lib/containers/storage/overlay btrfs rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0
> Please disable this mountpoint, reboot and try again. Again - this might be related to kubernetes/kubernetes#94335
I have disabled it manually by unmounting the directory as I don't see how to do that automatically. Although the `overlay` storage driver is selected in `/etc/containers/storage.conf`, the system keeps on adding a `btrfs` entry for `/dev/sda2` under `/var/lib/containers/storage/overlay`. I am sure there is a better way here, would you have a hint?
After unmounting by hand, I now have:
[vagrant@archlinux ~]$ mount | grep btrfs
/dev/sda2 on / type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
> Did you enable cgroups2 permanently as suggested in https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/INSTALL.md#cgroups? It should not be necessary to mount cgroups2 manually. All cgroup controllers should be enabled by default.
Your documentation mentioned to do it either permanently or dynamically, I chose the latter because I was testing.
I have now enabled `cgroupsv2` permanently:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=559c8e62-62d0-4f52-a3b3-5526fffcc2d5 rw net.ifnames=0 rootflags=compress-force=zstd systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
Incidentally, while I have the same as you for `/sys/fs/cgroup/cgroup.controllers`, I don't for `/sys/fs/cgroup/cgroup.subtree_control`:
[vagrant@archlinux ~]$ stat -f /sys/fs/cgroup/
File: "/sys/fs/cgroup/"
ID: 0 Namelen: 255 Type: cgroup2fs
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 0 Free: 0 Available: 0
Inodes: Total: 0 Free: 0
[vagrant@archlinux ~]$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
[vagrant@archlinux ~]$ cat /sys/fs/cgroup/cgroup.subtree_control
memory pids
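(For reference, missing controllers can be enabled for child cgroups by writing to the root cgroup.subtree_control; this is only a hedged sketch, systemd normally manages this delegation itself:)
echo "+cpuset +cpu +io +hugetlb +rdma" | sudo tee /sys/fs/cgroup/cgroup.subtree_control
cat /sys/fs/cgroup/cgroup.subtree_control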
> You should install `socat` and `ethtool`, although this is not the root cause.
Done.
However, I still fail to start the kubelet (`kubeadm reset` works well btw). The issue is the same, w/ the connection to the API server being refused:
kubeadm.log
cluster-init.yaml.txt
I set the network host address in `cluster-init.yaml` because of:
[vagrant@archlinux ~]$ ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 10.0.2.15/24 fe80::a00:27ff:fe88:ee27/64
I have gathered logs as well: 02.02_22.49.28.zip
I feel I am missing something here, esp on the network side of things ... Do you have an idea as to what could go wrong here? Can you also confirm I don't need to do more besides creating `/etc/cni/net.d/200-loopback.conf` on this front (as I understand the Calico/Cilium install will be done after the cluster is up)?
> ... Again - this might be related to kubernetes/kubernetes#94335
As suggested by the linked issue, try:
mount --bind /var/lib/kubelet /var/lib/kubelet
systemctl restart kubelet
A possible workaround is to make sure a bind mount exists which allows kubelet's logic to find the backing filesystem. E.g. add the following fstab entry and then perform mount /var/lib/kubelet:
/var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0
> ... Again - this might be related to kubernetes/kubernetes#94335
> As suggested by the linked issue, try:
> mount --bind /var/lib/kubelet /var/lib/kubelet
> systemctl restart kubelet
> A possible workaround is to make sure a bind mount exists which allows kubelet's logic to find the backing filesystem. E.g. add the following fstab entry and then perform mount /var/lib/kubelet: /var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0
Thanks for pointing this out again and for your fast turnaround. I have added the mount to `fstab`:
[vagrant@archlinux ~]$ cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.
# <file system> <dir> <type> <options> <dump> <pass>
#/swap/swapfile none swap defaults 0 0
#VAGRANT-BEGIN
# The contents below are automatically generated by Vagrant. Do not modify.
#VAGRANT-END
/var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0
and it is mounted:
[vagrant@archlinux ~]$ mount | grep kube
/dev/sda2 on /var/lib/kubelet type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
We are definitely on the right track as some errors do not show up anymore. However, the cluster still fails to be brought up. I see the following related error in the kubelet journalctl:
Feb 03 09:34:15 archlinux kubelet[95939]: E0203 09:34:15.458996 95939 cri_stats_provider.go:376] Failed to get the info of the filesystem with mountpoint "/var/lib/containers/storage/overlay-images": failed to get device for dir "/var/lib/containers/storage/overlay-images": could not find device with major: 0, minor: 24 in cached partitions map.
Here is my log of the run: kubeadm.log 02.03_09.40.14.zip
Are you using BTRFS yourself? You had mentioned archlinux so I wanted to align w/ your setup. Did you switch to some other filesystem for your rootfs (as you do not seem to be seeing that issue yourself)?
According to the kubelet.service.log the error is fixed now. So it was the `btrfs` root file system that tricked kubelet.
Now you have the problem that the hostname cannot be resolved (kubelet.service.log):
Feb 03 09:34:16 archlinux kubelet[95939]: E0203 09:34:16.896332 95939 kubelet.go:2243] node "archlinux" not found
The hostname should match the node name used for `kubeadm init`. So change your hostname to `vagrant-k8s` using `hostnamectl set-hostname vagrant-k8s` and restart the kubelet.
> Feb 03 09:34:15 archlinux kubelet[95939]: E0203 09:34:15.458996 95939 cri_stats_provider.go:376] Failed to get the info of the filesystem with mountpoint "/var/lib/containers/storage/overlay-images": failed to get device for dir "/var/lib/containers/storage/overlay-images": could not find device with major: 0, minor: 24 in cached partitions map.
Try to add another bind mount in `/etc/fstab` and run `mount /var/lib/containers/storage/overlay-images`:
/var/lib/containers/storage/overlay-images /var/lib/containers/storage/overlay-images none defaults,bind,nofail 0 0
> Are you using BTRFS yourself? You had mentioned archlinux so I wanted to align w/ your setup. Did you switch to some other filesystem for your rootfs (as you do not seem to be seeing that issue yourself)?
No, I use ext4. But please let's continue with the btrfs setup. I'll add a btrfs-related section to the docs once we're done.
> According to the kubelet.service.log the error is fixed now. So it was the `btrfs` root file system that tricked kubelet.
Ok, that's good to know here, thanks for confirming.
> Now you have the problem that the hostname cannot be resolved (kubelet.service.log):
> Feb 03 09:34:16 archlinux kubelet[95939]: E0203 09:34:16.896332 95939 kubelet.go:2243] node "archlinux" not found
> The hostname should match the node name used for `kubeadm init`. So change your hostname to `vagrant-k8s` using `hostnamectl set-hostname vagrant-k8s` and restart the kubelet.
Ok, I have done that too:
[vagrant@archlinux logs]$ hostnamectl status
Static hostname: vagrant-k8s
Icon name: computer-vm
Chassis: vm
Machine ID: eb97c08437a24922bda5fa9d6281e912
Boot ID: 8891403f70074d0095e31d6232c0d525
Virtualization: oracle
Operating System: Arch Linux
Kernel: Linux 5.10.7-arch1-1
Architecture: x86-64
> Try to add another bind mount in `/etc/fstab` and run `mount /var/lib/containers/storage/overlay-images`:
> /var/lib/containers/storage/overlay-images /var/lib/containers/storage/overlay-images none defaults,bind,nofail 0 0
Yes! That did it, the error does not show up anymore in the kubelet log:
[vagrant@archlinux logs]$ mount | grep btrfs
/dev/sda2 on / type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
/dev/sda2 on /var/lib/kubelet type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
/dev/sda2 on /var/lib/containers/storage/overlay-images type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
> No, I use ext4. But please let's continue with the btrfs setup. I'll add a btrfs-related section to the docs once we're done.
Ok, that is fair indeed given the amount of time you also spend helping me here :)
So, we are further but the cluster fails to initialize still: 02.03_13.30.35.zip kubeadm.log
What I don't get is the CNI errors (I think this is the root cause of the issue): the CNI addon is supposed to be installed via `kubectl`, but this supposes that the master is up. Here, this is clearly not the case. So kind of a chicken-and-egg issue.
References like https://github.com/kubernetes/kubernetes/issues/48798#issuecomment-321267386 make me think that we are passing a CNI option whereas we can't have one because of the chicken-and-egg issue above. It looks like something is at odds w/ the CNI config, maybe at the cri-o level?
from crio.service.log:
Feb 03 13:30:22 vagrant-k8s crio[279]: time="2021-02-03 13:30:22.282784924Z" level=error msg="Container creation error: open /etc/default/crio-lxc: no such file or directory\n"
The default crio-lxc configuration is missing. Simply doing a `touch /etc/default/crio-lxc` should be enough. See https://github.com/Drachenfels-GmbH/crio-lxc#environment-file.
I'll create the file in the binary if it does not exist yet.
> What I don't get is the CNI errors (I think this is the root cause of the issue): the CNI addon is supposed to be installed via `kubectl`, but this supposes that the master is up. Here, this is clearly not the case. So kind of a chicken-and-egg issue.
No I think it's not. It's ok to install the CNI plugin after `kubeadm init`. All pods initialized by `kubeadm init` are static pods with manifests in `/etc/kubernetes/manifests/`. They get the host network namespace (the cluster IP) and need no CNI. CNI is only required for the `coredns` pods - but they don't block `kubeadm init`.
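(You can see those static pod manifests directly on the node; the file names below are the kubeadm defaults:)
ls /etc/kubernetes/manifests/
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml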
> from crio.service.log:
> Feb 03 13:30:22 vagrant-k8s crio[279]: time="2021-02-03 13:30:22.282784924Z" level=error msg="Container creation error: open /etc/default/crio-lxc: no such file or directory\n"
> The default crio-lxc configuration is missing. Simply doing a `touch /etc/default/crio-lxc` should be enough. See https://github.com/Drachenfels-GmbH/crio-lxc#environment-file.
Fantastic! This was the missing part (I had forgotten to re-read the main crio-lxc doc).
Now the cluster initializes :partying_face:
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 10.0.2.15:6443 --token 1l6ep6.gdufc3w4isp2z4f0 \
--discovery-token-ca-cert-hash sha256:fcf1fe7e5e3ce7c2fbb974727372fcd95038f6853d7d6711068b3f4218886341 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.2.15:6443 --token 1l6ep6.gdufc3w4isp2z4f0 \
--discovery-token-ca-cert-hash sha256:fcf1fe7e5e3ce7c2fbb974727372fcd95038f6853d7d6711068b3f4218886341
> I'll create the file in the binary if it does not exist yet.
Good idea. A mention in the K8S docs would be very useful too. I have copied the one from your example in my case.
Cluster logs, w/ the `kubeadm` log included: 02.03_14.02.01.zip
> What I don't get is the CNI errors (I think this is the root cause of the issue): the CNI addon is supposed to be installed via `kubectl`, but this supposes that the master is up. Here, this is clearly not the case. So kind of a chicken-and-egg issue.
> No I think it's not. It's ok to install the CNI plugin after `kubeadm init`. All pods initialized by `kubeadm init` are static pods with manifests in `/etc/kubernetes/manifests/`. They get the host network namespace (the cluster IP) and need no CNI. CNI is only required for the `coredns` pods - but they don't block `kubeadm init`.
Ok, that explains it. Thanks for pointing this out!
My next step will indeed be to install Cilium as you pointed out, and I will report back.
Another question: for your pod container images, are you using the LXC OCI template or manually converting them via umoci?
> Now the cluster initializes 🥳
Great!
> My next step will indeed be to install Cilium as you pointed out, and I will report back.
Well that's simple now :D
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
Remember that you've a single node cluster and you have to untaint the control-plane in order to schedule pods. See https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/K8S.md#kubeadm-init
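(For reference, removing the master taint set in the InitConfiguration above from this single node would look like this, with the node name used earlier:)
kubectl taint nodes vagrant-k8s node-role.kubernetes.io/master-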
> Another question: for your pod container images, are you using the LXC OCI template or manually converting them via umoci?
I use buildah. It's fairly simple to create new images. https://github.com/containers/buildah/blob/master/docs/tutorials/01-intro.md
Next step - Try it again on your favourite distribution :D
Thanks for all the good info!
I have started to deploy the Cilium CRDs/pods but this fails right now:
[vagrant@vagrant-k8s logs]$ kubectl -n kube-system get pods --watch
NAME READY STATUS RESTARTS AGE
cilium-operator-65c5fc987f-l656q 0/1 CrashLoopBackOff 6 16m
cilium-tgs6f 0/1 Init:CreateContainerError 0 16m
coredns-74ff55c5b-k9r7h 0/1 Pending 0 3h6m
coredns-74ff55c5b-ks297 0/1 Pending 0 3h6m
etcd-vagrant-k8s 1/1 Running 0 3h6m
kube-apiserver-vagrant-k8s 1/1 Running 0 3h6m
kube-controller-manager-vagrant-k8s 1/1 Running 0 3h6m
kube-proxy-4wfjm 0/1 CreateContainerError 0 3h6m
kube-scheduler-vagrant-k8s 1/1 Running 0 3h6m
It looks like an issue w/ `crio-lxc`:
kubectl describe pods cilium-tgs6f -n kube-system
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned kube-system/cilium-tgs6f to vagrant-k8s
Normal Pulling 17m kubelet Pulling image "quay.io/cilium/cilium:v1.9.3"
Normal Pulled 16m kubelet Successfully pulled image "quay.io/cilium/cilium:v1.9.3" in 27.862248371s
Normal Pulled 7m11s (x44 over 16m) kubelet Container image "quay.io/cilium/cilium:v1.9.3" already present on machine
Warning Failed 2m20s (x67 over 16m) kubelet Error: container create failed: [crio-lxc-start] failed to start container
I see the following in the `crio-lxc` log:
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG conf - conf.c:dropcaps_except:2453 - Keep capability wake_alarm (35)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG conf - conf.c:dropcaps_except:2453 - Keep capability block_suspend (36)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG conf - conf.c:dropcaps_except:2453 - Keep capability audit_read (37)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:lxc_setup:3437 - Failed to keep capabilities
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR start - start.c:do_start:1267 - Failed to setup container "a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac"
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR start - start.c:__lxc_start:2082 - Failed to spawn container "a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac"
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 WARN start - start.c:lxc_abort:1012 - No such process - Failed to send SIGKILL via pidfd 16 for process 1766724
{"l":"warn","cmd":"create","cid":"a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac","pid":1766722,"status":"exit status 1","t":"20210203170734.717","c":"create.go:105","m":"start process terminated"}
Does that ring a bell to you?
As for the images, thanks for the buildah pointer. I have some existing LXC images so I was thinking of using them as-is (or via just a simple OCI conversion). You seemed to be in the same case (migrating an existing codebase to k8s). I understand you are rebuilding your containers from scratch as opposed to using existing LXC images, is that right?
> As for the images, thanks for the buildah pointer. I have some existing LXC images so I was thinking of using them as-is (or via just a simple OCI conversion). You seemed to be in the same case (migrating an existing codebase to k8s). I understand you are rebuilding your containers from scratch as opposed to using existing LXC images, is that right?
You can use buildah to migrate your LXC images too. Simply create an image from scratch and copy over the rootfs.
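(A rough sketch of that approach, assuming the LXC rootfs lives under /var/lib/lxc/<name>/rootfs; the paths and image name are placeholders, and buildah mount needs root or a buildah unshare session:)
ctr=$(buildah from scratch)
mnt=$(buildah mount "$ctr")
cp -a /var/lib/lxc/mycontainer/rootfs/. "$mnt"/
buildah umount "$ctr"
buildah config --cmd /sbin/init "$ctr"   # whatever the container is supposed to run
buildah commit "$ctr" localhost/mycontainer:latest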
> lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon
You can disable capabilities for now. Simply set `CRIO_LXC_CAPABILITIES=false` in `/etc/default/crio-lxc`.
But we have to find out why `CAP_PERFMON` is unknown (maybe libcap is outdated).
From `man 7 capabilities`:
CAP_PERFMON (since Linux 5.8) ...
Unfortunately the crio-lxc.tar is empty, so there is no crio-lxc logfile. It seems you changed the path to the crio-lxc logfile, so gather-logs.sh couldn't pick it up.
I need the container runtime logs. So please use the `CRIO_LXC_CREATE_HOOK` to create backups of the container runtime configurations. See https://github.com/Drachenfels-GmbH/crio-lxc#create-hook for details.
Please do the following steps:
* Apply the crio-lxc configuration below
* Delete the cilium pods: `kubectl delete -f ...`
* Create the cilium pods again.
* Run `gather-logs.sh`
* Attach the output from `gather-logs.sh`
[root@k8s-cluster2-controller crio-lxc]# cat /etc/default/crio-lxc
CRIO_LXC_LOG_LEVEL=debug
CRIO_LXC_CONTAINER_LOG_LEVEL=debug
CRIO_LXC_CREATE_HOOK=/usr/local/bin/crio-lxc-backup.sh
[root@k8s-cluster2-controller crio-lxc]# cat /usr/local/bin/crio-lxc-backup.sh
#!/bin/sh
LOGDIR=$(dirname $LOG_FILE)
# backup container runtime directory to log directory
cp -r $RUNTIME_PATH $LOGDIR/$CONTAINER_ID
# copy OCI runtime spec to container runtime directory
cp $SPEC_PATH $LOGDIR/$CONTAINER_ID/spec.json
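(The hook script must be executable, otherwise crio-lxc cannot run it:)
chmod +x /usr/local/bin/crio-lxc-backup.sh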
Hmm seems similar to https://github.com/cri-o/cri-o/issues/4478
Please attach the output of pacman -Qe
> lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon
> You can disable capabilities for now. Simply set `CRIO_LXC_CAPABILITIES=false` in `/etc/default/crio-lxc`.
Thanks. I will try this out.
> But we have to find out why `CAP_PERFMON` is unknown (maybe libcap is outdated). From `man 7 capabilities`:
> CAP_PERFMON (since Linux 5.8) ...
> Unfortunately the crio-lxc.tar is empty, so there is no crio-lxc logfile. It seems you changed the path to the crio-lxc logfile, so gather-logs.sh couldn't pick it up.
I had not changed the path, it's just that I am not running as root and I forgot to update the `tar` command so that it could access this directory. I have fixed that.
> I need the container runtime logs. So please use the `CRIO_LXC_CREATE_HOOK` to create backups of the container runtime configurations. See https://github.com/Drachenfels-GmbH/crio-lxc#create-hook for details. Please do the following steps:
> * Apply the crio-lxc configuration below
> * Delete the cilium pods: `kubectl delete -f ...`
> * Create the cilium pods again.
> * Run `gather-logs.sh`
> * Attach the output from `gather-logs.sh`
All done (let me know if there is an easy way to reduce the log archive size, I had run `clear-logs.sh` just before recreating the cilium pods): 02.03_20.43.34.zip
W.r.t pods, Cilium creates `DaemonSets` and `ReplicaSets` so I removed those as well, not just the pods.
As for `pacman`, here is the info: pacman.txt
hmm, `spec.json` files are missing in the output. Did you make `/usr/local/bin/crio-lxc-backup.sh` executable?
no you didn't :D
{"l":"error","cmd":"create","cid":"f7acd03ba8b6368667d1dbd30f76859aa7aa9ddbf28992cf6170bee63a7a7a73","error":"fork/exec /usr/local/bin/crio-lxc-backup.sh: permission denied","file":"/usr/local/bin/crio-lxc-backup.sh","t":"20210203201119.112","c":"cli.go:308","m":"failed to execute create hook"}
Ok, let me do that again :)
Edit: here it is:
02.03_21.29.32.zip
(took me time to check the `spec.json` files were there, hopefully you have everything)
Thanks.
I'll upgrade liblxc tomorrow to support all newer capabilities, e.g. CAP_PERFMON: https://github.com/lxc/lxc/commit/7b4cd4681da399acc1775773d7967a3c94635346
You then have to rebuild liblxc. After that you can comment out `CRIO_LXC_CAPABILITIES=false` again.
Did it work for you with `CRIO_LXC_CAPABILITIES=false` set?
> Thanks.
> I'll upgrade liblxc tomorrow to support all newer capabilities, e.g. CAP_PERFMON: lxc/lxc@7b4cd46
Excellent !
> You then have to rebuild liblxc. After that you can comment out `CRIO_LXC_CAPABILITIES=false` again.
Will do.
> Did it work for you with `CRIO_LXC_CAPABILITIES=false` set?
It looks like so, as the cilium pod is now running:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-7d4zc 1/1 Running 0 5m42s
kube-system cilium-operator-696dc48d8d-tknls 0/1 ImagePullBackOff 0 5m42s
kube-system coredns-74ff55c5b-k9r7h 0/1 CrashLoopBackOff 25 7h46m
kube-system coredns-74ff55c5b-ks297 0/1 CrashLoopBackOff 25 7h46m
kube-system etcd-vagrant-k8s 1/1 Running 0 7h46m
kube-system kube-apiserver-vagrant-k8s 1/1 Running 0 7h46m
kube-system kube-controller-manager-vagrant-k8s 1/1 Running 0 7h46m
kube-system kube-proxy-4wfjm 1/1 Running 0 7h46m
kube-system kube-scheduler-vagrant-k8s 1/1 Running 0 7h46m
Not sure about its operator though and why it is in `ImagePullBackOff`, nor why coredns is misbehaving.
Attached are the logs, again:
02.03_21.46.30.zip
Running a simple image appears to work (I like to use https://github.com/kubernetes-up-and-running/kuard):
[vagrant@vagrant-k8s ~]$ kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
[vagrant@vagrant-k8s ~]$ k get pods
NAME READY STATUS RESTARTS AGE
kuard 1/1 Running 0 7m2s
[vagrant@vagrant-k8s ~]$ k describe pod kuard
Name: kuard
Namespace: default
Priority: 0
Node: vagrant-k8s/10.0.2.15
Start Time: Wed, 03 Feb 2021 21:53:05 +0000
Labels: run=kuard
Annotations: <none>
Status: Running
IP: 10.0.0.135
IPs:
IP: 10.0.0.135
Containers:
kuard:
Container ID: cri-o://059c7537c1fe16cc6dbae245652d0fe7ea33a22170596c2e7660c98d30e970b4
Image: gcr.io/kuar-demo/kuard-amd64:blue
Image ID: gcr.io/kuar-demo/kuard-amd64@sha256:1ecc9fb2c871302fdb57a25e0c076311b7b352b0a9246d442940ca8fb4efe229
Port: <none>
Host Port: <none>
State: Running
Started: Wed, 03 Feb 2021 21:53:09 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bz5lh (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-bz5lh:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bz5lh
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m38s (x4 over 5m56s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Normal Scheduled 3m25s default-scheduler Successfully assigned default/kuard to vagrant-k8s
Normal Pulling 3m25s kubelet Pulling image "gcr.io/kuar-demo/kuard-amd64:blue"
Normal Pulled 3m21s kubelet Successfully pulled image "gcr.io/kuar-demo/kuard-amd64:blue" in 3.799843248s
Normal Created 3m21s kubelet Created container kuard
Normal Started 3m21s kubelet Started container kuard
But it looks like networking might be missing something as I cannot reach anything on my host when `kube-proxy` runs, cf the test on the `kuard` page:
kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
kubectl port-forward kuard 8080:8080
I cannot reach anything after that on https://localhost:8080
Is cilium healthy?
I noticed an issue with cilium 1.9.4, please try 1.9.3 (deployment file is attached):
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 5m51s default-scheduler 0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
Warning FailedScheduling 5m51s default-scheduler 0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
Normal Scheduled 5m48s default-scheduler Successfully assigned kube-system/cilium-operator-696dc48d8d-k4mxq to k8s-cluster2-controller
Normal Pulling 3m19s (x4 over 5m48s) kubelet Pulling image "quay.io/cilium/operator-generic:v1.9.4"
Warning Failed 3m17s (x4 over 5m31s) kubelet Failed to pull image "quay.io/cilium/operator-generic:v1.9.4": rpc error: code = Unknown desc = Error reading manifest v1.9.4 in quay.io/cilium/operator-generic: manifest unknown: manifest unknown
Warning Failed 3m17s (x4 over 5m31s) kubelet Error: ErrImagePull
Normal BackOff 3m5s (x6 over 5m30s) kubelet Back-off pulling image "quay.io/cilium/operator-generic:v1.9.4"
Warning Failed 42s (x16 over 5m30s) kubelet Error: ImagePullBackOff
> I noticed an issue with cilium 1.9.4, please try 1.9.3 (deployment file is attached)
Yup, much better:
[vagrant@vagrant-k8s ~]$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-66pbl 1/1 Running 0 2m57s
kube-system cilium-operator-65c5fc987f-79hxb 1/1 Running 0 2m57s
kube-system coredns-74ff55c5b-k9r7h 0/1 Running 32 8h
kube-system coredns-74ff55c5b-ks297 0/1 Running 32 8h
kube-system etcd-vagrant-k8s 1/1 Running 0 8h
kube-system kube-apiserver-vagrant-k8s 1/1 Running 0 8h
kube-system kube-controller-manager-vagrant-k8s 1/1 Running 0 8h
kube-system kube-proxy-4wfjm 1/1 Running 0 8h
kube-system kube-scheduler-vagrant-k8s 1/1 Running 0 8h
although `coredns` does not appear to be able to become ready, nor has the situation improved w.r.t my kuard test app ...
02.03_22.19.34.zip
Delete the coredns pods - they will be added back automatically. I suspect they have a wrong IP since cilium was not healthy.
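(For example, via the label the coredns pods carry; their ReplicaSet recreates them:)
kubectl -n kube-system delete pods -l k8s-app=kube-dns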
Ok, thanks for the suggestion, this seemed to help.
[vagrant@vagrant-k8s logs]$ k get pods
NAME READY STATUS RESTARTS AGE
cilium-66pbl 1/1 Running 0 11h
cilium-operator-65c5fc987f-79hxb 1/1 Running 0 11h
coredns-74ff55c5b-gh79t 0/1 Running 50 3h26m
coredns-74ff55c5b-qqplv 1/1 Running 0 3h26m
etcd-vagrant-k8s 1/1 Running 0 19h
kube-apiserver-vagrant-k8s 1/1 Running 0 19h
kube-controller-manager-vagrant-k8s 1/1 Running 0 19h
kube-proxy-4wfjm 1/1 Running 0 19h
kube-scheduler-vagrant-k8s 1/1 Running 0 19h
One of the `coredns` pods never seems to get ready though:
[vagrant@vagrant-k8s logs]$ k describe pods coredns-74ff55c5b-svd7z
Name: coredns-74ff55c5b-svd7z
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: vagrant-k8s/10.0.2.15
Start Time: Thu, 04 Feb 2021 09:32:00 +0000
Labels: k8s-app=kube-dns
pod-template-hash=74ff55c5b
Annotations: <none>
Status: Running
IP: 10.0.0.213
IPs:
IP: 10.0.0.213
Controlled By: ReplicaSet/coredns-74ff55c5b
Containers:
coredns:
Container ID: cri-o://f05759261ce5e7359433cac67bdbaaffd5a37a1112f2c9ba3fed43ab7f6ff183
Image: k8s.gcr.io/coredns:1.7.0
Image ID: k8s.gcr.io/coredns@sha256:242d440e3192ffbcecd40e9536891f4d9be46a650363f3a004497c2070f96f5a
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Thu, 04 Feb 2021 09:32:01 +0000
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-fcqq4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-fcqq4:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-fcqq4
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58s default-scheduler Successfully assigned kube-system/coredns-74ff55c5b-svd7z to vagrant-k8s
Normal Pulled 57s kubelet Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
Normal Created 57s kubelet Created container coredns
Normal Started 57s kubelet Started container coredns
Warning Unhealthy 13s (x3 over 43s) kubelet Readiness probe failed: Get "http://10.0.0.213:8181/ready": dial tcp 10.0.0.213:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 3s (x3 over 53s) kubelet Readiness probe failed: Get "http://10.0.0.213:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Also the situation is actually better than I reported w.r.t `kuard` (I was missing one level of indirection and did not forward the ports from my VM). I can reach port `8080` on `localhost` when `kube-proxy` is forwarding my port:
[vagrant@vagrant-k8s logs]$ k logs kuard
2021/02/04 09:34:28 Starting kuard version: v0.10.0-blue
2021/02/04 09:34:28 **********************************************************************
2021/02/04 09:34:28 * WARNING: This server may expose sensitive
2021/02/04 09:34:28 * and secret information. Be careful.
2021/02/04 09:34:28 **********************************************************************
2021/02/04 09:34:28 Config:
{
"address": ":8080",
"debug": false,
"debug-sitedata-dir": "./sitedata",
"keygen": {
"enable": false,
"exit-code": 0,
"exit-on-complete": false,
"memq-queue": "",
"memq-server": "",
"num-to-gen": 0,
"time-to-run": 0
},
"liveness": {
"fail-next": 0
},
"readiness": {
"fail-next": 0
},
"tls-address": ":8443",
"tls-dir": "/tls"
}
2021/02/04 09:34:28 Could not find certificates to serve TLS
2021/02/04 09:34:28 Serving on HTTP on :8080
[vagrant@vagrant-k8s logs]$ kubectl port-forward kuard 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
E0204 09:35:17.295472 1893851 portforward.go:385] error copying from local connection to remote stream: read tcp6 [::1]:8080->[::1]:41432: read: connection reset by peer
Handling connection for 8080
[vagrant@vagrant-k8s ~]$ curl http://localhost:8080
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>KUAR Demo</title>
<link rel="stylesheet" href="/static/css/bootstrap.min.css">
<link rel="stylesheet" href="/static/css/styles.css">
<script>
var pageContext = {"urlBase":"","hostname":"kuard","addrs":["10.0.0.32"],"version":"v0.10.0-blue","versionColor":"hsl(339,100%,50%)","requestDump":"GET / HTTP/1.1\r\nHost: localhost:8080\r\nAccept: */*\r\nUser-Agent: curl/7.74.0","requestProto":"HTTP/1.1","requestAddr":"127.0.0.1:49912"}
</script>
</head>
<svg style="position: absolute; width: 0; height: 0; overflow: hidden;" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<symbol id="icon-power" viewBox="0 0 32 32">
<title>power</title>
<path class="path1" d="M12 0l-12 16h12l-8 16 28-20h-16l12-12z"></path>
</symbol>
<symbol id="icon-notification" viewBox="0 0 32 32">
<title>notification</title>
<path class="path1" d="M16 3c-3.472 0-6.737 1.352-9.192 3.808s-3.808 5.72-3.808 9.192c0 3.472 1.352 6.737 3.808 9.192s5.72 3.808 9.192 3.808c3.472 0 6.737-1.352 9.192-3.808s3.808-5.72 3.808-9.192c0-3.472-1.352-6.737-3.808-9.192s-5.72-3.808-9.192-3.808zM16 0v0c8.837 0 16 7.163 16 16s-7.163 16-16 16c-8.837 0-16-7.163-16-16s7.163-16 16-16zM14 22h4v4h-4zM14 6h4v12h-4z"></path>
</symbol>
</defs>
</svg>
<body>
<div id="root"></div>
<script src="/built/bundle.js" type="text/javascript"></script>
</body>
</html>
Here are some questions: do you have an idea about the remaining `coredns` issues?