lxc / lxcri

CRI-O support for lxc
Apache License 2.0

Issues seen during k8s deployment with kubeadm #44

Closed vrubiolo closed 3 years ago

vrubiolo commented 3 years ago

Hi Ruben,

I am seeing some issues when attempting to deploy k8s using kubeadm during the preflight checks, namely:

[preflight] Some fatal errors occurred:
        [ERROR FileExisting-crictl]: crictl not found in system path
        [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
        [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
        [ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.20.1" Control plane version: "1.19.6"

Full log is below, let me know if you need more information. I am running with `sudo kubeadm init --config cluster-init.yaml -v 5 2>&1 | tee kubeadm.log`
kubeadm.log

r10r commented 3 years ago

Hi Ruben,

I am seeing some issues when attempting to deploy k8s using kubeadm during the preflight checks, namely:


[preflight] Some fatal errors occurred:
        [ERROR FileExisting-crictl]: crictl not found in system path

You have to install the cri-tools from https://github.com/kubernetes-sigs/cri-tools/releases
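
For example, something along these lines should work (a sketch; the version shown is illustrative, pick the release matching your cluster):

VERSION="v1.20.0"  # illustrative; match your cluster version
curl -LO https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar -C /usr/local/bin -xzf crictl-$VERSION-linux-amd64.tar.gz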

    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
modprobe br_netfilter

Can be persisted with

[root@k8s-cluster2-controller crio-lxc-build]# cat /etc/modules-load.d/kubelet.conf
br-netfilter
    [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
echo 1 >  /proc/sys/net/ipv4/ip_forward

can be persisted with

[root@k8s-cluster2-controller crio-lxc-build]# cat /etc/sysctl.d/99-kubelet.conf
net.ipv4.ip_forward=1
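
(Running sysctl --system afterwards applies the persisted settings without a reboot.)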
    [ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.20.1" Control plane version: "1.19.6"

* The first one is probably a `PATH`-related error or similar
* The 2nd and 3rd are likely some settings, I am looking at instructions like https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#letting-iptables-see-bridged-traffic to fix, any input welcome
* it's the last one that I am not sure about, given that the `INSTALL.md` instructions are supposed to pick up the latest k8s binaries

  * on this front, it looks like the `CHECKSUM` value in this block is wrong as it fails the check a few lines below:

ARCH="linux-amd64"
RELEASE="1.20.1"
ARCHIVE=kubernetes-server-$ARCH.tar.gz
CHECKSUM="fb56486a55dbf7dbacb53b1aaa690bae18d33d244c72a1e2dc95fb0fcce45108c44ba79f8fa04f12383801c46813dc33d2d0eb2203035cdce1078871595e446e"
DESTDIR="/usr/local/bin"



Full log is below, let me know if you need more information. I am running with `sudo kubeadm init --config cluster-init.yaml -v 5 2>&1 | tee kubeadm.log`
[kubeadm.log](https://github.com/Drachenfels-GmbH/crio-lxc/files/5866776/kubeadm.log)

I'll take a look at it.

r10r commented 3 years ago
  • it's the last one that I am not sure about, given that the INSTALL.md instructions are supposed to pick up the latest k8s binaries

    • on this front, it looks like the CHECKSUM value in this block is wrong as it fails the check a few lines below:
ARCH="linux-amd64"
RELEASE="1.20.1"
ARCHIVE=kubernetes-server-$ARCH.tar.gz
CHECKSUM="fb56486a55dbf7dbacb53b1aaa690bae18d33d244c72a1e2dc95fb0fcce45108c44ba79f8fa04f12383801c46813dc33d2d0eb2203035cdce1078871595e446e"
DESTDIR="/usr/local/bin"

Full log is below, let me know if you need more information. I am running with `sudo kubeadm init --config cluster-init.yaml -v 5 2>&1 | tee kubeadm.log`
kubeadm.log

Yes, the checksum is still the one for v1.20. I saw that 1.20.2 is out and will update the docs in a minute.
Thanks for reporting!

r10r commented 3 years ago

Please report if this works for you now. Thanks!

vrubiolo commented 3 years ago

Hi @r10r and thanks for fixing the instructions this fast!

I have followed the updated instructions and confirm the preflight errors are gone now, good job!

However, it fails with a new error about a missing /etc/containers/policy.json:

[preflight] Pulling images required for setting up a Kubernetes cluster                                                                                                                                                                       
[preflight] This might take a minute or two, depending on the speed of your internet connection                        
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'                  
I0127 22:23:00.430975    5584 checks.go:845] pulling k8s.gcr.io/kube-apiserver:v1.20.2
I0127 22:23:08.753710    5584 checks.go:845] pulling k8s.gcr.io/kube-controller-manager:v1.20.2
I0127 22:23:15.234637    5584 checks.go:845] pulling k8s.gcr.io/kube-scheduler:v1.20.2                                                                                                                                                        
I0127 22:23:23.203065    5584 checks.go:845] pulling k8s.gcr.io/kube-proxy:v1.20.2
I0127 22:23:30.849866    5584 checks.go:845] pulling k8s.gcr.io/pause:3.2                                                                                                                                                                     
I0127 22:23:41.805311    5584 checks.go:845] pulling k8s.gcr.io/etcd:3.4.13-0     
I0127 22:23:48.087088    5584 checks.go:845] pulling k8s.gcr.io/coredns:1.7.0                                                                                                                                                                 
[preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.20.2: output: time="2021-01-27T22:23:08Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.20.2: output: time="2021-01-27T22:23:15Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.20.2: output: time="2021-01-27T22:23:23Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.20.2: output: time="2021-01-27T22:23:30Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.2: output: time="2021-01-27T22:23:41Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.4.13-0: output: time="2021-01-27T22:23:48Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.7.0: output: time="2021-01-27T22:23:55Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = open /etc/containers/policy.json: no such file or directory"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight

I am working around the issue by setting the following in /etc/containers/policy.json:

{
    "default": [{"type": "insecureAcceptAnything"}]
}

as per https://github.com/containers/image/blob/master/docs/containers-policy.json.5.md#completely-disable-security-allow-all-images-do-not-trust-any-signatures

The deployment then proceeds much further but fails later with some kubelet-related errors mentioning that the kubelet is unhealthy: kubelet.log

Below is my cluster configuration YAML file too. I am seeing many network-related errors (including missing CNI plugins), so I probably made a mistake in the configuration, but I am not sure where for now ...

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.2.15
  bindPort: 6443
nodeRegistration:
  name: vagrant-k8s
  criSocket: unix://var/run/crio/crio.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
#  kubeletExtraArgs:
#   v: "5"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
apiVersion: kubeadm.k8s.io/v1beta2
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.66.0.0/16
scheduler: {}
controlPlaneEndpoint: "10.0.2.15:6443"
r10r commented 3 years ago

I am working around the issue by setting the following in `/etc/containers/policy.json`:

{ "default": [{"type": "insecureAcceptAnything"}] }



as per https://github.com/containers/image/blob/master/docs/containers-policy.json.5.md#completely-disable-security-allow-all-images-do-not-trust-any-signatures

Yes, that's fine for now.

The deployment then proceeds much further but fails later with some kubelet-related errors mentioning that the kubelet is unhealthy: kubelet.log

Below is my cluster configuration YAML file too. I am seeing many network-related errors (including missing CNI plugins), so I probably made a mistake in the configuration, but I am not sure where for now ...

You have to create at least the loopback device configuration for the network:

[root@k8s-cluster2-controller k8s-tools]# cat /etc/cni/net.d/200-loopback.conf 
{
    "cniVersion": "0.3.1",
    "type": "loopback",
    "name": "lo"
}

And you might want to try cilium as the CNI plugin: https://docs.cilium.io/en/v1.9/gettingstarted/k8s-install-default/#install-cilium I tried calico too but switched to cilium, which works without hassle so far.

Jan 27 22:30:05 archlinux kubelet[6059]: F0127 22:30:05.980166 6059 kubelet.go:1350] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 24 in cached partitions map

Did you configure the storage driver in /etc/containers/storage.conf?

You might have hit a btrfs related issue https://github.com/kubernetes/kubernetes/issues/94335

I'm using overlay as the storage driver. So you might want to try this configuration:

 [root@k8s-cluster2-controller k8s-tools]# cat /etc/containers/storage.conf
# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md
[storage]
driver = "overlay"

[storage.options.overlay] 
# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay`
# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off
# NOTE: metacopy can only be enabled when redirect_dir is enabled
# NOTE: storage driver name must be set or mountopt are not evaluated,
# even when the driver is the default driver --> BUG ?
mountopt = "nodev,redirect_dir=off,metacopy=off"

And I'm missing some information here. Please attach at least the output of `journalctl -u crio -a` and `mount`.

I have a log gathering script, I use it for collecting all kind of information after sonobuoy test runs. You might want to try it: https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-gather-logs-sh

r10r commented 3 years ago

You might want to try it: https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-gather-logs-sh

Please look carefully at the generated output, since it might contain sensitive information.

vrubiolo commented 3 years ago

Thanks for the very quick feedback and help, I will look into all this

One more question though: it does not seem possible to rerun kubeadm after a failure, as it complains that some certs have already been generated. Is there an easy way to rerun it w/o having to wipe the full setup? (This might be written in the kubeadm docs; I have not checked, feel free to RTFM me :)

r10r commented 3 years ago

Thanks for the very quick feedback and help, I will look into all this

bonne chance

One more question though: it does not seem possible to rerun kubeadm after a failure, as it complains that some certs have already been generated. Is there an easy way to rerun it w/o having to wipe the full setup? (This might be written in the kubeadm docs; I have not checked, feel free to RTFM me :)

In short, `kubeadm reset` is your friend ;) (see `kubeadm reset --help`). You can also remove /etc/kubernetes/pki yourself after stopping the kubelet.
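
A minimal sketch of that cleanup (destructive; assumes a throwaway single-node cluster):

sudo kubeadm reset
# or, to remove the certificates manually:
sudo systemctl stop kubelet
sudo rm -rf /etc/kubernetes/pki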

You might have to clean up a bit more to get a clean cluster state. I've uploaded the scripts I use for purging the cluster state during development. You might want to take a look at them. https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a

https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-clear-logs-sh https://gist.github.com/r10r/72ce519944796d62eef837d0e3e6f23a#file-k8s-reset-sh

vrubiolo commented 3 years ago

Hi @r10r ,

I have taken a look at your scripts; they are great for gathering all the necessary information!

Please find attached the logs I have gathered so far using your scripts: 01.29_22.04.29.zip

As for your questions above, yes I did configure the storage driver as per your instructions:

[vagrant@archlinux 01.29_22.04.29]$ cat /etc/containers/storage.conf 
# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md
[storage]
driver = "overlay"

[storage.options.overlay] 
# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay`
# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off
# NOTE: metacopy can only be enabled when redirect_dir is enabled
# NOTE: storage driver name must be set or mountopt are not evaluated,
# even when the driver is the default driver --> BUG ?
mountopt = "nodev,redirect_dir=off,metacopy=off"

I don't have anything CNI-related though; I will create /etc/cni/net.d/200-loopback.conf, install Cilium, and report back.

Edit:

  • if sudo mount -t cgroup2 none /sys/fs/cgroup has been done, kubeadm fails early w/ missing capabilities associated w/ cgroups (looks like the lxc-checkconfig check failing). See kubeadm-cgroupsv2.txt

Thanks for your continued help so far!

r10r commented 3 years ago

Hi @r10r ,

I have taken a look at your scripts; they are great for gathering all the necessary information!

Please find attached the logs I have gathered so far using your scripts: 01.29_22.04.29.zip

That's good. I do see this in the mounts:

/dev/sda2 /var/lib/containers/storage/overlay btrfs rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0

Please disable this mountpoint, reboot and try again. Again - this might be related to https://github.com/kubernetes/kubernetes/issues/94335

  • if sudo mount -t cgroup2 none /sys/fs/cgroup has been done, kubeadm fails early w/ missing capabilities associated w/ cgroups (looks like the lxc-checkconfig check failing). See kubeadm-cgroupsv2.txt

Did you enable cgroups2 permanently as suggested in https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/INSTALL.md#cgroups? It should not be necessary to mount cgroups2 manually. All cgroup controllers should be enabled by default.

This is what it should look like (at least after a clean boot)

[root@k8s-cluster2-controller ~]# cat /proc/cmdline 
BOOT_IMAGE=../vmlinuz-linux root=/dev/xvda1 rw systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all initrd=../initramfs-linux.img
[root@k8s-cluster2-controller ~]# stat -f /sys/fs/cgroup/
  File: "/sys/fs/cgroup/"
    ID: 0        Namelen: 255     Type: cgroup2fs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0
[root@k8s-cluster2-controller ~]# cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids rdma
[root@k8s-cluster2-controller ~]# cat /sys/fs/cgroup/cgroup.subtree_control 
cpuset cpu io memory hugetlb pids rdma

[WARNING FileExisting-ethtool]: ethtool not found in system path
I0201 22:40:09.907155 1245 checks.go:376] validating the presence of executable socat
[WARNING FileExisting-socat]: socat not found in system path

You should install socat and ethtool, although this is not the root cause.

vrubiolo commented 3 years ago

Hi @r10r ,

Thanks again for your continued support!

That's good. I do see this in the mounts:

/dev/sda2 /var/lib/containers/storage/overlay btrfs rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0

Please disable this mountpoint, reboot and try again. Again - this might be related to kubernetes/kubernetes#94335

I have disabled it manually by unmounting the directory, as I don't see how to do that automatically. Although the overlay storage driver is selected in /etc/containers/storage.conf, the system keeps adding a btrfs entry for /dev/sda2 under /var/lib/containers/storage/overlay. I am sure there is a better way here; would you have a hint?

After unmounting by hand, I now have:

[vagrant@archlinux ~]$ mount | grep btrfs
/dev/sda2 on / type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)

Did you enable cgroups2 permanently as suggested in https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/INSTALL.md#cgroups? It should not be necessary to mount cgroups2 manually. All cgroup controllers should be enabled by default.

Your documentation mentions doing it either permanently or dynamically; I chose the latter because I was testing.

I have now enabled cgroupsv2 permanently:

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=559c8e62-62d0-4f52-a3b3-5526fffcc2d5 rw net.ifnames=0 rootflags=compress-force=zstd systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all

Incidentally, while my /sys/fs/cgroup/cgroup.controllers matches yours, my /sys/fs/cgroup/cgroup.subtree_control does not:

[vagrant@archlinux ~]$ stat -f /sys/fs/cgroup/
  File: "/sys/fs/cgroup/"
    ID: 0        Namelen: 255     Type: cgroup2fs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0
[vagrant@archlinux ~]$ cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids rdma
[vagrant@archlinux ~]$ cat /sys/fs/cgroup/cgroup.subtree_control 
memory pids
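
For reference, controllers can be delegated to child cgroups by writing to that file. A sketch (systemd normally manages this file, so the change may not survive a reboot):

echo "+cpuset +cpu +io +hugetlb" | sudo tee /sys/fs/cgroup/cgroup.subtree_control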

You should install socat and ethtool, although this is not the root cause.

Done.

However, I still fail to start the kubelet (kubeadm reset works well, btw). The issue is the same, w/ the connection to the API server being refused: kubeadm.log cluster-init.yaml.txt

I set the network host address in cluster-init.yaml because of:

[vagrant@archlinux ~]$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
eth0             UP             10.0.2.15/24 fe80::a00:27ff:fe88:ee27/64

I have gathered logs as well: 02.02_22.49.28.zip

I feel I am missing something here, esp. on the network side of things ... Do you have an idea as to what could be going wrong here? Can you also confirm I don't need to do more besides creating /etc/cni/net.d/200-loopback.conf on this front (as I understand the Calico/Cilium install will be done after the cluster is up)?

r10r commented 3 years ago

... Again - this might be related to kubernetes/kubernetes#94335

As suggested by the linked issue try

mount --bind /var/lib/kubelet /var/lib/kubelet
systemctl restart kubelet

A possible workaround is to make sure a bind mount exists which allows kubelet's logic to find the backing filesystem. E.g., add the following fstab entry and then perform mount /var/lib/kubelet:

/var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0
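
After adding the entry, sudo mount /var/lib/kubelet activates it, and findmnt /var/lib/kubelet should then show the bind mount.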

vrubiolo commented 3 years ago

... Again - this might be related to kubernetes/kubernetes#94335

As suggested by the linked issue try

mount --bind /var/lib/kubelet /var/lib/kubelet
systemctl restart kubelet

A possible workaround is to make sure a bind mount exists which allows kubelet's logic to find the backing filesystem. E.g., add the following fstab entry and then perform mount /var/lib/kubelet: /var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0

Thanks for pointing this out again and for your fast turnaround. I have added the mount to fstab:

[vagrant@archlinux ~]$ cat /etc/fstab 
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
#/swap/swapfile none swap defaults 0 0
#VAGRANT-BEGIN
# The contents below are automatically generated by Vagrant. Do not modify.
#VAGRANT-END

/var/lib/kubelet /var/lib/kubelet none defaults,bind,nofail 0 0

and it is mounted:

[vagrant@archlinux ~]$ mount | grep kube
/dev/sda2 on /var/lib/kubelet type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)

We are definitely on the right track as some errors do not show up anymore. However, the cluster still fails to be brought up. I see the following related error in the kubelet journalctl:

Feb 03 09:34:15 archlinux kubelet[95939]: E0203 09:34:15.458996   95939 cri_stats_provider.go:376] Failed to get the info of the filesystem with mountpoint "/var/lib/containers/storage/overlay-images": failed to get device for dir "/var/lib/containers/storage/overlay-images": could not find device with major: 0, minor: 24 in cached partitions map.

Here is my log of the run: kubeadm.log 02.03_09.40.14.zip

Are you using BTRFS yourself? You had mentioned archlinux so I wanted to align w/ your setup. Did you switch to some other filesystem for your rootfs (as you do not seem to be seeing that issue yourself)?

r10r commented 3 years ago

According to the kubelet.service.log the error is fixed now. So it was the btrfs root file system that tricked kubelet.

Now you have the problem that the hostname cannot be resolved (kubelet.service.log):

Feb 03 09:34:16 archlinux kubelet[95939]: E0203 09:34:16.896332   95939 kubelet.go:2243] node "archlinux" not found

The hostname should match the node name used for kubeadm init. So change your hostname to vagrant-k8s using hostnamectl set-hostname vagrant-k8s and restart the kubelet.
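
Concretely (a sketch, assuming systemd):

sudo hostnamectl set-hostname vagrant-k8s
sudo systemctl restart kubelet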

Feb 03 09:34:15 archlinux kubelet[95939]: E0203 09:34:15.458996   95939 cri_stats_provider.go:376] Failed to get the info of the filesystem with mountpoint "/var/lib/containers/storage/overlay-images": failed to get device for dir "/var/lib/containers/storage/overlay-images": could not find device with major: 0, minor: 24 in cached partitions map.

Try to add another bind mount in /etc/fstab and run mount /var/lib/containers/storage/overlay-images

/var/lib/containers/storage/overlay-images /var/lib/containers/storage/overlay-images  none defaults,bind,nofail 0 0

Are you using BTRFS yourself? You had mentioned archlinux so I wanted to align w/ your setup. Did you switch to some other filesystem for your rootfs (as you do not seem to be seeing that issue yourself)?

No, I use ext4. But please let's continue with the btrfs setup. I'll add a btrfs-related section to the docs once we're done.

vrubiolo commented 3 years ago

According to the kubelet.service.log the error is fixed now. So it was the btrfs root file system that tricked kubelet.

Ok, that's good to know here, thanks for confirming.

Now you have the problem that the hostname cannot be resolved (kubelet.service.log):

Feb 03 09:34:16 archlinux kubelet[95939]: E0203 09:34:16.896332   95939 kubelet.go:2243] node "archlinux" not found

The hostname should match the node name used for kubeadm init. So change your hostname to vagrant-k8s using hostnamectl set-hostname vagrant-k8s and restart the kubelet.

Ok, I have done that too:

[vagrant@archlinux logs]$ hostnamectl status
   Static hostname: vagrant-k8s
         Icon name: computer-vm
           Chassis: vm
        Machine ID: eb97c08437a24922bda5fa9d6281e912
           Boot ID: 8891403f70074d0095e31d6232c0d525
    Virtualization: oracle
  Operating System: Arch Linux
            Kernel: Linux 5.10.7-arch1-1
      Architecture: x86-64

Try to add another bind mount in /etc/fstab and run mount /var/lib/containers/storage/overlay-images

/var/lib/containers/storage/overlay-images /var/lib/containers/storage/overlay-images  none defaults,bind,nofail 0 0

Yes! That did it, the error does not show up anymore in the kubelet log:

[vagrant@archlinux logs]$ mount | grep btrfs
/dev/sda2 on / type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
/dev/sda2 on /var/lib/kubelet type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)
/dev/sda2 on /var/lib/containers/storage/overlay-images type btrfs (rw,relatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/)

No, I use ext4. But please let's continue with the btrfs setup. I'll add a btrfs-related section to the docs once we're done.

Ok, that is fair indeed given the amount of time you also spend helping me here :)

So, we are further but the cluster fails to initialize still: 02.03_13.30.35.zip kubeadm.log

What I don't get is the CNI errors (I think they are the root cause of the issue): the CNI addon is supposed to be installed via kubectl, but that presupposes the master is up. Here, this is clearly not the case. So it's kind of a chicken-and-egg issue.

References like https://github.com/kubernetes/kubernetes/issues/48798#issuecomment-321267386 make me think that we are passing a CNI option whereas we can't have one because of the chicken-and-egg issue above. It looks like something is at odds w/ the CNI config, maybe at the cri-o level?

r10r commented 3 years ago

from crio.service.log

Feb 03 13:30:22 vagrant-k8s crio[279]: time="2021-02-03 13:30:22.282784924Z" level=error msg="Container creation error: open /etc/default/crio-lxc: no such file or directory\n"

The default crio-lxc configuration is missing. Simply doing a touch /etc/default/crio-lxc should be enough. See https://github.com/Drachenfels-GmbH/crio-lxc#environment-file. I'll create the file in the binary if it does not exist yet.

What I don't get is the CNI errors (I think they are the root cause of the issue): the CNI addon is supposed to be installed via kubectl, but that presupposes the master is up. Here, this is clearly not the case. So it's kind of a chicken-and-egg issue.

No, I think it's not. It's ok to install the CNI plugin after kubeadm init. All pods initialized by kubeadm init are static pods with manifests in /etc/kubernetes/manifests/. They get the host network namespace (the cluster IP) and need no CNI. CNI is only required for the coredns pods - but they don't block kubeadm init.
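
For example, on a kubeadm control plane:

ls /etc/kubernetes/manifests/
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml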

vrubiolo commented 3 years ago

from crio.service.log

Feb 03 13:30:22 vagrant-k8s crio[279]: time="2021-02-03 13:30:22.282784924Z" level=error msg="Container creation error: open /etc/default/crio-lxc: no such file or directory\n"

The default crio-lxc configuration is missing. Simply doing a touch /etc/default/crio-lxc should be enough. See https://github.com/Drachenfels-GmbH/crio-lxc#environment-file.

Fantastic! This was the missing part (I had forgotten to re-read the main crio-lxc doc).

Now the cluster initializes :partying_face:

To start using your cluster, you need to run the following as a regular user:                                                                                                                                                                 

  mkdir -p $HOME/.kube                                                                                                                                                                                                                        
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config                                                                                                                                                                                    
  sudo chown $(id -u):$(id -g) $HOME/.kube/config                                                                                                                                                                                             

Alternatively, if you are the root user, you can run:                                                                                                                                                                                         

  export KUBECONFIG=/etc/kubernetes/admin.conf                                                                                                                                                                                                

You should now deploy a pod network to the cluster.                                                                                                                                                                                           
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:                                                                                                                                                                   
  https://kubernetes.io/docs/concepts/cluster-administration/addons/                                                                                                                                                                          

You can now join any number of control-plane nodes by copying certificate authorities                                                                                                                                                         
and service account keys on each node and then running the following as root:                                                                                                                                                                 

  kubeadm join 10.0.2.15:6443 --token 1l6ep6.gdufc3w4isp2z4f0 \                                                                                                                                                                               
    --discovery-token-ca-cert-hash sha256:fcf1fe7e5e3ce7c2fbb974727372fcd95038f6853d7d6711068b3f4218886341 \                                                                                                                                  
    --control-plane                                                                                                                                                                                                                           

Then you can join any number of worker nodes by running the following on each as root:                                                                                                                                                        

kubeadm join 10.0.2.15:6443 --token 1l6ep6.gdufc3w4isp2z4f0 \                                                                                                                                                                                 
    --discovery-token-ca-cert-hash sha256:fcf1fe7e5e3ce7c2fbb974727372fcd95038f6853d7d6711068b3f4218886341   

I'll create the file in the binary if it does not exist yet.

Good idea. A mention in the K8S docs would be very useful too. For now I have copied the one from your example.

Cluster logs, w/ kubeadm log included: 02.03_14.02.01.zip

What I don't get is the CNI errors (I think they are the root cause of the issue): the CNI addon is supposed to be installed via kubectl, but that presupposes the master is up. Here, this is clearly not the case. So it's kind of a chicken-and-egg issue.

No, I think it's not. It's ok to install the CNI plugin after kubeadm init. All pods initialized by kubeadm init are static pods with manifests in /etc/kubernetes/manifests/. They get the host network namespace (the cluster IP) and need no CNI. CNI is only required for the coredns pods - but they don't block kubeadm init.

Ok, that explains it. Thanks for pointing this out!

My next step will indeed be to install Cilium, as you pointed out, and I will report back.

Another question: for your pod container images, are you using the LXC OCI template or manually converting them via umoci?

r10r commented 3 years ago

Now the cluster initializes 🥳

Great!

My next step will indeed be to install Cilium, as you pointed out, and I will report back.

Well that's simple now :D

kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml

Remember that you have a single-node cluster and you have to untaint the control-plane in order to schedule pods. See https://github.com/Drachenfels-GmbH/crio-lxc/blob/dev/K8S.md#kubeadm-init
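
Untainting usually looks like this (a sketch; the node name comes from earlier in this thread, and the taint key matches the one in cluster-init.yaml):

kubectl taint nodes vagrant-k8s node-role.kubernetes.io/master-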

Another question: for your pod container images, are you using the LXC OCI template or manually converting them via umoci?

I use buildah. It's fairly simple to create new images. https://github.com/containers/buildah/blob/master/docs/tutorials/01-intro.md

r10r commented 3 years ago

Next step - Try it again on your favourite distribution :D

vrubiolo commented 3 years ago

Thanks for all the good info!

I have started to deploy the Cilium CRDs/pods but this fails right now:

[vagrant@vagrant-k8s logs]$ kubectl -n kube-system get pods --watch
NAME                                  READY   STATUS                      RESTARTS   AGE
cilium-operator-65c5fc987f-l656q      0/1     CrashLoopBackOff            6          16m
cilium-tgs6f                          0/1     Init:CreateContainerError   0          16m
coredns-74ff55c5b-k9r7h               0/1     Pending                     0          3h6m
coredns-74ff55c5b-ks297               0/1     Pending                     0          3h6m
etcd-vagrant-k8s                      1/1     Running                     0          3h6m
kube-apiserver-vagrant-k8s            1/1     Running                     0          3h6m
kube-controller-manager-vagrant-k8s   1/1     Running                     0          3h6m
kube-proxy-4wfjm                      0/1     CreateContainerError        0          3h6m
kube-scheduler-vagrant-k8s            1/1     Running                     0          3h6m

It looks like an issue w/ crio-lxc:

kubectl describe pods cilium-tgs6f -n kube-system
[...]
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned kube-system/cilium-tgs6f to vagrant-k8s
  Normal   Pulling    17m                   kubelet            Pulling image "quay.io/cilium/cilium:v1.9.3"
  Normal   Pulled     16m                   kubelet            Successfully pulled image "quay.io/cilium/cilium:v1.9.3" in 27.862248371s
  Normal   Pulled     7m11s (x44 over 16m)  kubelet            Container image "quay.io/cilium/cilium:v1.9.3" already present on machine
  Warning  Failed     2m20s (x67 over 16m)  kubelet            Error: container create failed: [crio-lxc-start] failed to start container

I see the following in the crio-lxc log:

lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG    conf - conf.c:dropcaps_except:2453 - Keep capability wake_alarm (35)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG    conf - conf.c:dropcaps_except:2453 - Keep capability block_suspend (36)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 DEBUG    conf - conf.c:dropcaps_except:2453 - Keep capability audit_read (37)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR    conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR    conf - conf.c:lxc_setup:3437 - Failed to keep capabilities
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR    start - start.c:do_start:1267 - Failed to setup container "a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac"
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR    sync - sync.c:__sync_wait:36 - An error occurred in another process (expected sequence number 5)
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR    start - start.c:__lxc_start:2082 - Failed to spawn container "a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac"
lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 WARN     start - start.c:lxc_abort:1012 - No such process - Failed to send SIGKILL via pidfd 16 for process 1766724
{"l":"warn","cmd":"create","cid":"a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac","pid":1766722,"status":"exit status 1","t":"20210203170734.717","c":"create.go:105","m":"start process terminated"}

Does that ring a bell for you?

As for the images, thanks for the buildah pointer. I have some existing LXC images, so I was thinking of using them as-is (or via just a simple OCI conversion). You seemed to be in the same case (migrating an existing codebase to k8s). I understand you are rebuilding your containers from scratch as opposed to using existing LXC images; is that right?

cilium-pod-error.txt 02.03_17.04.23.zip

r10r commented 3 years ago

As for the images, thanks for the buildah pointer. I have some existing LXC images, so I was thinking of using them as-is (or via just a simple OCI conversion). You seemed to be in the same case (migrating an existing codebase to k8s). I understand you are rebuilding your containers from scratch as opposed to using existing LXC images; is that right?

You can use buildah to migrate your LXC images too. Simply create an image from scratch and copy over the rootfs.
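
A minimal sketch of such a migration (run as root or inside buildah unshare; the rootfs path and image name are illustrative):

ctr=$(buildah from scratch)                      # start from an empty image
mnt=$(buildah mount $ctr)                        # mount its (empty) rootfs
cp -a /var/lib/lxc/mycontainer/rootfs/. $mnt/    # copy the existing LXC rootfs over
buildah umount $ctr
buildah config --cmd /sbin/init $ctr             # set the command the container runs
buildah commit $ctr mycontainer-oci              # commit as an OCI image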

r10r commented 3 years ago

lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon

You can disable capabilities for now. Simply set CRIO_LXC_CAPABILITIES=false in /etc/default/crio-lxc

But we have to find out why CAP_PERFMON is unknown (maybe libcap is outdated).

From man 7 capabilities: CAP_PERFMON (since Linux 5.8) ...

Unfortunately the crio-lxc.tar is empty, so there is no crio-lxc logfile. It seems you changed the path to the crio-lxc logfile, so gather-logs.sh couldn't pick it up.

I need the container runtime logs. So please use the CRIO_LXC_CREATE_HOOK to create backups from the container runtime configurations. See https://github.com/Drachenfels-GmbH/crio-lxc#create-hook for details.

Please do the following steps:

* Apply the crio-lxc configuration below
* Delete the cilium pods `kubectl delete -f ...`
* Create the cilium pods again.
* Run `gather-logs.sh`
* Attach the output from `gather-logs.sh`

[root@k8s-cluster2-controller crio-lxc]# cat /etc/default/crio-lxc
CRIO_LXC_LOG_LEVEL=debug
CRIO_LXC_CONTAINER_LOG_LEVEL=debug
CRIO_LXC_CREATE_HOOK=/usr/local/bin/crio-lxc-backup.sh
[root@k8s-cluster2-controller crio-lxc]# cat /usr/local/bin/crio-lxc-backup.sh
#!/bin/sh

LOGDIR=$(dirname $LOG_FILE)

# backup container runtime directory to log directory
cp -r $RUNTIME_PATH $LOGDIR/$CONTAINER_ID
# copy OCI runtime spec to container runtime directory
cp $SPEC_PATH $LOGDIR/$CONTAINER_ID/spec.json
r10r commented 3 years ago

Hmm, seems similar to https://github.com/cri-o/cri-o/issues/4478

r10r commented 3 years ago

Please attach the output of pacman -Qe

vrubiolo commented 3 years ago

lxc a2a77dd2dd480f59725c5db1e84cdb657fc68472d3ee1f4ae7b098f8ba25c1ac 20210203170734.713 ERROR conf - conf.c:dropcaps_except:2451 - Unknown capability perfmon

You can disable capabilities for now. Simply set CRIO_LXC_CAPABILITIES=false in /etc/default/crio-lxc

Thanks. I will try this out.

But we have to find out why CAP_PERFMON is unknown (maybe libcap is outdated).

From man 7 capabilities: CAP_PERFMON (since Linux 5.8) ...

Unfortunately the crio-lxc.tar is empty, so there is no crio-lxc logfile. It seems you changed the path to the crio-lxc logfile, so gather-logs.sh couldn't pick it up.

I had not changed the path; it's just that I am not running as root and I forgot to update the tar command so that it could access this directory. I have fixed that.

I need the container runtime logs. So please use the CRIO_LXC_CREATE_HOOK to create backups from the container runtime configurations. See https://github.com/Drachenfels-GmbH/crio-lxc#create-hook for details.

Please do the following steps:

* Apply the crio-lxc configuration below

* Delete the cilium pods `kubectl delete -f ...`

* Create the cilium pods again.

* Run `gather-logs.sh`

* Attach the output from `gather-logs.sh`

All done (let me know if there is an easy way to reduce the log archive size; I had run clear-logs.sh just before recreating the cilium pods): 02.03_20.43.34.zip

W.r.t. pods, Cilium creates DaemonSets and ReplicaSets, so I removed those as well, not just the pods.

As for pacman, here is the info: pacman.txt

r10r commented 3 years ago

Hmm, the spec.json files are missing in the output. Did you make /usr/local/bin/crio-lxc-backup.sh executable?

r10r commented 3 years ago

no you didn't :D

{"l":"error","cmd":"create","cid":"f7acd03ba8b6368667d1dbd30f76859aa7aa9ddbf28992cf6170bee63a7a7a73","error":"fork/exec /usr/local/bin/crio-lxc-backup.sh: permission denied","file":"/usr/local/bin/crio-lxc-backup.sh","t":"20210203201119.112","c":"cli.go:308","m":"failed to execute create hook"}
vrubiolo commented 3 years ago

Ok, let me do that again :)

Edit: here it is: 02.03_21.29.32.zip (it took me some time to check the spec.json files were there; hopefully you have everything)

r10r commented 3 years ago

Thanks.

I'll upgrade liblxc tomorrow to support all newer capabilities, e.g. CAP_PERFMON: https://github.com/lxc/lxc/commit/7b4cd4681da399acc1775773d7967a3c94635346

You then have to rebuild liblxc. After that you can comment out CRIO_LXC_CAPABILITIES=false again. Did it work for you with CRIO_LXC_CAPABILITIES=false set?

vrubiolo commented 3 years ago

Thanks.

I'll upgrade liblxc tomorrow to support all newer capabilities, e.g. CAP_PERFMON: lxc/lxc@7b4cd46

Excellent !

You then have to rebuild liblxc. After that you can comment out CRIO_LXC_CAPABILITIES=false again.

Will do.

Did it work for you with CRIO_LXC_CAPABILITIES=false set?

It looks like it, as the cilium pod is now running:

NAMESPACE     NAME                                  READY   STATUS             RESTARTS   AGE
kube-system   cilium-7d4zc                          1/1     Running            0          5m42s
kube-system   cilium-operator-696dc48d8d-tknls      0/1     ImagePullBackOff   0          5m42s
kube-system   coredns-74ff55c5b-k9r7h               0/1     CrashLoopBackOff   25         7h46m
kube-system   coredns-74ff55c5b-ks297               0/1     CrashLoopBackOff   25         7h46m
kube-system   etcd-vagrant-k8s                      1/1     Running            0          7h46m
kube-system   kube-apiserver-vagrant-k8s            1/1     Running            0          7h46m
kube-system   kube-controller-manager-vagrant-k8s   1/1     Running            0          7h46m
kube-system   kube-proxy-4wfjm                      1/1     Running            0          7h46m
kube-system   kube-scheduler-vagrant-k8s            1/1     Running            0          7h46m

Not sure about its operator though, why it is in ImagePullBackOff, nor why coredns is misbehaving. Attached are the logs, again: 02.03_21.46.30.zip

vrubiolo commented 3 years ago

Running a simple image appears to work (I like to use https://github.com/kubernetes-up-and-running/kuard):

[vagrant@vagrant-k8s ~]$ kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
[vagrant@vagrant-k8s ~]$ k get pods
NAME    READY   STATUS    RESTARTS   AGE
kuard   1/1     Running   0          7m2s

[vagrant@vagrant-k8s ~]$ k describe pod kuard 
Name:         kuard
Namespace:    default
Priority:     0
Node:         vagrant-k8s/10.0.2.15
Start Time:   Wed, 03 Feb 2021 21:53:05 +0000
Labels:       run=kuard
Annotations:  <none>
Status:       Running
IP:           10.0.0.135
IPs:
  IP:  10.0.0.135
Containers:
  kuard:
    Container ID:   cri-o://059c7537c1fe16cc6dbae245652d0fe7ea33a22170596c2e7660c98d30e970b4
    Image:          gcr.io/kuar-demo/kuard-amd64:blue
    Image ID:       gcr.io/kuar-demo/kuard-amd64@sha256:1ecc9fb2c871302fdb57a25e0c076311b7b352b0a9246d442940ca8fb4efe229
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 03 Feb 2021 21:53:09 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bz5lh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-bz5lh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bz5lh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  3m38s (x4 over 5m56s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   Scheduled         3m25s                  default-scheduler  Successfully assigned default/kuard to vagrant-k8s
  Normal   Pulling           3m25s                  kubelet            Pulling image "gcr.io/kuar-demo/kuard-amd64:blue"
  Normal   Pulled            3m21s                  kubelet            Successfully pulled image "gcr.io/kuar-demo/kuard-amd64:blue" in 3.799843248s
  Normal   Created           3m21s                  kubelet            Created container kuard
  Normal   Started           3m21s                  kubelet            Started container kuard

But it looks like networking might be missing something, as I cannot reach anything on my host when kube-proxy runs, cf. the test on the kuard page:

kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
kubectl port-forward kuard 8080:8080

I cannot reach anything after that on http://localhost:8080

r10r commented 3 years ago

Is cilium healthy?

r10r commented 3 years ago

I noticed an issue with cilium 1.9.4; please try 1.9.3 (deployment file is attached).

  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m51s                  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
  Warning  FailedScheduling  5m51s                  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
  Normal   Scheduled         5m48s                  default-scheduler  Successfully assigned kube-system/cilium-operator-696dc48d8d-k4mxq to k8s-cluster2-controller
  Normal   Pulling           3m19s (x4 over 5m48s)  kubelet            Pulling image "quay.io/cilium/operator-generic:v1.9.4"
  Warning  Failed            3m17s (x4 over 5m31s)  kubelet            Failed to pull image "quay.io/cilium/operator-generic:v1.9.4": rpc error: code = Unknown desc = Error reading manifest v1.9.4 in quay.io/cilium/operator-generic: manifest unknown: manifest unknown
  Warning  Failed            3m17s (x4 over 5m31s)  kubelet            Error: ErrImagePull
  Normal   BackOff           3m5s (x6 over 5m30s)   kubelet            Back-off pulling image "quay.io/cilium/operator-generic:v1.9.4"
  Warning  Failed            42s (x16 over 5m30s)   kubelet            Error: ImagePullBackOff

cillium-1.9.3.yaml.txt

vrubiolo commented 3 years ago

I noticed an issue with cilium 1.9.4; please try 1.9.3 (deployment file is attached).

  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m51s                  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
  Warning  FailedScheduling  5m51s                  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod anti-affinity rules.
  Normal   Scheduled         5m48s                  default-scheduler  Successfully assigned kube-system/cilium-operator-696dc48d8d-k4mxq to k8s-cluster2-controller
  Normal   Pulling           3m19s (x4 over 5m48s)  kubelet            Pulling image "quay.io/cilium/operator-generic:v1.9.4"
  Warning  Failed            3m17s (x4 over 5m31s)  kubelet            Failed to pull image "quay.io/cilium/operator-generic:v1.9.4": rpc error: code = Unknown desc = Error reading manifest v1.9.4 in quay.io/cilium/operator-generic: manifest unknown: manifest unknown
  Warning  Failed            3m17s (x4 over 5m31s)  kubelet            Error: ErrImagePull
  Normal   BackOff           3m5s (x6 over 5m30s)   kubelet            Back-off pulling image "quay.io/cilium/operator-generic:v1.9.4"
  Warning  Failed            42s (x16 over 5m30s)   kubelet            Error: ImagePullBackOff

cillium-1.9.3.yaml.txt

Yup, much better:

[vagrant@vagrant-k8s ~]$ k get pods -A
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
kube-system   cilium-66pbl                          1/1     Running   0          2m57s
kube-system   cilium-operator-65c5fc987f-79hxb      1/1     Running   0          2m57s
kube-system   coredns-74ff55c5b-k9r7h               0/1     Running   32         8h
kube-system   coredns-74ff55c5b-ks297               0/1     Running   32         8h
kube-system   etcd-vagrant-k8s                      1/1     Running   0          8h
kube-system   kube-apiserver-vagrant-k8s            1/1     Running   0          8h
kube-system   kube-controller-manager-vagrant-k8s   1/1     Running   0          8h
kube-system   kube-proxy-4wfjm                      1/1     Running   0          8h
kube-system   kube-scheduler-vagrant-k8s            1/1     Running   0          8h

although coredns does not appear to be able to become ready, nor has the situation improved w.r.t. my kuard test app ... 02.03_22.19.34.zip

r10r commented 3 years ago

Delete the coredns pods - they will be added back automatically. I suspect they have a wrong IP since cilium was not healthy.
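
For example (the coredns pods carry the k8s-app=kube-dns label):

kubectl -n kube-system delete pods -l k8s-app=kube-dns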

vrubiolo commented 3 years ago

Ok, thanks for the suggestion, this seemed to help.

[vagrant@vagrant-k8s logs]$ k get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cilium-66pbl                          1/1     Running   0          11h
cilium-operator-65c5fc987f-79hxb      1/1     Running   0          11h
coredns-74ff55c5b-gh79t               0/1     Running   50         3h26m
coredns-74ff55c5b-qqplv               1/1     Running   0          3h26m
etcd-vagrant-k8s                      1/1     Running   0          19h
kube-apiserver-vagrant-k8s            1/1     Running   0          19h
kube-controller-manager-vagrant-k8s   1/1     Running   0          19h
kube-proxy-4wfjm                      1/1     Running   0          19h
kube-scheduler-vagrant-k8s            1/1     Running   0          19h

One of the coredns pods never seems to get ready though:

[vagrant@vagrant-k8s logs]$ k describe pods coredns-74ff55c5b-svd7z 
Name:                 coredns-74ff55c5b-svd7z
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 vagrant-k8s/10.0.2.15
Start Time:           Thu, 04 Feb 2021 09:32:00 +0000
Labels:               k8s-app=kube-dns                                                                                 
                      pod-template-hash=74ff55c5b                                                                      
Annotations:          <none>
Status:               Running
IP:                   10.0.0.213          
IPs:                                                                                                                   
  IP:           10.0.0.213
Controlled By:  ReplicaSet/coredns-74ff55c5b
Containers:              
  coredns:                
    Container ID:  cri-o://f05759261ce5e7359433cac67bdbaaffd5a37a1112f2c9ba3fed43ab7f6ff183
    Image:         k8s.gcr.io/coredns:1.7.0
    Image ID:      k8s.gcr.io/coredns@sha256:242d440e3192ffbcecd40e9536891f4d9be46a650363f3a004497c2070f96f5a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP                                                                                 
    Args:             
      -conf         
      /etc/coredns/Corefile
    State:          Running                             
      Started:      Thu, 04 Feb 2021 09:32:01 +0000
    Ready:          False
    Restart Count:  0     
    Limits:                            
      memory:  170Mi                         
    Requests:                                                                                                          
      cpu:        100m                                    
      memory:     70Mi                                                                                                 
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>                                                                                               
    Mounts:                                                                                                            
      /etc/coredns from config-volume (ro)                                                                                                                                                                                                    
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-fcqq4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-fcqq4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-fcqq4
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/control-plane:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  58s                default-scheduler  Successfully assigned kube-system/coredns-74ff55c5b-svd7z to vagrant-k8s
  Normal   Pulled     57s                kubelet            Container image "k8s.gcr.io/coredns:1.7.0" already present on machine
  Normal   Created    57s                kubelet            Created container coredns
  Normal   Started    57s                kubelet            Started container coredns
  Warning  Unhealthy  13s (x3 over 43s)  kubelet            Readiness probe failed: Get "http://10.0.0.213:8181/ready": dial tcp 10.0.0.213:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  3s (x3 over 53s)   kubelet            Readiness probe failed: Get "http://10.0.0.213:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Also, the situation is actually better than I reported w.r.t. kuard (I was missing one level of indirection and did not forward the ports from my VM). I can reach port 8080 on localhost when kube-proxy is forwarding my port:

[vagrant@vagrant-k8s logs]$ k logs kuard
2021/02/04 09:34:28 Starting kuard version: v0.10.0-blue
2021/02/04 09:34:28 **********************************************************************
2021/02/04 09:34:28 * WARNING: This server may expose sensitive
2021/02/04 09:34:28 * and secret information. Be careful.
2021/02/04 09:34:28 **********************************************************************
2021/02/04 09:34:28 Config: 
{
  "address": ":8080",
  "debug": false,
  "debug-sitedata-dir": "./sitedata",
  "keygen": {
    "enable": false,
    "exit-code": 0,
    "exit-on-complete": false,
    "memq-queue": "",
    "memq-server": "",
    "num-to-gen": 0,
    "time-to-run": 0
  },
  "liveness": {
    "fail-next": 0
  },
  "readiness": {
    "fail-next": 0
  },
  "tls-address": ":8443",
  "tls-dir": "/tls"
}
2021/02/04 09:34:28 Could not find certificates to serve TLS
2021/02/04 09:34:28 Serving on HTTP on :8080
[vagrant@vagrant-k8s logs]$ kubectl port-forward kuard 8080:8080                                                                                                                                                                              
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Handling connection for 8080
E0204 09:35:17.295472 1893851 portforward.go:385] error copying from local connection to remote stream: read tcp6 [::1]:8080->[::1]:41432: read: connection reset by peer
Handling connection for 8080
[vagrant@vagrant-k8s ~]$ curl http://localhost:8080
<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">

  <title>KUAR Demo</title>

  <link rel="stylesheet" href="/static/css/bootstrap.min.css">
  <link rel="stylesheet" href="/static/css/styles.css">

  <script>
var pageContext = {"urlBase":"","hostname":"kuard","addrs":["10.0.0.32"],"version":"v0.10.0-blue","versionColor":"hsl(339,100%,50%)","requestDump":"GET / HTTP/1.1\r\nHost: localhost:8080\r\nAccept: */*\r\nUser-Agent: curl/7.74.0","requestProto":"HTTP/1.1","requestAddr":"127.0.0.1:49912"}
  </script>
</head>

<svg style="position: absolute; width: 0; height: 0; overflow: hidden;" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<symbol id="icon-power" viewBox="0 0 32 32">
<title>power</title>
<path class="path1" d="M12 0l-12 16h12l-8 16 28-20h-16l12-12z"></path>
</symbol>
<symbol id="icon-notification" viewBox="0 0 32 32">
<title>notification</title>
<path class="path1" d="M16 3c-3.472 0-6.737 1.352-9.192 3.808s-3.808 5.72-3.808 9.192c0 3.472 1.352 6.737 3.808 9.192s5.72 3.808 9.192 3.808c3.472 0 6.737-1.352 9.192-3.808s3.808-5.72 3.808-9.192c0-3.472-1.352-6.737-3.808-9.192s-5.72-3.808-9.192-3.808zM16 0v0c8.837 0 16 7.163 16 16s-7.163 16-16 16c-8.837 0-16-7.163-16-16s7.163-16 16-16zM14 22h4v4h-4zM14 6h4v12h-4z"></path>
</symbol>
</defs>
</svg>

<body>
  <div id="root"></div>
  <script src="/built/bundle.js" type="text/javascript"></script>
</body>
</html>

Here are some questions: