mvrk69 opened this issue 7 months ago
From the logs, it seems that the weave initialisation procedure has not been able to write a file called /etc/cni/net.d/10-weave.conflist, and therefore the CNI plugin is not enabled. It is probably a permissions issue, especially because this is CoreOS. This should affect all versions of Kubernetes, not just later ones.
To confirm, could you please do the following for me?
$ kubectl logs -n kube-system <weave-net-pod> init
{
"cniVersion": "1.0.0",
"name": "weave",
"disableCheck": true,
"plugins": [
{
"name": "weave",
"type": "weave-net",
"hairpinMode": true
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
}
]
}
1: kubectl logs -n kube-system weave-net-gftst init fails with: error: container init is not valid for pod weave-net-gftst
The container name seems to be weave-init, not init, but it produces no logs; kubectl logs -n kube-system weave-net-gftst weave-init returns empty.
2: I see the file there:
root@k8sm01:~# ll /etc/cni/net.d/
total 8
-rw-r--r--. 1 root root 344 Apr 15 16:06 10-weave.conflist
-rw-r--r--. 1 root root 393 Apr 15 16:05 11-crio-ipv4-bridge.conflist
And the file seems to have the same contents you posted:
root@k8sm01:~# cat /etc/cni/net.d/10-weave.conflist
{
"cniVersion": "1.0.0",
"name": "weave",
"disableCheck": true,
"plugins": [
{
"name": "weave",
"type": "weave-net",
"hairpinMode": true
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
}
]
}
3: Yes, I used kubeadm:
kubeadm init --config kubeadm-config.yml --upload-certs
kubeadm-config.yml:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.3
networking:
podSubnet: "10.32.0.0/16"
serviceSubnet: "172.16.16.0/22"
controlPlaneEndpoint: k8sm01.azar.pt:6443
controllerManager:
extraArgs:
flex-volume-plugin-dir: "/etc/kubernetes/kubelet-plugins/volume/exec"
node-cidr-mask-size: "20"
allocate-node-cidrs: "true"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
criSocket: unix:///var/run/crio/crio.sock
imagePullPolicy: "IfNotPresent"
kubeletExtraArgs:
cgroup-driver: "systemd"
resolv-conf: "/run/systemd/resolve/resolv.conf"
max-pods: "4096"
max-open-files: "20000000"
Sorry, the container name was indeed weave-init.
Okay. So the weave initialisation completed without issues, and the weave pod is also in the Running state and producing logs. At this point, weave is ready, the NotReady taint should automatically be removed by the kubelet, and the coredns pods should go into the ContainerCreating state.
Can we check the output of kubectl get pods -n kube-system -o wide and kubectl describe node k8sm01? Also, just to re-check the permissions problem, ls -l /etc/cni/net.d (it should have 744 permissions).
root@k8sm01:~# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-76f75df574-8stlr 0/1 Pending 0 61s <none> <none> <none> <none>
coredns-76f75df574-d64zf 0/1 Pending 0 61s <none> <none> <none> <none>
etcd-k8sm01 1/1 Running 0 76s 192.168.0.115 k8sm01 <none> <none>
kube-apiserver-k8sm01 1/1 Running 0 77s 192.168.0.115 k8sm01 <none> <none>
kube-controller-manager-k8sm01 1/1 Running 0 78s 192.168.0.115 k8sm01 <none> <none>
kube-proxy-xffc2 1/1 Running 0 61s 192.168.0.115 k8sm01 <none> <none>
kube-scheduler-k8sm01 1/1 Running 0 77s 192.168.0.115 k8sm01 <none> <none>
metrics-server-84989b68d9-w8fhf 0/1 Pending 0 61s <none> <none> <none> <none>
weave-net-bdjmv 2/2 Running 1 (54s ago) 61s 192.168.0.115 k8sm01 <none> <none>
root@k8sm01:~# kubectl describe node k8sm01
Name: k8sm01
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8sm01
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/crio/crio.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 15 Apr 2024 16:53:35 +0200
Taints: node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: k8sm01
AcquireTime: <unset>
RenewTime: Mon, 15 Apr 2024 16:55:31 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 15 Apr 2024 16:54:08 +0200 Mon, 15 Apr 2024 16:54:08 +0200 WeaveIsUp Weave pod has set this
MemoryPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?
Addresses:
InternalIP: 192.168.0.115
Hostname: k8sm01
Capacity:
cpu: 4
ephemeral-storage: 8846316Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8121560Ki
pods: 4096
Allocatable:
cpu: 4
ephemeral-storage: 8152764813
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8019160Ki
pods: 4096
System Info:
Machine ID: 9d0900841b2d4a5c9d930f8e1c805869
System UUID: 9d090084-1b2d-4a5c-9d93-0f8e1c805869
Boot ID: 974b3842-0ba5-4aa4-bddf-6e4487920fe7
Kernel Version: 6.7.7-200.fc39.x86_64
OS Image: Fedora CoreOS 39.20240309.3.0
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.29.2
Kubelet Version: v1.29.3
Kube-Proxy Version: v1.29.3
PodCIDR: 10.32.0.0/20
PodCIDRs: 10.32.0.0/20
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system etcd-k8sm01 100m (2%) 0 (0%) 100Mi (1%) 0 (0%) 2m
kube-system kube-apiserver-k8sm01 250m (6%) 0 (0%) 0 (0%) 0 (0%) 2m1s
kube-system kube-controller-manager-k8sm01 200m (5%) 0 (0%) 0 (0%) 0 (0%) 2m2s
kube-system kube-proxy-xffc2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 105s
kube-system kube-scheduler-k8sm01 100m (2%) 0 (0%) 0 (0%) 0 (0%) 2m1s
kube-system weave-net-bdjmv 100m (2%) 0 (0%) 0 (0%) 0 (0%) 105s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (18%) 0 (0%)
memory 100Mi (1%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 104s kube-proxy
Normal Starting 2m6s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 2m6s (x8 over 2m6s) kubelet Node k8sm01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 2m6s (x8 over 2m6s) kubelet Node k8sm01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 2m6s (x7 over 2m6s) kubelet Node k8sm01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 2m6s kubelet Updated Node Allocatable limit across pods
Normal Starting 2m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 2m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 2m kubelet Node k8sm01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 2m kubelet Node k8sm01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 2m kubelet Node k8sm01 status is now: NodeHasSufficientPID
Normal RegisteredNode 106s node-controller Node k8sm01 event: Registered Node k8sm01 in Controller
root@k8sm01:~# ls -l /etc/cni/net.d
total 8
-rw-r--r--. 1 root root 344 Apr 15 16:54 10-weave.conflist
-rw-r--r--. 1 root root 393 Apr 15 16:52 11-crio-ipv4-bridge.conflist
Yes, weave net is up and running, but the taint is not gone. I think a kubelet restart or a node restart will solve the problem.
I currently test weave net with kubernetes versions 1.27 through 1.29, on clusters running debian linux on amd64 and arm64. It works in all those cases. I also use kubeadm, with settings pretty similar to yours. Perhaps I should add coreos to the test mix.
I tried restarting kubelet and also rebooting, and still no go.
I apologise for the inconvenience. I'll try to replicate your environment and see if I can find the problem. The only thing out of place in your logs is that kubelet has still not detected the CNI setup, even though Weave Net has indicated that the network is available. You can see that in the Conditions: section of the kubectl describe node output. Quoting the relevant part below:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 15 Apr 2024 16:54:08 +0200 Mon, 15 Apr 2024 16:54:08 +0200 WeaveIsUp Weave pod has set this
MemoryPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Mon, 15 Apr 2024 16:54:10 +0200 Mon, 15 Apr 2024 16:53:35 +0200 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?
I tried setting up a cluster on CoreOS, and faced exactly the same problem. But after a long time (over 15 minutes), the node became Ready. This happened with a different CNI as well.
I wasn't able to diagnose what the problem was. Every log gave the expected responses: no errors, no warnings. Only kubelet kept logging that there were no configuration files in /etc/cni/net.d/, but the files were very much there. After a long-ish time, kubelet suddenly reported that the node was now ready.
I have not observed this behaviour on other operating systems. I'm tempted to just blame CoreOS, because weave does everything it is supposed to do - and the same symptoms can be seen for at least one other CNI plugin. But I will observe some more, and report back here.
Yes, it seems to happen to flannel and weave on clusters above 1.27 on CoreOS, though it works fine with calico, and also with OpenShift OVN-Kubernetes.
Hi,
I deployed a Kubernetes 1.29 cluster and after deploying weave, the node is NotReady and the pods are stuck in Pending. The same thing happens with Kubernetes 1.28. The last version where it works fine is Kubernetes 1.27.
Anything else we need to know?
OS: Fedora CoreOS 39
Any idea what might be wrong?