1. What kops version are you running? The command `kops version` will display this information.

```
Client version: 1.27.0 (git-v1.27.0)
```
**2. What Kubernetes version are you running?** `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.

```
/ # kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.4
Kustomize Version: v5.0.1
```
Example output of `kubectl get nodes -o wide`:

```
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
****** Ready control-plane,master 6d20h v1.27.4 ***** <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.6.6
****** Ready node 6d20h v1.27.4 ***** <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.6.6
```
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?

```
kops edit cluster --name <cluster-name>
```

Updated:

```yaml
kubernetesVersion: 1.29.5
```

Added this under `spec`:

```yaml
nodeTerminationHandler:
  enableRebalanceMonitoring: false
  enableSQSTerminationDraining: false
```

```
kops30 update cluster <cluster-name> --v 10 --yes
```
5. What happened after the commands executed?
After the upgrade completed successfully, we terminated the nodes. All the nodes rejoined the cluster, but all of them are in NotReady state.

```
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
XXXX NotReady node 18h v1.29.5 10.35.27.251 <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.7.16
XXXX NotReady node 18h v1.29.5 10.35.24.146 <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.7.16
XXXX NotReady node 18h v1.29.5 10.35.27.168 <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.7.16
XXXX NotReady control-plane,master 18h v1.29.5 10.35.27.242 <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.7.16
XXXX NotReady node 18h v1.29.5 10.35.24.170 <none> Amazon Linux 2 5.10.217-205.860.amzn2.x86_64 containerd://1.7.16
```
Snapshot from `kubectl describe node` on the master node:

```
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 26 Jun 2024 16:44:29 -0400 Tue, 25 Jun 2024 22:31:52 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 26 Jun 2024 16:44:29 -0400 Tue, 25 Jun 2024 22:31:52 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 26 Jun 2024 16:44:29 -0400 Tue, 25 Jun 2024 22:31:52 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 26 Jun 2024 16:44:29 -0400 Tue, 25 Jun 2024 22:31:52 -0400 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
```
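For context, the `Ready=False` condition above is the expected kubelet behaviour: the container runtime reports `NetworkReady=false` until a CNI network config file appears in `/etc/cni/net.d`. A rough illustrative sketch of that check (not containerd's actual code, just the shape of it):

```python
import os
import tempfile

def cni_config_present(confdir):
    """Return True if the directory contains at least one CNI network
    config (*.conf, *.conflist, or *.json) -- roughly what the container
    runtime looks for before reporting NetworkReady=true."""
    if not os.path.isdir(confdir):
        return False
    return any(
        name.endswith((".conf", ".conflist", ".json"))
        for name in os.listdir(confdir)
    )

# Simulate the failing node: an empty /etc/cni/net.d
with tempfile.TemporaryDirectory() as d:
    print(cni_config_present(d))   # False -> node stays NotReady
    # Once Calico's install-cni writes its config, the check passes:
    open(os.path.join(d, "10-calico.conflist"), "w").close()
    print(cni_config_present(d))   # True
```

This matches the symptom below: `install-cni` dies before it ever writes `10-calico.conflist`, so the nodes never leave NotReady.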
Logs from the `install-cni` container of the `calico-node` pod:

```
kubectl logs calico-node-4sjb2 -n kube-system -c install-cni
2024-06-26 20:47:53.780 [INFO][1] cni-installer/<nil> <nil>: Running as a Kubernetes pod
2024-06-26 20:47:53.784 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/bandwidth"
2024-06-26 20:47:53.784 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/bandwidth
2024-06-26 20:47:53.830 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico"
2024-06-26 20:47:53.830 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico
2024-06-26 20:47:53.873 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico-ipam"
2024-06-26 20:47:53.873 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico-ipam
2024-06-26 20:47:53.875 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/flannel"
2024-06-26 20:47:53.875 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/flannel
2024-06-26 20:47:53.877 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/host-local"
2024-06-26 20:47:53.877 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/host-local
2024-06-26 20:47:53.880 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/loopback"
2024-06-26 20:47:53.880 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/loopback
2024-06-26 20:47:53.883 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/portmap"
2024-06-26 20:47:53.883 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/portmap
2024-06-26 20:47:53.886 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/tuning"
2024-06-26 20:47:53.886 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/tuning
2024-06-26 20:47:53.886 [INFO][1] cni-installer/<nil> <nil>: Wrote Calico CNI binaries to /host/opt/cni/bin
2024-06-26 20:47:53.932 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.27.3
2024-06-26 20:47:53.932 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
2024-06-26 20:47:53.932 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-06-26 20:47:53.937 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://172.21.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": tls: failed to verify certificate: x509: certificate is valid for 100.64.0.1, 127.0.0.1, not 172.21.0.1
2024-06-26 20:47:53.937 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://172.21.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": tls: failed to verify certificate: x509: certificate is valid for 100.64.0.1, 127.0.0.1, not 172.21.0.1
```
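The x509 failure suggests a service-CIDR mismatch: the API server certificate was issued for `100.64.0.1` (the first IP of what we assume is the kOps default `serviceClusterIPRange` of `100.64.0.0/13`; the actual range is in the cluster spec), but `install-cni` is dialing `172.21.0.1`, which lies outside that range. A quick sanity check of which range a given service IP belongs to:

```python
import ipaddress

def in_service_range(ip, cidr):
    """Check whether a ClusterIP falls inside a service CIDR."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

# 100.64.0.0/13 is an assumption here -- substitute your cluster spec's
# serviceClusterIPRange.
print(in_service_range("100.64.0.1", "100.64.0.0/13"))  # True: cert SAN
print(in_service_range("172.21.0.1", "100.64.0.0/13"))  # False: IP being dialed
```

So the question becomes where the pod got `172.21.0.1` from (e.g. the `KUBERNETES_SERVICE_HOST` env injected into the pod), since it does not match the range the cert was issued for.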
6. What did you expect to happen?
Expected the cluster to upgrade to the latest version without issues.
We also see that containerd was upgraded to 1.7.16 and Calico to 3.27.3; both are managed directly by kOps.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

Some other logs from `journalctl -u kubelet`:

```
Jun 26 02:38:31 ip-10-35-27-242.ec2.internal kubelet[4047]: E0626 02:38:31.314599 4047 kubelet.go:2892] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
```
Also noticed that on the nodes, the following files are missing from `/etc/cni/net.d/`:

- 10-calico.conflist
- calico-kubeconfig

*These files are present when we create a new cluster on the updated versions.*
Other notes: our environment is private, with no internet access, but we are able to bring up a new cluster with kOps v1.29.0 and Kubernetes 1.29.5. We see this problem only when upgrading an existing cluster.
/kind bug