Closed: mslga closed this issue 10 months ago.
This issue is currently awaiting triage.
If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Is this a reproducible issue or was this a one-time hit?
Also: it looks like this is not specific to CAPI (maybe not even to CAPV), but rather a bug in the image used, which points back to image-builder.
I ran additional tests on vSphere cluster "A" (the one that originally encountered the error). I used different OVAs that I downloaded from https://github.com/kubernetes-sigs/cluster-api-provider-vsphere?tab=readme-ov-file#kubernetes-versions-with-published-ovas
ubuntu-2204-kube-v1.27.3 and ubuntu-2204-kube-v1.28.0
In both cases the error was present, and it only affects the kube-apiserver pod. If you continue installing the necessary controllers and plugins in the cluster, the error does not occur with other pods.
I went ahead and uploaded these images to another vSphere cluster "B", and there the error was not present. When the CP node was created, the kube-apiserver pod was not even recreated:
crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
1320f4110e7f2 73deb9a3f7025 2 seconds ago Running etcd 1 89d26eb0056c8 etcd-kiv-cp-fsvvd
61d7e97aeb35d ed5bba5d71b95 44 seconds ago Running kube-vip 1 3b82d5b60e1d9 kube-vip-kiv-cp-fsvvd
5aaa71e16620b 4be79c38a4bab 56 seconds ago Running kube-controller-manager 0 1cd72b77449dc kube-controller-manager-kiv-cp-fsvvd
5c99d55d8feb7 f6f496300a2ae 56 seconds ago Running kube-scheduler 0 46ff881f080d7 kube-scheduler-kiv-cp-fsvvd
f8524cc86d0aa 73deb9a3f7025 56 seconds ago Exited etcd 0 89d26eb0056c8 etcd-kiv-cp-fsvvd
97475565488d5 ea1030da44aa1 About a minute ago Running kube-proxy 0 13191d32e2c8c kube-proxy-8rnkk
f72d4a681a5cb ed5bba5d71b95 About a minute ago Exited kube-vip 0 3b82d5b60e1d9 kube-vip-kiv-cp-fsvvd
001dd5a308c9f bb5e0dde9054c About a minute ago Running kube-apiserver 0 381cd2a560ab0 kube-apiserver-kiv-cp-fsvvd
3ed3ce6566fb3 f6f496300a2ae 2 minutes ago Exited kube-scheduler 0 ac1ae0d29ea97 kube-scheduler-kiv-cp-fsvvd
It turns out that the problem may be related to the vSphere cluster, but I don't quite understand what exactly to check or where to look for the cause.
There is no load on vSphere cluster "A"; it is a new cluster.
vSphere cluster "A" version: 8.0.1.00200 Build number: 21860503
vSphere cluster "B" version: 8.0.1.00200 Build number: 21860503
Problem solved.
It turned out that NTP was not configured on the ESXi hosts of cluster "A". The virtual machines received their time settings from the ESXi hosts when they were created, and Cluster API then applied the NTP settings to the virtual machines. So the kube-apiserver container was restarted because of the resulting time offset.
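For reference, the NTP configuration that Cluster API pushes to the machines is normally expressed in the bootstrap config. A minimal sketch, assuming the v1beta1 KubeadmControlPlane API, with placeholder NTP servers and fields unrelated to NTP omitted:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: kiv-cp            # name taken from the node names above; adjust as needed
spec:
  kubeadmConfigSpec:
    ntp:
      enabled: true
      servers:            # placeholder servers, use your own
        - 0.pool.ntp.org
        - 1.pool.ntp.org
```

The ESXi hosts themselves still need NTP configured (Host > Configure > System > Time Configuration in the vSphere Client), since, as described above, the guests pick up their initial time from the host.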
What steps did you take and what happened?
After deploying a cluster with cluster-api-provider-vsphere, the kube-apiserver pod is stuck in "CreateContainerError" status
kubectl get po -A
The API server endpoint is available, and there are two kube-apiserver containers on the CP node: one in Running status and one in Exited status
crictl ps -a
Containerd logs
Kubelet logs
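For reproduction, both services are systemd units on the image-builder Ubuntu 22.04 OVAs, so the logs above can be collected roughly like this (the time window is arbitrary):

```sh
# Collect the containerd and kubelet logs from the affected control plane node
journalctl -u containerd --no-pager --since "1 hour ago" > containerd.log
journalctl -u kubelet --no-pager --since "1 hour ago" > kubelet.log
```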
What did you expect to happen?
I assume that under normal behavior the old container should be deleted, but it appears to be stuck in Exited status because a new container that uses the same podSandbox has already been created.
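One way to check that assumption (a sketch, assuming crictl is configured on the node):

```sh
# Both kube-apiserver containers should show the same value in the POD ID column
crictl ps -a --name kube-apiserver

# Inspect the shared sandbox; <pod-id> is the POD ID value from the output above
crictl inspectp <pod-id>
```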
I found only a workaround: SSH to the node and delete the container in Exited status,
or add this to postKubeadmCommands.
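The exact command was omitted above; purely as an illustration, such a postKubeadmCommands entry could look roughly like the following sketch (the crictl filter is an assumption, not necessarily the command that was used):

```yaml
# Illustrative sketch only; cleans up any kube-apiserver container left in Exited state
spec:
  kubeadmConfigSpec:
    postKubeadmCommands:
      - crictl rm $(crictl ps -a -q --name kube-apiserver --state exited) || true
```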
Could this be a bug or is there already a solution to this problem?
Cluster API version
v1.5.3
Kubernetes version
v1.28.0
Anything else you would like to add?
CAPV v1.8.4
Label(s) to be applied
/kind bug