Closed: justinmurray closed this issue 4 years ago.
@justinmurray how many replicas are configured for the associated MachineDeployment?
Has a CNI provider been deployed to the workload cluster yet? The control plane will not pass readiness checks if the CNI provider is not running, which will block the creation of workers from the MachineDeployment.
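One way to check that (a sketch; it assumes the workload cluster is named `vsphere-quickstart` as in the quickstart, and that the management cluster is the current kubectl context):

```shell
# Retrieve the workload cluster kubeconfig that Cluster API stores
# in a secret named <cluster-name>-kubeconfig
kubectl get secret vsphere-quickstart-kubeconfig \
  -o jsonpath='{.data.value}' | base64 -d > workload.kubeconfig

# List pods in kube-system on the workload cluster; a CNI provider
# such as Calico shows up as calico-node / calico-kube-controllers pods
kubectl --kubeconfig workload.kubeconfig get pods -n kube-system
```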
Thank you @detiber: I used the command from the Quickstart Guide as follows, so I expected three VMs to be assigned worker roles:

```shell
clusterctl config cluster vsphere-quickstart \
  --infrastructure vsphere \
  --kubernetes-version v1.17.3 \
  --control-plane-machine-count 1 \
  --worker-machine-count 3 > cluster.yaml
```

How would I check on your CNI question above? Would the CNI provider appear as one of the pods in the workload cluster? Here is what I get from `kubectl get pods -A`:
```
[root@capvm1 .cluster-api]# kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-68dc4cf88f-5jpfs    1/1     Running            0          27h
kube-system   calico-node-szx24                           1/1     Running            0          27h
kube-system   coredns-6955765f44-7vlv8                    1/1     Running            0          27h
kube-system   coredns-6955765f44-8scgz                    1/1     Running            0          27h
kube-system   etcd-vsphere-tkg-kx2mq                      1/1     Running            0          27h
kube-system   kube-apiserver-vsphere-tkg-kx2mq            1/1     Running            0          27h
kube-system   kube-controller-manager-vsphere-tkg-kx2mq   1/1     Running            0          27h
kube-system   kube-proxy-l8tmq                            0/1     ImagePullBackOff   0          27h
kube-system   kube-scheduler-vsphere-tkg-kx2mq            1/1     Running            0          27h
kube-system   vsphere-cloud-controller-manager-q5j7n      1/1     Running            0          27h
kube-system   vsphere-csi-controller-0                    4/5     CrashLoopBackOff   473        27h
kube-system   vsphere-csi-node-mx5rj                      3/3     Running            0          27h
[root@capvm1 .cluster-api]#
```
I am also seeing this error condition when I run `kubectl describe pod kube-proxy-l8tmq -n kube-system` (I tried the `docker pull` command separately to see if I could reach the image in question):

```
Events:
  Type     Reason   Age                    From                         Message
  ----     ------   ----                   ----                         -------
  Normal   BackOff  11m (x7269 over 27h)   kubelet, vsphere-tkg-kx2mq   Back-off pulling image "k8s.gcr.io/kube-proxy:1.17.3"
  Warning  Failed   107s (x7313 over 27h)  kubelet, vsphere-tkg-kx2mq   Error: ImagePullBackOff
[root@capvm1 .cluster-api]
```

```
Trying to pull repository k8s.gcr.io/kube-proxy ...
Pulling repository k8s.gcr.io/kube-proxy
unauthorized: authentication required
```
Hmm, that is odd. I would expect the kube-proxy image to have a tag of `v1.17.3`, not `1.17.3`.
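The tag mismatch can be confirmed directly with docker (a sketch; output depends on registry availability):

```shell
# The published tag carries a "v" prefix, so this pull should succeed
docker pull k8s.gcr.io/kube-proxy:v1.17.3

# Without the "v" prefix the tag does not exist, and the registry
# responds with an error such as "unauthorized: authentication required"
docker pull k8s.gcr.io/kube-proxy:1.17.3
```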
@justinmurray what does `kubectl get kubeadmcontrolplane -o yaml` show for your environment? I am mostly curious about the value of `spec.version`.
Here is the output for that command from the management cluster:

```
[root@capvm1 logs]# kubectl get kubeadmcontrolplane -o yaml
apiVersion: v1
items:
```
Yes, I can manually `docker pull` the k8s.gcr.io/kube-proxy image when the "v" precedes the version (v1.17.3), but I get the error when that "v" is missing. I guess this is built into the creation of the kind cluster somewhere, is that correct?
`version: 1.17.3` should be `version: v1.17.3`
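If the object has already been created with the bad value, one possible fix (a hedged sketch; the object name `vsphere-tkg` is inferred from the pod names above) is to patch `spec.version` in place. Note that Cluster API treats a version change as an upgrade request, so regenerating the manifests with the correct version is usually the cleaner route:

```shell
# Patch spec.version on the KubeadmControlPlane object to add the "v" prefix
kubectl patch kubeadmcontrolplane vsphere-tkg \
  --type merge -p '{"spec":{"version":"v1.17.3"}}'
```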
Where should that change be made?
When doing `clusterctl config` you'd need to pass `v1.17.3` as the Kubernetes version. See: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/master/docs/getting_started.md#creating-a-vsphere-based-workload-cluster
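With the "v" prefix in place, the quickstart command from earlier becomes (a sketch; it assumes the management cluster is the current kubectl context):

```shell
KUBERNETES_VERSION="v1.17.3"   # note the leading "v"

# Generate the workload cluster manifest with the corrected version
clusterctl config cluster vsphere-quickstart \
  --infrastructure vsphere \
  --kubernetes-version "$KUBERNETES_VERSION" \
  --control-plane-machine-count 1 \
  --worker-machine-count 3 > cluster.yaml

# Apply it against the management cluster
kubectl apply -f cluster.yaml
```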
Yes, I used the full clusterctl command with the parameters as specified there just now - and the worker VMs now appear in vCenter. Thank you.
There was an earlier instruction on the Cluster API page that did not show the option to create the cluster.yaml file and instead piped the output from `clusterctl config cluster` straight into the `kubectl apply -f -` command. I think that latter method was not set up with the correct version prefix for kube-proxy, and that caused the above issue. Thanks again.
The cluster is created now in the vSphere 6.7 U2 lab (Lab 1). However, I am still seeing issues in the vSphere 6.7 Update 3 lab (different hardware, different location, same Cluster API release), where the CAPV controller pod in the capv-system namespace is producing the messages below, and creation of the cluster goes no further than the LB VM:
```
E0313 19:12:16.028134  1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="unexpected error while reconciling load balancer config for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=HAProxyLoadBalancer default/tkg1: failed to get hapi global config for infrastructure.cluster.x-k8s.io/v1alpha3, Kind=HAProxyLoadBalancer default/tkg1: Get https://10.196.180.156:5556/v1/services/haproxy/configuration/global: remote error: tls: bad certificate" "controller"="haproxyloadbalancer" "request"={"Namespace":"default","Name":"tkg1"}
I0313 19:12:40.401764  1 vspherecluster_controller.go:219] capv-controller-manager/vspherecluster-controller/default/tkg1 "msg"="Reconciling VSphereCluster"
I0313 19:12:40.414612  1 vspherecluster_controller.go:366] capv-controller-manager/vspherecluster-controller/default/tkg1 "msg"="status.ready not found for load balancer" "load-balancer-gvk"="infrastructure.cluster.x-k8s.io/v1alpha3, Kind=HAProxyLoadBalancer" "load-balancer-name"="tkg1" "load-balancer-namespace"="default"
I0313 19:12:40.414632  1 vspherecluster_controller.go:230] capv-controller-manager/vspherecluster-controller/default/tkg1 "msg"="load balancer is not reconciled"
I0313 19:12:40.414780  1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspherecluster" "request"={"Namespace":"default","Name":"tkg1"}
I0310 19:33:45.832802  1 main.go:209] Generating self signed cert as no cert is provided
I0310 19:33:45.903220  1 main.go:242] Listening securely on 0.0.0.0:8443
```
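To see which certificate the HAProxy dataplane API endpoint is actually presenting (a troubleshooting sketch; the IP and port are taken from the log above):

```shell
# Dump the subject, issuer, and validity window of the served certificate
openssl s_client -connect 10.196.180.156:5556 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```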
@justinmurray - are you facing the same issue with a newer version of CAPV ?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
/kind bug
What steps did you take and what happened:
As a user I followed the guidelines at https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/master/docs/getting_started.md closely, but when the workload cluster is created there are no VMs created for the remaining nodes apart from the initial LB and control plane VMs. I had gone through the entire cycle once before: I deleted the workload cluster using kubectl, deleted the management cluster using `kind delete cluster`, and started from the beginning to build each one in turn.
What did you expect to happen: I expected to see the LB and control plane VMs being created, along with a set of 3 worker VMs. The 3 worker VMs never get created.
Anything else you would like to add: When issuing this command on the management cluster (generated using `kind create cluster`):

```
kubectl logs capi-kubeadm-bootstrap-controller-manager-54bf6747bf-89n85 -n capi-kubeadm-bootstrap-system --all-containers | more
```

I see an error with the creation of a secret that is, I believe, for the first of the new VMs:

```
I0311 18:01:17.814415  1 kubeadmconfig_controller.go:298] controllers/KubeadmConfig "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"vsphere-tkg-7sprc"} "name"="vsphere-tkg-kx2mq" "version"="17253"
E0311 18:01:17.821438  1 kubeadmconfig_controller.go:374] controllers/KubeadmConfig "msg"="failed to store bootstrap data" "error"="failed to create bootstrap data secret for KubeadmConfig default/vsphere-tkg-7sprc: secrets \"vsphere-tkg-7sprc\" already exists" "kind"="Machine" "kubeadmconfig"={"Namespace":"default","Name":"vsphere-tkg-7sprc"} "name"="vsphere-tkg-kx2mq" "version"="17253"
```

This is the first time I created a workload cluster with the name "vsphere-tkg" as a prefix from this management cluster, so it is not likely that the secret mentioned was lying around from a previous cluster creation.
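One way to see where the conflicting secret came from (illustrative; the secret name is taken from the error above):

```shell
# Show when the secret was created and which object owns it
kubectl get secret vsphere-tkg-7sprc -n default \
  -o jsonpath='{.metadata.creationTimestamp}{"\n"}{.metadata.ownerReferences}{"\n"}'
```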
A set of log files from the cap* pods in the management cluster (kind cluster) is attached: logbundle-cap.tar.zip
Environment:
- `kubectl version`: 1.17.3
- `/etc/os-release`: CentOS 7.7 on the jump box where I execute clusterctl and kubectl; Ubuntu 18.04 (from the Cluster API page) for the OS within the VMs