Another question: the OVA from that guide cannot be set up.
I followed this: https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9
I use CAPI 1.2.0, CAPV 1.2, and a 1.21.0 management cluster.
I tried this OVA instead, and with it I can set up the workload cluster...
But a new problem is that only one control plane node is ready.
kubectl logs -n capi-system capi-controller-manager-fbd594dc6-frfj8
Are there any other logs I should check to find out the problem?
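A rough sketch of the provider controller logs that are usually worth checking for this kind of failure, assuming a default clusterctl install (the deployment names may differ):
kubectl -n capi-system logs deploy/capi-controller-manager
kubectl -n capi-kubeadm-bootstrap-system logs deploy/capi-kubeadm-bootstrap-controller-manager
kubectl -n capi-kubeadm-control-plane-system logs deploy/capi-kubeadm-control-plane-controller-manager
kubectl -n capv-system logs deploy/capv-controller-manager
The bootstrap controller generates the cloud-init/kubeadm join data, the control plane controller handles scaling and etcd health, and the CAPV controller handles the vSphere VMs.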
I use the template from this repo: demo-template.zip, with CONTROL_PLANE_MACHINE_COUNT='3'. Is there anything else to consider?
I logged into the unhealthy node and looked at the kubelet: journalctl -xefu kubelet. Is there any problem with kubeadm in this environment?
I tried to SSH to control plane node 2, ran kubeadm reset, and then: kubeadm join 10.250.71.221:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx --control-plane --certificate-key xxxxx
A few minutes later I can see this, but there is still no node name, and I can't see the third control plane node:
logs in capi-kubeadm-control-plane-system kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc
Could you two from VMware help me solve the problem above or give some ideas? Really, thanks. @srm09 @fabriziopandini
I just want to get the cluster autoscaler working on vSphere. I have used a single control plane + autoscaler as in the following picture; it is running OK and has the CA capability.
Next I want to solve the 3-node control plane HA above, and then manage more workload clusters like this. I don't know if this architecture is right?
I saw this issue; it seems I need to deploy CAPD? What is that? I only deploy CAPI and CAPV. https://github.com/kubernetes-sigs/cluster-api/issues/4027
kubectl get kubeadmcontrolplanes
kubectl describe kubeadmcontrolplanes
Name: autonomy-elastic-dev-cluster
Namespace: default
Labels: cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster
Annotations:
      hostPath:
        path: /etc/kubernetes/admin.conf
        type: FileOrCreate
      name: kubeconfig
    status: {}
    Owner:   root:root
    Path:    /etc/kubernetes/manifests/kube-vip.yaml
  Format:  cloud-config
  Init Configuration:
    Local API Endpoint:
    Node Registration:
      Cri Socket:  /var/run/containerd/containerd.sock
      Kubelet Extra Args:
        Cloud - Provider:  external
      Name:                {{ ds.meta_data.hostname }}
  Join Configuration:
    Discovery:
    Node Registration:
      Cri Socket:  /var/run/containerd/containerd.sock
      Kubelet Extra Args:
        Cloud - Provider:  external
      Name:                {{ ds.meta_data.hostname }}
  Pre Kubeadm Commands:
    hostname "{{ ds.meta_data.hostname }}"
    echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
    echo "127.0.0.1 localhost" >>/etc/hosts
    echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
    echo "{{ ds.meta_data.hostname }}" >/etc/hostname
  Users:
    Name:  capv
    Ssh Authorized Keys:
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDjXZum6TwE2qL5wWgp38YA51C2fyfFHYQR7+jFrxq9QW1k3KKIPIc1wA8yMhbA3OMEeaM2/ry37ZdNUsMbATBKSvezhWs77OkZXoWPEWXTvydWf1Nze/Ny9GJAeYIPI8WfTeAo7b7+JpIqQGDMaTK4qX8wLOjTUWJ+ztWAUrXdsHMvhIEKZOUoBBiK+QELrWAS/PKT+UPf/LHnJf4VQ1cGGA/uRjjvcQTdB/XQMzT2GsbuCIDWRX6JIm3+l9VD1Q3Ehv1+zXpjVK7eU9k8XB5iTbFldDLroUlbOcgl7e8BHWUiC2iig7k4Co3Ae4+ubALIlPKXoEaFmK16j9PI+Ajp root@mgmt-master01
    Sudo:  ALL=(ALL) NOPASSWD:ALL
Machine Template:
  Infrastructure Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1beta1
    Kind:         VSphereMachineTemplate
    Name:         autonomy-elastic-dev-cluster
    Namespace:    default
  Metadata:
Replicas:  3
Rollout Strategy:
  Rolling Update:
    Max Surge:  1
  Type:         RollingUpdate
Version:        v1.21.11
Status:
  Conditions:
    Last Transition Time:  2022-11-24T09:31:31Z
    Message:               Scaling up control plane to 3 replicas (actual 2)
    Reason:                ScalingUp
    Severity:              Warning
    Status:                False
    Type:                  Ready
    Last Transition Time:  2022-11-24T09:31:04Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2022-11-24T09:29:43Z
    Status:                True
    Type:                  CertificatesAvailable
    Last Transition Time:  2022-11-24T09:31:30Z
    Status:                True
    Type:                  ControlPlaneComponentsHealthy
    Last Transition Time:  2022-11-24T12:48:26Z
    Message:               etcd member autonomy-elastic-dev-cluster-bmwb9 does not have a corresponding machine
    Reason:                EtcdClusterUnhealthy
    Severity:              Error
    Status:                False
    Type:                  EtcdClusterHealthy
    Last Transition Time:  2022-11-24T09:32:21Z
    Status:                True
    Type:                  MachinesReady
    Last Transition Time:  2022-11-24T09:31:31Z
    Message:               Scaling up control plane to 3 replicas (actual 2)
    Reason:                ScalingUp
    Severity:              Warning
    Status:                False
    Type:                  Resized
  Initialized:           true
  Observed Generation:   1
  Ready:                 true
  Ready Replicas:        2
  Replicas:              2
  Selector:              cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster,cluster.x-k8s.io/control-plane
  Unavailable Replicas:  0
  Updated Replicas:      2
  Version:               v1.21.11
Events:
  Type     Reason                 Age                      From                               Message
Warning ControlPlaneUnhealthy 2m24s (x4435 over 17h) kubeadm-control-plane-controller Waiting for control plane to pass preflight checks to continue reconciliation: [machine autonomy-elastic-dev-cluster-bmwb9 does not have APIServerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have ControllerManagerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have SchedulerPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdPodHealthy condition, machine autonomy-elastic-dev-cluster-bmwb9 does not have EtcdMemberHealthy condition]
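The EtcdClusterHealthy condition above complains about an etcd member with no corresponding Machine. A hedged sketch of how that member could be inspected from the first (healthy) control plane node; the etcd pod name and member ID are placeholders, and a member should only be removed once it is confirmed to be stale:
kubectl -n kube-system exec etcd-<healthy-control-plane-node> -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# a stale member (one with no matching Machine/Node) could then be removed with the same flags and:
#   etcdctl ... member remove <member-id>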
Struggling to solve it, I found some related issues and paste them here: https://github.com/kubernetes-sigs/cluster-api/issues/5477 https://github.com/kubernetes-sigs/cluster-api/issues/5509 https://github.com/vmware-tanzu/tanzu-framework/issues/954
I tried to change this:
And I SSH into my second control plane node and look at the cloud-init logs: vi cloud-init-output.log
The first (healthy) control plane node's cloud-init-output.log looks like this:
I tried to ignore preflight errors:
sudo kubeadm join xxxxx --token xxxx \
  --discovery-token-ca-cert-hash xxxx \
  --control-plane \
  --ignore-preflight-errors=all
Still the same:
I tried to remove the manifests:
But it seems that cloud-init is hanging; cloud-init can't execute.
Where are the cloud-init files? I think I need to change them to solve this problem.
===> I already have this multi-workload-cluster autoscaler setup (pictured above) running. So the remaining problem is: how can I create the 3 control plane nodes using CAPI and CAPV? And an additional problem: can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet? Anything else? If cloud-init needs the internet, how do I solve it?
Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check the /var/log/cloud-init-output.log file to see the set of steps that are run, which would show kubeadm removing this script.
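A minimal sketch of checking that on the stuck node, assuming SSH access as the capv user:
cloud-init status --long                  # shows whether cloud-init is still running, finished, or errored
sudo less /var/log/cloud-init-output.log  # stdout/stderr of the user data scripts, including the kubeadm join run
sudo less /var/log/cloud-init.log         # cloud-init's own log, useful if the user data never started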
How can I create the 3 control plane nodes using CAPI and CAPV?
Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.
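For example, a minimal sketch assuming the clusterctl flow described below (cluster name and Kubernetes version are placeholders):
export CONTROL_PLANE_MACHINE_COUNT=3   # picked up by clusterctl when rendering the template
clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
kubectl apply -f /tmp/abc.yaml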
Have you been able to get a single node control plane workload cluster running yet?
If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:
- Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
- Edit the generated manifest and add insecure-flag = true to the csi-vsphere-config ConfigMap under the [VirtualCenter x.x.x.x] heading, to make sure insecure connections to vCenter via the CSI pods are enabled.
- For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
- Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
- All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
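A rough sketch of what that csi-vsphere-config edit inside /tmp/abc.yaml could look like; whether the object is rendered as a ConfigMap or a Secret, and its exact keys, depend on the template version, and the vCenter address is a placeholder:
# inside the csi-vsphere.conf entry of the csi-vsphere-config object:
[VirtualCenter "x.x.x.x"]
insecure-flag = true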
How can I create the 3 control plane nodes using CAPI and CAPV?
Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes. Have you been able to get a single node control plane workload cluster running yet?
Yes, I can set up a single control plane workload cluster; it runs well and has the cluster autoscaler.
Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check the /var/log/cloud-init-output.log file to see the set of steps that are run, which would show kubeadm removing this script.
I have read the cloud-init-output.log; the second control plane node's log looks like this, and I just don't know how it got into this state:
If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this:
- Pipe the generated cluster YAML manifest to a file via the command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
- Edit the generated manifest and add insecure-flag = true to the csi-vsphere-config ConfigMap under the [VirtualCenter x.x.x.x] heading, to make sure insecure connections to vCenter via the CSI pods are enabled.
- For good measure, update the CPI image version to match the Kubernetes version used to create the clusters: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 <<==== point to the minor version of Kubernetes being used
- Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
- All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
I am not using clusterctl. I just use the template: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/1700#issuecomment-1326159473 (this template is downloaded from the repo), and I just install CAPI and CAPV into my existing cluster as the management cluster. CAPI: 1.2.0, CAPV: 1.2, management cluster: 1.21.0. Any suggestions, in case there is a problem I haven't considered?
In fact my only problems are: 1. How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready, maybe there is just a small thing I haven't considered, because I can create a single control plane. 2. Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet? Anything else? If cloud-init needs the internet, how do I solve it?
CAPV: 1.2
Could you use the latest CAPV version, v1.5.0?
How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready, maybe there is just a small thing I haven't considered, because I can create a single control plane.
The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.
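A minimal sketch of changing that on the already-created cluster, using the KubeadmControlPlane name from the describe output above and assuming it lives in the default namespace:
kubectl patch kubeadmcontrolplane autonomy-elastic-dev-cluster \
  --type merge -p '{"spec":{"replicas":3}}'
The kubeadm-control-plane controller should then roll out the additional control plane machines, provided the existing control plane and etcd are healthy.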
Can I use this CA capability in an internal network environment that can't access the internet? cloud-init or image pulls may need the internet? Anything else? If cloud-init needs the internet, how do I solve it?
Could you raise this question in the kubeadm repo or Slack channel? They might have a way documented for this one. Essentially you'd need a custom repository in the internal network hosting the images, and the nodes need to be able to access this repo by updating the containerd settings via the /etc/containerd/config.toml file. Here is a rough blog I found for that.
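A rough sketch of that containerd change, assuming an internal registry that already mirrors the required images; the registry hostname is a placeholder and the exact TOML layout depends on the containerd version:
# on each node (or baked into the OVA image / added via preKubeadmCommands):
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
  endpoint = ["https://registry.internal.example:5000"]
EOF
sudo systemctl restart containerd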
Could you use the latest CAPV version, v1.5.0?
---> Is it related to the CAPV version? I think v1.2.0 should be recent enough?
The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.
---> I have already set it to 3, but it seems I still have the problems I mentioned above.
Were you able to resolve this issue? Is there anything else I can do to help?
/lifecycle frozen
/close
Closing due to inactivity
@srm09: Closing this issue.
From this picture, is it that CAPV can't match CAPI v1.2? @srm09