kubernetes-sigs / cluster-api-provider-vsphere

Apache License 2.0
373 stars 294 forks

About CAPV related to CAPI #1700

Closed andyzheung closed 1 year ago

andyzheung commented 2 years ago

From this picture, does CAPV not support CAPI v1.2? @srm09

(screenshot)

andyzheung commented 2 years ago

Another question: the OVA shown here (screenshot) cannot be set up. (screenshot)

I followed this guide: https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9

Versions used: CAPI 1.2.0, CAPV 1.2, management cluster 1.21.0.

andyzheung commented 2 years ago

I tried this OVA, and it can set up the workload cluster... (screenshot)

But a new problem is that only one control plane node is Ready. (screenshots)

`kubectl logs -n capi-system capi-controller-manager-fbd594dc6-frfj8` (screenshot)

Are there any logs I should check to find out the problem?

andyzheung commented 2 years ago

I used the template from this repo (demo-template.zip) with CONTROL_PLANE_MACHINE_COUNT='3'. Is there anything else I need to consider?
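
For context, a minimal sketch of rendering such a template with the variable set, assuming the template uses plain `${VAR}` placeholders (the file name and worker count are illustrative):

```sh
# Set the desired machine counts, render the template, and apply it
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=3   # illustrative value
envsubst < demo-template.yaml | kubectl apply -f -
```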

andyzheung commented 2 years ago

I logged into the abnormal node and checked the kubelet with `journalctl -xefu kubelet` (screenshot). Is there a problem with kubeadm in this environment?

andyzheung commented 2 years ago

I tried SSHing into control plane node 2 and running `kubeadm reset`, then `kubeadm join 10.250.71.221:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx --control-plane --certificate-key xxxxx`.

A few minutes later I can see this (screenshot), but it still has no node name (screenshot), and I can't see the third control plane node:

Logs in capi-kubeadm-control-plane-system: `kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc` (screenshot)

andyzheung commented 2 years ago

Could the two folks from VMware help me solve the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have been using a single control plane + autoscaler as in the following picture; it runs fine and gives me the CA capability. (screenshot)

Next I want to solve the 3-node HA control plane issue above, and then manage more workload clusters like this. I don't know whether this architecture is right? (screenshot)

andyzheung commented 2 years ago

I saw this issue; it seems I need to deploy CAPD? What is that? I only deployed CAPI and CAPV. https://github.com/kubernetes-sigs/cluster-api/issues/4027

andyzheung commented 2 years ago

`kubectl get kubeadmcontrolplanes` (screenshot)

`kubectl describe kubeadmcontrolplanes`

    Name:         autonomy-elastic-dev-cluster
    Namespace:    default
    Labels:       cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster
    Annotations:
    API Version:  controlplane.cluster.x-k8s.io/v1beta1
    Kind:         KubeadmControlPlane
    Metadata:
      Creation Timestamp:  2022-11-24T09:29:42Z
      Finalizers:
        kubeadm.controlplane.cluster.x-k8s.io
      Generation:  1
      Managed Fields:
        API Version:  controlplane.cluster.x-k8s.io/v1beta1
        Fields Type:  FieldsV1
        fieldsV1:     f:metadata: f:annotations: .: f:kubectl.kubernetes.io/last-applied-configuration: f:spec: .: f:kubeadmConfigSpec: .: f:clusterConfiguration: .: f:apiServer: .: f:extraArgs: .: f:cloud-provider: f:controllerManager: .: f:extraArgs: .: f:cloud-provider: f:files: f:initConfiguration: .: f:nodeRegistration: .: f:criSocket: f:kubeletExtraArgs: .: f:cloud-provider: f:name: f:joinConfiguration: .: f:nodeRegistration: .: f:criSocket: f:kubeletExtraArgs: .: f:cloud-provider: f:name: f:preKubeadmCommands: f:users: f:machineTemplate: .: f:infrastructureRef: f:replicas: f:rolloutStrategy: .: f:rollingUpdate: .: f:maxSurge: f:type: f:version:
        Manager:      kubectl-client-side-apply
        Operation:    Update
        Time:         2022-11-24T09:29:42Z
        API Version:  controlplane.cluster.x-k8s.io/v1beta1
        Fields Type:  FieldsV1
        fieldsV1:     f:metadata: f:finalizers: .: v:"kubeadm.controlplane.cluster.x-k8s.io": f:labels: .: f:cluster.x-k8s.io/cluster-name: f:ownerReferences: .: k:{"uid":"c90f68e8-9764-4c69-8ec3-e5771b2304d7"}: .: f:apiVersion: f:blockOwnerDeletion: f:controller: f:kind: f:name: f:uid: f:status: .: f:conditions: f:initialized: f:observedGeneration: f:ready: f:readyReplicas: f:replicas: f:selector: f:unavailableReplicas: f:updatedReplicas: f:version:
        Manager:      manager
        Operation:    Update
        Time:         2022-11-24T09:38:51Z
      Owner References:
        API Version:           cluster.x-k8s.io/v1beta1
        Block Owner Deletion:  true
        Controller:            true
        Kind:                  Cluster
        Name:                  autonomy-elastic-dev-cluster
        UID:                   c90f68e8-9764-4c69-8ec3-e5771b2304d7
      Resource Version:  111433
      UID:               42624926-090a-47ca-8e77-911e3f59c996
    Spec:
      Kubeadm Config Spec:
        Cluster Configuration:
          API Server:
            Extra Args:
              Cloud - Provider:  external
          Controller Manager:
            Extra Args:
              Cloud - Provider:  external
          Dns:
          Etcd:
          Networking:
          Scheduler:
        Files:
          Content:  apiVersion: v1
            kind: Pod
            metadata:
              creationTimestamp: null
              name: kube-vip
              namespace: kube-system
            spec:
              containers:

andyzheung commented 2 years ago

Still struggling to solve it; I found some related issues and am pasting them here: https://github.com/kubernetes-sigs/cluster-api/issues/5477 https://github.com/kubernetes-sigs/cluster-api/issues/5509 https://github.com/vmware-tanzu/tanzu-framework/issues/954

andyzheung commented 2 years ago

I tried to change the:

I SSHed into my second control plane node and looked at the cloud-init logs (`vi cloud-init-output.log`): (screenshot)

The first (healthy) control plane node's cloud-init-output.log looks like this: (screenshot)

I tried ignoring preflight errors: `sudo kubeadm join xxxxx --token xxxx --discovery-token-ca-cert-hash xxxx --control-plane --ignore-preflight-errors=all`

Still the same: (screenshot)

I tried removing the manifests: (screenshot)

But it seems that cloud-init is hanging and can't execute. (screenshot)

Where are the cloud-init files? I think I need to change them to solve this problem.

andyzheung commented 2 years ago

Could the two folks from VMware help me solve the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have been using a single control plane + autoscaler as in the following picture; it runs fine and gives me the CA capability. (screenshot)

Next I want to solve the 3-node HA control plane issue above, and then manage more workload clusters like this. I don't know whether this architecture is right? (screenshot)

===> I have already got this multi-workload-cluster autoscaler setup running. So the remaining problem is: how can I create the 3 control plane nodes using CAPI and CAPV? An additional question: can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

srm09 commented 2 years ago

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which would show kubeadm removing this script.
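
For reference, the usual places to look on the node itself (standard cloud-init locations; the grep is just a convenience):

```sh
# Full bootstrap output, including the kubeadm init/join run by cloud-init
less /var/log/cloud-init-output.log

# Jump straight to the kubeadm-related lines
grep -n -i kubeadm /var/log/cloud-init-output.log

# Whether cloud-init finished, and with what result
cloud-init status --long
```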

srm09 commented 2 years ago

how can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?
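
A quick way to check whether the extra control plane Machines were actually created and why one might be stuck (the cluster name below is taken from the describe output earlier in this thread; the machine name is a placeholder):

```sh
# Desired vs. ready control plane replicas
kubectl get kubeadmcontrolplane -A

# Machines belonging to the cluster and their phases
kubectl get machines -A -l cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster

# Conditions/events for a stuck machine
kubectl describe machine <machine-name> -n default
```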

srm09 commented 2 years ago

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this (a rough sketch of the full flow follows the list):

  1. Pipe the generated cluster YAML manifest to a file via command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest and add
    1. Edit the csi-vsphere-config ConfigMap to include the insecure-flag = true under the [VirtualCenter x.x.x.x] heading to make sure insecure connections to vCenter via the CSI pods are enabled.
    2. For good measure, update the CPI image version to match the k8s version used to create the clusters, e.g. gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 (point it at the minor version of Kubernetes being used).
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
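
A rough sketch of that flow (cluster name, Kubernetes version, and file paths are placeholders; adjust to your environment):

```sh
# 1. Generate the cluster manifest to a file, requesting 3 control plane nodes
clusterctl generate cluster abc --kubernetes-version 1.23.8 \
  --control-plane-machine-count 3 > /tmp/abc.yaml

# 2. Edit /tmp/abc.yaml:
#    - in the csi-vsphere-config object, add insecure-flag = true under the
#      [VirtualCenter x.x.x.x] heading
#    - point the CPI image at the matching Kubernetes minor version, e.g.
#      gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0

# 3. Apply the manifest, wait for the machines, then install a CNI
kubectl apply -f /tmp/abc.yaml
kubectl get machines -w
```
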
andyzheung commented 2 years ago

how can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?

Yes, I can set up a single control plane workload cluster; it runs well and the cluster autoscaler works.

andyzheung commented 2 years ago

Answering this question/comment, kubeadm cleans up the scripts after the init/join command fails which is what is being referred to in the logs. No problem with kubeadm in the environment. You can check the /var/log/cloud-init-output.log to see the set of steps that are run which would show that kubeadm removing this script.

I have read cloud-init-output.log. The second control plane node's log looks like this; I just don't know how it got into this state: (screenshot)

andyzheung commented 2 years ago

If you are using clusterctl generate cluster command to generate and apply the cluster YAML, then do this

  1. Pipe the generated cluster YAML manifest to a file via command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest and add

    1. Edit the csi-vsphere-config ConfigMap to include the insecure-flag = true under the [VirtualCenter x.x.x.x] heading to make sure insecure connections to vCenter via the CSI pods are enabled.
    2. For good measure, update the CPI image version to match the k8s version used to create the clusters, e.g. gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 (point it at the minor version of Kubernetes being used).
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.

I am not using clusterctl. I just use the template: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/1700#issuecomment-1326159473. This template was downloaded from the repo, and I just installed CAPI and CAPV in my existing cluster as the management cluster. CAPI: 1.2.0, CAPV: 1.2, management cluster: 1.21.0. Do you have any suggestions in case there is something I haven't considered?

In fact my only problems are: 1. How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just a small thing I haven't considered, because I can create a single control plane. 2. Can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

srm09 commented 2 years ago

CAPV: 1.2

Could you use the latest CAPV version, v1.5.0?
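
If you do want to move the management cluster to a newer CAPV release, a sketch of the usual clusterctl flow (assuming a clusterctl v1.x managed installation; the namespace and version below may differ in your setup, so check the plan output first):

```sh
# Show which provider upgrades clusterctl thinks are available
clusterctl upgrade plan

# Upgrade only the vSphere infrastructure provider (version is illustrative)
clusterctl upgrade apply --infrastructure capv-system/vsphere:v1.5.0
```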

how can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just a small thing I haven't considered, because I can create a single control plane.

The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.
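
For a workload cluster that already exists, the same change can be made directly on the KubeadmControlPlane object (object name taken from the describe output earlier in this thread):

```sh
# Scale the control plane to 3 replicas; KCP then creates the additional machines
kubectl patch kubeadmcontrolplane autonomy-elastic-dev-cluster \
  -n default --type merge -p '{"spec":{"replicas":3}}'

# Confirm the desired replica count
kubectl get kubeadmcontrolplane autonomy-elastic-dev-cluster -n default \
  -o jsonpath='{.spec.replicas}{"\n"}'
```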

Can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

Could you raise this question in the kubeadm repo or Slack channel? They might have a documented approach for this one. Essentially you'd need a custom repository in the internal network hosting the images, and the nodes would need to be able to access this repo by updating the containerd settings via the /etc/containerd/config.toml file. Here is a rough blog I found for that.
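
Illustrative only, not an official CAPV recipe: one common pattern is to mirror the required images into an internal registry and point containerd on the nodes at it. The registry name below is a placeholder, and the exact config keys depend on the containerd version baked into the node image (newer containerd prefers the config_path/hosts.toml mechanism over inline mirrors):

```sh
# Add a mirror stanza like the following under the CRI registry section of
# /etc/containerd/config.toml on each node (shown here as comments only):
#
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
#     endpoint = ["https://registry.internal.example.com"]
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]
#     endpoint = ["https://registry.internal.example.com"]
#
# Then restart containerd so it picks up the change:
sudo systemctl restart containerd
```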

andyzheung commented 2 years ago

Could you use the latest CAPV version, v1.5.0

---> Is this related to the CAPV version? I think v1.2.0 should be recent enough?

The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.

---> I have modified it to 3, but it seems I still have the problems I mentioned above.

srm09 commented 1 year ago

Were you able to resolve this issue? Is there anything else I can do to help?

srm09 commented 1 year ago

/lifecycle frozen

srm09 commented 1 year ago

/close Closing due to inactivity

k8s-ci-robot commented 1 year ago

@srm09: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/1700#issuecomment-1472635434):

> /close
> Closing due to inactivity

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.