kubernetes-sigs / cluster-api-provider-vsphere

Apache License 2.0
373 stars 294 forks

About CAPV related to CAPI #1700

Closed andyzheung closed 1 year ago

andyzheung commented 2 years ago

From this picture, does CAPV not support CAPI v1.2? @srm09

(screenshot)

andyzheung commented 2 years ago

Another question: the OVA shown here (screenshot) cannot be set up. (screenshot)

I followed this guide: https://medium.com/@abhishek.amjeet/clusterapi-for-kubernetes-a-detailed-look-on-vmware-cluster-api-2ddd541bafa9

Versions used: CAPI 1.2.0, CAPV 1.2, management cluster 1.21.0.

andyzheung commented 2 years ago

I tried this OVA, and it can set up the workload cluster... (screenshot)

But a new problem is that only one control plane node is Ready. (screenshots)

`kubectl logs -n capi-system capi-controller-manager-fbd594dc6-frfj8` (screenshot)

Are there any logs I should check to find out the problem?

andyzheung commented 2 years ago

I used the template from this repo (demo-template.zip) with CONTROL_PLANE_MACHINE_COUNT='3'. Is there anything else I need to consider?
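
For context, a minimal sketch of rendering such a template with the variable set, assuming the template uses plain `${VAR}` placeholders (the file name and worker count are illustrative):

```sh
# Set the desired machine counts, render the template, and apply it
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=3   # illustrative value
envsubst < demo-template.yaml | kubectl apply -f -
```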

andyzheung commented 2 years ago

I logged into the abnormal node and checked the kubelet with `journalctl -xefu kubelet` (screenshot). Is there a problem with kubeadm in this environment?

andyzheung commented 2 years ago

I tried SSHing into control plane node 2 and running `kubeadm reset`, then `kubeadm join 10.250.71.221:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx --control-plane --certificate-key xxxxx`.

A few minutes later I can see this (screenshot), but it still has no node name (screenshot), and I can't see the third control plane node:

Logs in capi-kubeadm-control-plane-system: `kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-75d5f9d99-5vdgc` (screenshot)

andyzheung commented 2 years ago

Could the two folks from VMware help me solve the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have been using a single control plane + autoscaler as in the following picture; it runs fine and gives me the CA capability. (screenshot)

Next I want to solve the 3-node HA control plane issue above, and then manage more workload clusters like this. I don't know whether this architecture is right? (screenshot)

andyzheung commented 2 years ago

I saw this issue; it seems I need to deploy CAPD? What is that? I only deployed CAPI and CAPV. https://github.com/kubernetes-sigs/cluster-api/issues/4027

andyzheung commented 2 years ago

`kubectl get kubeadmcontrolplanes` (screenshot)

`kubectl describe kubeadmcontrolplanes`

    Name:         autonomy-elastic-dev-cluster
    Namespace:    default
    Labels:       cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster
    Annotations:
    API Version:  controlplane.cluster.x-k8s.io/v1beta1
    Kind:         KubeadmControlPlane
    Metadata:
      Creation Timestamp:  2022-11-24T09:29:42Z
      Finalizers:
        kubeadm.controlplane.cluster.x-k8s.io
      Generation:  1
      Managed Fields:
        API Version:  controlplane.cluster.x-k8s.io/v1beta1
        Fields Type:  FieldsV1
        fieldsV1:     f:metadata: f:annotations: .: f:kubectl.kubernetes.io/last-applied-configuration: f:spec: .: f:kubeadmConfigSpec: .: f:clusterConfiguration: .: f:apiServer: .: f:extraArgs: .: f:cloud-provider: f:controllerManager: .: f:extraArgs: .: f:cloud-provider: f:files: f:initConfiguration: .: f:nodeRegistration: .: f:criSocket: f:kubeletExtraArgs: .: f:cloud-provider: f:name: f:joinConfiguration: .: f:nodeRegistration: .: f:criSocket: f:kubeletExtraArgs: .: f:cloud-provider: f:name: f:preKubeadmCommands: f:users: f:machineTemplate: .: f:infrastructureRef: f:replicas: f:rolloutStrategy: .: f:rollingUpdate: .: f:maxSurge: f:type: f:version:
        Manager:      kubectl-client-side-apply
        Operation:    Update
        Time:         2022-11-24T09:29:42Z
        API Version:  controlplane.cluster.x-k8s.io/v1beta1
        Fields Type:  FieldsV1
        fieldsV1:     f:metadata: f:finalizers: .: v:"kubeadm.controlplane.cluster.x-k8s.io": f:labels: .: f:cluster.x-k8s.io/cluster-name: f:ownerReferences: .: k:{"uid":"c90f68e8-9764-4c69-8ec3-e5771b2304d7"}: .: f:apiVersion: f:blockOwnerDeletion: f:controller: f:kind: f:name: f:uid: f:status: .: f:conditions: f:initialized: f:observedGeneration: f:ready: f:readyReplicas: f:replicas: f:selector: f:unavailableReplicas: f:updatedReplicas: f:version:
        Manager:      manager
        Operation:    Update
        Time:         2022-11-24T09:38:51Z
      Owner References:
        API Version:           cluster.x-k8s.io/v1beta1
        Block Owner Deletion:  true
        Controller:            true
        Kind:                  Cluster
        Name:                  autonomy-elastic-dev-cluster
        UID:                   c90f68e8-9764-4c69-8ec3-e5771b2304d7
      Resource Version:  111433
      UID:               42624926-090a-47ca-8e77-911e3f59c996
    Spec:
      Kubeadm Config Spec:
        Cluster Configuration:
          API Server:
            Extra Args:
              Cloud - Provider:  external
          Controller Manager:
            Extra Args:
              Cloud - Provider:  external
          Dns:
          Etcd:
          Networking:
          Scheduler:
        Files:
          Content:  apiVersion: v1
            kind: Pod
            metadata:
              creationTimestamp: null
              name: kube-vip
              namespace: kube-system
            spec:
              containers:

andyzheung commented 2 years ago

Still struggling to solve it; I found some related issues and am pasting them here: https://github.com/kubernetes-sigs/cluster-api/issues/5477 https://github.com/kubernetes-sigs/cluster-api/issues/5509 https://github.com/vmware-tanzu/tanzu-framework/issues/954

andyzheung commented 2 years ago

I tried to change the:

I SSHed into my second control plane node and looked at the cloud-init logs (`vi cloud-init-output.log`): (screenshot)

The first (healthy) control plane node's cloud-init-output.log looks like this: (screenshot)

I tried ignoring preflight errors: `sudo kubeadm join xxxxx --token xxxx --discovery-token-ca-cert-hash xxxx --control-plane --ignore-preflight-errors=all`

Still the same: (screenshot)

I tried removing the manifests: (screenshot)

But it seems that cloud-init is hanging and can't execute. (screenshot)

Where are the cloud-init files? I think I need to change them to solve this problem.

andyzheung commented 2 years ago

Could the two folks from VMware help me solve the problem above or give me some ideas? Thanks a lot. @srm09 @fabriziopandini

I just want to get the cluster autoscaler working on vSphere. I have been using a single control plane + autoscaler as in the following picture; it runs fine and gives me the CA capability. (screenshot)

Next I want to solve the 3-node HA control plane issue above, and then manage more workload clusters like this. I don't know whether this architecture is right? (screenshot)

===> I have already got this multi-workload-cluster autoscaler setup running. So the remaining problem is: how can I create the 3 control plane nodes using CAPI and CAPV? An additional question: can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

srm09 commented 2 years ago

Answering this question/comment: kubeadm cleans up the scripts after the init/join command fails, which is what is being referred to in the logs. There is no problem with kubeadm in the environment. You can check /var/log/cloud-init-output.log to see the set of steps that are run, which would show kubeadm removing this script.
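
For reference, the usual places to look on the node itself (standard cloud-init locations; the grep is just a convenience):

```sh
# Full bootstrap output, including the kubeadm init/join run by cloud-init
less /var/log/cloud-init-output.log

# Jump straight to the kubeadm-related lines
grep -n -i kubeadm /var/log/cloud-init-output.log

# Whether cloud-init finished, and with what result
cloud-init status --long
```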

srm09 commented 2 years ago

how can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?
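
A quick way to check whether the extra control plane Machines were actually created and why one might be stuck (the cluster name below is taken from the describe output earlier in this thread; the machine name is a placeholder):

```sh
# Desired vs. ready control plane replicas
kubectl get kubeadmcontrolplane -A

# Machines belonging to the cluster and their phases
kubectl get machines -A -l cluster.x-k8s.io/cluster-name=autonomy-elastic-dev-cluster

# Conditions/events for a stuck machine
kubectl describe machine <machine-name> -n default
```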

srm09 commented 2 years ago

If you are using the clusterctl generate cluster command to generate and apply the cluster YAML, then do this (a rough sketch of the full flow follows the list):

  1. Pipe the generated cluster YAML manifest to a file via command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest and add
    1. Edit the csi-vsphere-config ConfigMap to include the insecure-flag = true under the [VirtualCenter x.x.x.x] heading to make sure insecure connections to vCenter via the CSI pods are enabled.
    2. For good measure, update the CPI image version to match the k8s version used to create the clusters, e.g. gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 (point it at the minor version of Kubernetes being used).
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.
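
A rough sketch of that flow (cluster name, Kubernetes version, and file paths are placeholders; adjust to your environment):

```sh
# 1. Generate the cluster manifest to a file, requesting 3 control plane nodes
clusterctl generate cluster abc --kubernetes-version 1.23.8 \
  --control-plane-machine-count 3 > /tmp/abc.yaml

# 2. Edit /tmp/abc.yaml:
#    - in the csi-vsphere-config object, add insecure-flag = true under the
#      [VirtualCenter x.x.x.x] heading
#    - point the CPI image at the matching Kubernetes minor version, e.g.
#      gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0

# 3. Apply the manifest, wait for the machines, then install a CNI
kubectl apply -f /tmp/abc.yaml
kubectl get machines -w
```
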
andyzheung commented 2 years ago

how can I create the 3 control plane nodes using CAPI and CAPV?

Setting the CONTROL_PLANE_MACHINE_COUNT environment variable to 3 should be the only change that would be needed to get a cluster with 3 control plane nodes.

Have you been able to get a single node control plane workload cluster running yet?

Yes, I can set up a single control plane workload cluster; it runs well and the cluster autoscaler works.

andyzheung commented 2 years ago

Answering this question/comment, kubeadm cleans up the scripts after the init/join command fails which is what is being referred to in the logs. No problem with kubeadm in the environment. You can check the /var/log/cloud-init-output.log to see the set of steps that are run which would show that kubeadm removing this script.

I have read cloud-init-output.log. The second control plane node's log looks like this; I just don't know how it got into this state: (screenshot)

andyzheung commented 2 years ago

If you are using clusterctl generate cluster command to generate and apply the cluster YAML, then do this

  1. Pipe the generated cluster YAML manifest to a file via command clusterctl generate cluster abc --kubernetes-version 1.23.8 > /tmp/abc.yaml
  2. Edit the generated manifest and add

    1. Edit the csi-vsphere-config ConfigMap to include the insecure-flag = true under the [VirtualCenter x.x.x.x] heading to make sure insecure connections to vCenter via the CSI pods are enabled.
    2. For good measure, update the CPI image version to match the k8s version used to create the clusters, e.g. gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.23.0 (point it at the minor version of Kubernetes being used).
  3. Apply the updated YAML via kubectl apply -f /tmp/abc.yaml.
  4. All the machines should be created eventually; install the CNI to move the Nodes to the Ready state.

I am not using clusterctl. I just use the template: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/1700#issuecomment-1326159473. This template was downloaded from the repo, and I just installed CAPI and CAPV in my existing cluster as the management cluster. CAPI: 1.2.0, CAPV: 1.2, management cluster: 1.21.0. Do you have any suggestions in case there is something I haven't considered?

In fact my only problems are: 1. How can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just a small thing I haven't considered, because I can create a single control plane. 2. Can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

srm09 commented 2 years ago

CAPV: 1.2

Could you use the latest CAPV version, v1.5.0?
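
If you do want to move the management cluster to a newer CAPV release, a sketch of the usual clusterctl flow (assuming a clusterctl v1.x managed installation; the namespace and version below may differ in your setup, so check the plan output first):

```sh
# Show which provider upgrades clusterctl thinks are available
clusterctl upgrade plan

# Upgrade only the vSphere infrastructure provider (version is illustrative)
clusterctl upgrade apply --infrastructure capv-system/vsphere:v1.5.0
```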

how can I create the 3 control plane nodes using CAPI and CAPV? I think everything is ready; maybe there is just a small thing I haven't considered, because I can create a single control plane.

The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.
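
For a workload cluster that already exists, the same change can be made directly on the KubeadmControlPlane object (object name taken from the describe output earlier in this thread):

```sh
# Scale the control plane to 3 replicas; KCP then creates the additional machines
kubectl patch kubeadmcontrolplane autonomy-elastic-dev-cluster \
  -n default --type merge -p '{"spec":{"replicas":3}}'

# Confirm the desired replica count
kubectl get kubeadmcontrolplane autonomy-elastic-dev-cluster -n default \
  -o jsonpath='{.spec.replicas}{"\n"}'
```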

Can I use this CA capability in an internal network environment that can't access the internet? Do cloud-init or image pulls need internet access, or anything else? If cloud-init needs internet access, how do I solve that?

Could you raise this question in the kubeadm repo or Slack channel? They might have a documented approach for this one. Essentially you'd need a custom repository in the internal network hosting the images, and the nodes would need to be able to access this repo by updating the containerd settings via the /etc/containerd/config.toml file. Here is a rough blog I found for that.
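
Illustrative only, not an official CAPV recipe: one common pattern is to mirror the required images into an internal registry and point containerd on the nodes at it. The registry name below is a placeholder, and the exact config keys depend on the containerd version baked into the node image (newer containerd prefers the config_path/hosts.toml mechanism over inline mirrors):

```sh
# Add a mirror stanza like the following under the CRI registry section of
# /etc/containerd/config.toml on each node (shown here as comments only):
#
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
#     endpoint = ["https://registry.internal.example.com"]
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]
#     endpoint = ["https://registry.internal.example.com"]
#
# Then restart containerd so it picks up the change:
sudo systemctl restart containerd
```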

andyzheung commented 2 years ago

Could you use the latest CAPV version, v1.5.0

---> Is this related to the CAPV version? I think v1.2.0 should be recent enough?

The replica number for the KubeadmControlPlane object needs to be set to 3 for a 3 node control plane.

---> I have modified it to 3, but it seems I still have the problems I mentioned above.

srm09 commented 1 year ago

Were you able to resolve this issue? Is there anything else I can do to help?

srm09 commented 1 year ago

/lifecycle frozen

srm09 commented 1 year ago

/close Closing due to inactivity

k8s-ci-robot commented 1 year ago

@srm09: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/1700#issuecomment-1472635434):

> /close
> Closing due to inactivity

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.