kubernetes-sigs / cluster-api-provider-vsphere

Apache License 2.0

Remove dependency for API access of Clusters #924

Open MaxRink opened 4 years ago

MaxRink commented 4 years ago

/kind feature

Describe the solution you'd like: Currently, all clusters created by CAPV have the vsphere-cloud-controller-manager installed, even if I strip out all other CSI components. Without vsphere-cloud-controller-manager, nodes don't get marked as ready, so a 2nd or 3rd master node, for example, never gets provisioned.

From my understanding vsphere-cloud-controller-manager only does the following things in a non-CSI cluster:

All of which, even setting the providerID ( https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html#update-all-node-providerid-fields ), could be done from outside the cluster. That would make it possible to have clusters that don't have access to the API of the cluster they are running on, thus improving security.


MaxRink commented 4 years ago

I'm currently using this script to manually make sure the correct data is present:

export GOVC_USERNAME='user'
export GOVC_PASSWORD='pw'
export GOVC_URL='server'

# DATACENTER, FOLDER and VM_PREFIX must be set before running this.
# In my case I'm using a prefix for the VMs, so grepping is necessary.
# You can remove it if the folder you are using only contains the machines you need.
for vm in $(govc ls "/$DATACENTER/vm/$FOLDER" | grep "$VM_PREFIX"); do
  MACHINE_INFO=$(govc vm.info -json -dc="$DATACENTER" -vm.ipath="/$vm" -e=true)
  # My VMs are created in vSphere with upper-case names, so I lowercase them with awk
  VM_NAME=$(jq -r '.VirtualMachines[] | .Name' <<< "$MACHINE_INFO" | awk '{print tolower($0)}')
  # UUIDs come in lowercase, so convert them to upper case
  VM_UUID=$(jq -r '.VirtualMachines[] | .Config.Uuid' <<< "$MACHINE_INFO" | awk '{print toupper($0)}')
  echo "Patching $VM_NAME with UUID:$VM_UUID"
  # To try this out safely first, add --dry-run=client to both kubectl commands below.
  kubectl patch node "$VM_NAME" -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}"
  kubectl taint nodes "$VM_NAME" node.cloudprovider.kubernetes.io/uninitialized-
done

ncdc commented 4 years ago

@yastij have you thought about moving the deployment of CCM/CPI/etc to a ClusterResourceSet, once they're available? That should solve this request (assuming one has a separate controller to set the provider ID and remove the taint).
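
For illustration, a ClusterResourceSet for this could look roughly like the sketch below. The resource names (`vsphere-cpi-crs`, `vsphere-cpi-manifests`), the label, and the namespace are hypothetical placeholders, and the `apiVersion` should be adjusted to whatever the installed Cluster API release serves. The referenced ConfigMap would have to contain the CCM/CPI manifests and exist in the same namespace.

```shell
#!/usr/bin/env bash
# Sketch only: all names and labels below are hypothetical placeholders.

# Print a ClusterResourceSet manifest that applies a ConfigMap of CPI
# manifests to every workload cluster carrying the given label.
render_cpi_crs() {
  local namespace="$1" label_key="$2" label_value="$3"
  cat <<EOF
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: vsphere-cpi-crs
  namespace: ${namespace}
spec:
  clusterSelector:
    matchLabels:
      ${label_key}: "${label_value}"
  resources:
    - name: vsphere-cpi-manifests
      kind: ConfigMap
EOF
}

# Usage against the management cluster (not run here):
#   render_cpi_crs default ccm external | kubectl apply -f -
```

Setting the provider ID and removing the taint would still need to happen separately, as noted above.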

yastij commented 4 years ago

@ncdc - that can be a solution. I'm also thinking about what it would take to run the CPI as part of the management cluster

CecileRobertMichon commented 4 years ago

We are considering moving the external cloud provider components to a ClusterResourceSet once available for CAPZ.

"Without vsphere-cloud-controller-manager nodes dont get marked as ready thus a 2nd or 3rd master node never get provisioned for example." - we are running into exactly this today but for now relying on the user to manually apply the external cloud provider yaml after the first control plane is up https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/docs/topics/external-cloud-provider.md.

MaxRink commented 4 years ago

Just verified that running the CCM from the management cluster works.
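
A CCM running in the management cluster needs a kubeconfig for the workload cluster. Cluster API stores one in a secret named `<cluster-name>-kubeconfig` under the `value` key, so fetching it could be sketched as follows. The cluster name and namespace in the usage line are placeholders, and how the file is then handed to the CCM pod is left out; this is not taken from the manifest used for the test above.

```shell
#!/usr/bin/env bash
# Sketch only: cluster name and namespace are supplied by the caller.

# Name of the secret in which Cluster API stores a workload cluster's kubeconfig.
kubeconfig_secret_name() {
  printf '%s-kubeconfig' "$1"
}

# Write the workload cluster's kubeconfig to stdout.
# Must be run with credentials for the management cluster.
fetch_workload_kubeconfig() {
  local cluster="$1" namespace="$2"
  kubectl get secret "$(kubeconfig_secret_name "$cluster")" \
    --namespace "$namespace" \
    -o jsonpath='{.data.value}' | base64 --decode
}

# Usage (not run here):
#   fetch_workload_kubeconfig my-cluster my-namespace > workload.kubeconfig
```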

MaxRink commented 4 years ago

So, here is the YAML file that I used to run the CCM from the mgmt cluster: external-ccm-san.yaml.txt. One thing I haven't gotten to work with this PoC: grabbing the vSphere credentials from a secret instead of from a configmap. That always gave me the errors below. With the credentials directly in the config, everything works fine though.

W0623 17:57:12.514429       1 credentialmanager.go:85] Cannot get secret vsphere-cpi in namespace bremen. error: "secret \"vsphere-cpi\" not found"
E0623 17:57:12.514437       1 credentialmanager.go:54] updateCredentialsMapK8s failed. err=secret "vsphere-cpi" not found
W0623 17:57:12.514443       1 credentialmanager.go:60] secret "vsphere-cpi" not found in namespace "bremen"
E0623 17:57:12.514448       1 credentialmanager.go:75] credentials not found for server vcenter1.sce-dcn.net
randomvariable commented 4 years ago

Suspect it's not using local object reference to grab the secret and ended up in the wrong namespace. Possibly a bug for the vsphere provider.

MaxRink commented 4 years ago

Might be. It might also be that I've botched the rolebindings. The important part is that putting that in the mgmt cluster works without additional changes. And we can still just use a secret for the config in general, not just for the password and username, to mitigate it somewhat.
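
Both hypotheses (wrong namespace, or missing RBAC) can be checked from the command line. The sketch below assumes the CCM runs under a service account; the service account name and namespace in the usage line are placeholders, not values taken from the attached manifest. It should be run against whichever cluster the CCM's Kubernetes client actually talks to.

```shell
#!/usr/bin/env bash
# Sketch only: service account name and namespaces are hypothetical.

# Build the impersonation subject for a service account.
sa_subject() {
  local namespace="$1" name="$2"
  printf 'system:serviceaccount:%s:%s' "$namespace" "$name"
}

# Report whether the secret is visible, then ask the API server whether
# the given service account is allowed to read it.
check_secret_access() {
  local secret="$1" secret_ns="$2" sa_ns="$3" sa_name="$4"
  if kubectl get secret "$secret" --namespace "$secret_ns" >/dev/null 2>&1; then
    echo "secret ${secret_ns}/${secret} exists"
  else
    echo "secret ${secret_ns}/${secret} not found or not readable with the current credentials"
  fi
  kubectl auth can-i get "secret/${secret}" \
    --namespace "$secret_ns" \
    --as "$(sa_subject "$sa_ns" "$sa_name")"
}

# Usage (not run here):
#   check_secret_access vsphere-cpi bremen kube-system cloud-controller-manager
```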

fabriziopandini commented 4 years ago

@MaxRink I'm reporting here the TL;DR from the slack thread, please correct me if there is something incomplete or wrong.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

sathieu commented 2 years ago

My understanding of the current status:

For this last item:

I see the following way forward:

See also somewhat related issue in CSI: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1742

EDIT 2022-06-01: Added third way forward

sathieu commented 2 years ago

@srm09 @MaxRink @yastij WDYT about my proposed ways forward? I can propose a PR for solution 1 (solution 2 is harder for me, and solution 3 means a new repo probably).