loft-sh / cluster-api-provider-vcluster

Mozilla Public License 2.0

Failed to provision a vCluster using CAPI due to vc-<clustername> secret not found issue #43

Closed dragon119 closed 3 days ago

dragon119 commented 7 months ago

I'm using CAPI to provision a vCluster on a host cluster on Azure, but I get the error 'Secret "vc-vcluster1" not found'. 'vcluster1' is the name of the cluster I'm trying to provision. On another host cluster I can successfully provision a vCluster via CAPI; there, the secret seems to be created automatically by the CAPI provider. Do I need to set something up in the host cluster for this secret to be created?

NAME                                 READY  SEVERITY  REASON       SINCE  MESSAGE
Cluster/vcluster1                    False  Warning   CheckFailed  3d21h  Secret "vc-vcluster1" not found
└─ControlPlane - VCluster/vcluster1  False  Warning   CheckFailed  3d21h  Secret "vc-vcluster1" not found

lknite commented 6 months ago

Random guess, are all the capi pods deployed? Maybe you need to add labels to the namespaces to enable podsecurity?

Not much to go on here. Watching the logs of all the capi related pods and the vcluster capi pod should yield some clues.
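
A minimal sketch of that log check, assuming only standard kubectl commands. The function name capi_logs is hypothetical, and the namespaces are passed in as arguments because their names depend on how the providers were installed:

```shell
# Hypothetical helper: print recent logs for every pod in the given namespaces.
# The body runs in a subshell so the loop variables do not leak into the caller.
capi_logs() (
  for ns in "$@"; do
    # "--output name" yields entries such as "pod/<pod-name>"
    for pod in $(kubectl get pods --namespace "${ns}" --output name); do
      echo "=== ${ns}/${pod} ==="
      kubectl logs --namespace "${ns}" "${pod}" --all-containers --tail=50
    done
  done
)
```

It could be called as `capi_logs capi-system cluster-api-provider-vcluster-system`. Both namespace names are assumptions about a default install; `kubectl get namespaces` shows the actual ones.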

deniseschannon commented 2 months ago

@dragon119 It's been a while since you posted your issue, and we have since made some changes and launched a new release. Can you try out v0.2.0-alpha.1 to see if you are still facing issues?

attilio-oliva commented 6 days ago

I have encountered the same issue with v0.2.0-alpha.2, Kubernetes v1.29.7, and with the CAPI pods up and healthy

johannesfrey commented 3 days ago

@attilio-oliva Would it be possible to share the exact steps/commands you issued to create the virtual cluster with the provider, as well as the vcluster.yaml? Did you explicitly set the vCluster version via the CHART_VERSION environment variable? Furthermore, is your host cluster also running on Azure? For context, the secret with the vc- prefix is created by vcluster itself during startup, not by the CAPI provider.

attilio-oliva commented 3 days ago

Sure, sorry for having omitted the steps I used to reproduce the problem.

The environment is not on Azure, but on a local VM used just for testing with Ubuntu server 22.04 LTS. Furthermore, I am using this VM as a single node cluster (again just for testing).

I can consistently reproduce this issue with the following steps:

  1. Initialize the cluster with kubeadm in the VM
  2. Untaint the single node to be able to schedule on it
  3. Install a CNI (Calico)
  4. Install a load balancer (MetalLB)
  5. Download the Cluster API CLI (clusterctl)
  6. Download the vcluster CLI (vcluster v0.20.0-beta.12)
  7. Install the vcluster CAPI provider (clusterctl init --infrastructure vcluster:v0.2.0-alpha.2)
  8. Generate the cluster. I tried both with and without setting the chart version, but the problem persisted:
    
    export CLUSTER_NAME=my-vcluster
    export CLUSTER_NAMESPACE=team-x
    export KUBERNETES_VERSION=1.29.7
    export HELM_VALUES=""
    export CHART_VERSION=0.20.0-beta.12

    kubectl create namespace ${CLUSTER_NAMESPACE}

    clusterctl generate cluster ${CLUSTER_NAME} \
      --infrastructure vcluster \
      --kubernetes-version ${KUBERNETES_VERSION} \
      --target-namespace ${CLUSTER_NAMESPACE} | kubectl apply -f -

attilio-oliva commented 3 days ago

If I list the generated secrets (kubectl get secrets -n team-x), I get:

NAME                                TYPE                 DATA   AGE
sh.helm.release.v1.my-vcluster.v1   helm.sh/release.v1   1      6m36s
vc-config-my-vcluster               Opaque               1      6m36s

But the warning states: CheckFailed Secret "vc-my-vcluster" not found. Is that a different secret, or is the generated secret name incorrect?

johannesfrey commented 3 days ago

Thx for the details. Does this also happen when you use the vCluster CLI directly instead of using the CAPI provider? So essentially:

vcluster -n team-x create my-vcluster

Directly after the creation, do you see any error logs? E.g. by issuing:

kubectl -n team-x logs -f -l=app=vcluster

attilio-oliva commented 3 days ago

I found out there was a problem with the default Persistent Volume. After fixing it, I can create the cluster directly with the vCluster CLI without any problem. Furthermore, after waiting a bit, the warning goes away and the cluster becomes ready with CAPI as well.

So it seems to be a temporary warning. It is not very informative when an error occurs while the vCluster is being scheduled, but there is no actual problem with the provider itself. I suggest future readers follow @johannesfrey's answer and, if required, use kubectl describe on the pod created by vCluster to inspect the specific problem.
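
For future readers, here is a rough sketch of that check, not an official procedure. It waits for the vc-<clustername> secret and, if it never appears, describes the vCluster pods and lists the volume claims in the namespace. The function names, the attempt count, and the delay are hypothetical; the app=vcluster label is the one used in the logs command earlier in this thread.

```shell
# Name of the secret the CheckFailed warning refers to: "vc-" + cluster name.
vc_secret_name() {
  printf 'vc-%s\n' "$1"
}

# Hypothetical helper: poll until the secret exists, or give up and print diagnostics.
# Usage: wait_for_vc_secret <namespace> <cluster-name> [attempts] [delay-seconds]
wait_for_vc_secret() (
  ns="$1"
  secret="$(vc_secret_name "$2")"
  attempts="${3:-30}"   # assumed default, adjust as needed
  delay="${4:-10}"      # assumed default, in seconds
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get secret "$secret" --namespace "$ns" >/dev/null 2>&1; then
      echo "found secret ${secret}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # The secret never appeared, so the vCluster itself probably failed to start:
  # show pod events (scheduling, volume binding) and the state of the claims.
  echo "secret ${secret} still missing; inspecting pods and volume claims" >&2
  kubectl describe pods --namespace "$ns" --selector app=vcluster
  kubectl get pvc --namespace "$ns"
  return 1
)
```

With the values from this thread, the call would be `wait_for_vc_secret team-x my-vcluster`. A claim stuck in Pending in the final listing would point at a storage problem like the one I hit.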

johannesfrey commented 3 days ago

Glad that it worked. Yeah, the provider is "just" a means of deploying virtual clusters. So if there is anything preventing the virtual cluster itself from starting up, one has to look for the reasons in the virtual cluster. But, as you said, as there is another component involved this might not be that obvious 🙂. Closing the issue for now. Feel free to reopen if there is anything left to clarify.