kubernetes-sigs / cluster-api-provider-ibmcloud

Cluster API Provider for IBM Cloud
https://cluster-api-ibmcloud.sigs.k8s.io
Apache License 2.0

Cluster incorrectly shown as ready when both LoadBalancers are not ready #2026

Open hamzy opened 2 days ago

hamzy commented 2 days ago

/kind bug
/area provider/ibmcloud

What steps did you take and what happened: Deploy a 4.18 cluster on a PowerVS zone where LoadBalancers are slow to create. We are called with InfraReady. We then create DNS records for the LBs. However, only the public LB exists. So the cluster fails to deploy.

What did you expect to happen: We should wait for all specified LBs to become ready.

Karthik-K-N commented 2 days ago

I think we currently check only the public load balancer and set the hostname accordingly. Maybe we need to check the status of all the configured load balancers before setting infra as ready. https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/blob/main/controllers/ibmpowervscluster_controller.go#L334-L346

mkumatag commented 2 days ago

@Karthik-K-N can someone fix this behaviour asap?

Karthik-K-N commented 2 days ago

> @Karthik-K-N can someone fix this behaviour asap?

sure.

Karthik-K-N commented 2 days ago

> I think currently we check public loadbalancer and set hostname accordingly, May be we need to look for all the configured loadbalancers status before setting infra as ready. https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/blob/main/controllers/ibmpowervscluster_controller.go#L334-L346.

@dharaneeshvrd Just looking at the code, we already check the status of every load balancer in ReconcileLoadbalancer: https://github.com/Karthik-K-N/cluster-api-provider-ibmcloud/blob/8a563a0620f34ecb772a73bccb9ed7f1384c822f/cloud/scope/powervs_cluster.go#L1977-L1979

When do you think this issue would occur?

Karthik-K-N commented 2 days ago

We investigated the code, and it is working as expected:

  1. For every LB configured for the cluster, we requeue if the LB status is not active (for reference: code).
  2. We set infra.Ready only after all the LBs are ready (for reference: code).
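
To make the two points above concrete, here is a minimal sketch of that gating logic. It is not the provider's actual code: `lbStatus`, `pendingLoadBalancers`, and `infraReady` are hypothetical names, and `"active"` is assumed to be the state string of a fully provisioned LB. An LB that is configured but has no entry in status at all is treated as pending too.

```go
package main

import "sort"

// lbStatus mirrors the fields shown under status.loadbalancers in this thread.
// It is a hypothetical stand-in, not the provider's real API type.
type lbStatus struct {
	ID       string
	State    string
	Hostname string
}

// stateActive is assumed to be the state string of a fully provisioned LB.
const stateActive = "active"

// pendingLoadBalancers returns the configured LB names that are either
// missing from status or not yet active, sorted for stable output.
func pendingLoadBalancers(configured []string, status map[string]lbStatus) []string {
	var pending []string
	for _, name := range configured {
		st, ok := status[name]
		if !ok || st.State != stateActive {
			pending = append(pending, name)
		}
	}
	sort.Strings(pending)
	return pending
}

// infraReady reports whether every configured LB is active.
// When it returns false, a caller would requeue instead of marking infra ready.
func infraReady(configured []string, status map[string]lbStatus) bool {
	return len(pendingLoadBalancers(configured, status)) == 0
}
```

With two configured LBs and only one status entry in `create_pending`, `infraReady` returns false and `pendingLoadBalancers` lists both names, so the reconcile would requeue rather than report ready.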

@hamzy Are you running the latest code from this repo? If you still face issues with the latest code, please share the controller logs so we can debug further. Thanks.

Karthik-K-N commented 2 days ago

/triage needs-information

hamzy commented 1 day ago

We are running version sigs.k8s.io/cluster-api-provider-ibmcloud v0.9.0-alpha.0.0.20240913094112-c6bcd313bce0

https://github.com/openshift/installer/blob/master/cluster-api/providers/ibmcloud/go.mod#L7

hamzy commented 1 day ago

Version sigs.k8s.io/cluster-api-provider-ibmcloud v0.9.0-beta.0.0.20241017140904-8a563a0620f3 in https://github.com/openshift/installer/pull/9118 also fails.

Karthik-K-N commented 1 day ago

Could you please share the error and the code reference?

hamzy commented 1 day ago

This is a failing run of #9118 https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/9118/pull-ci-openshift-installer-master-e2e-powervs-capi-ovn/1849180644286926848/artifacts/e2e-powervs-capi-ovn/ipi-install-powervs-install/artifacts/clusterapi_output/

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/9118/pull-ci-openshift-installer-master-e2e-powervs-capi-ovn/1849180644286926848/artifacts/e2e-powervs-capi-ovn/ipi-install-powervs-install/artifacts/.openshift_install.log

Search for "InfraReady: hostname =" to see that only one LB is active at that time, although the other one does become active eventually.

mkumatag commented 1 day ago

@Karthik-K-N @hamzy let us setup a call on Monday and sort out this issue.

Karthik-K-N commented 1 day ago

> InfraReady: hostname

Thanks for the reference. I will check and update here. Looking at the IBMPowerVSCluster resource from here, something seems wrong: the LB is in the create_pending state, but the LoadBalancer condition is set to ready.

hamzy commented 1 day ago

FYI I pass in two LBs here: https://github.com/openshift/installer/blob/master/pkg/asset/manifests/powervs/cluster.go#L135-L162

So why does the status only have one?

  loadbalancers:
    p-mad02-2-capi-master-qwb48-loadbalancer:
      id: r050-70cc8d92-60fa-4ed1-9c09-11f0dd4d3d6a
      state: create_pending
      hostname: 70cc8d92-eu-es.lb.appdomain.cloud
      controllercreated: true

hamzy commented 1 day ago

> @Karthik-K-N @hamzy let us setup a call on Monday and sort out this issue.

Sure, we could, but it seems like the discussion on this PR is sufficient?