kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

Cluster creation fails when OpenStack does not have volume support #1345

Closed carstenkoester closed 1 year ago

carstenkoester commented 1 year ago

/kind bug

What steps did you take and what happened:

We are running CAPO against an OpenStack cluster that does not have Ceph, Cinder, Swift, or any other persistent volume support configured. (This is intentional -- this OpenStack deployment is designed for ephemeral workloads only.)

We have been using a number of different CAPO provider versions in the past, up to v0.4.0, without issues.

We recently upgraded to v0.6.3 and are no longer able to create workload clusters at all. The CAPO logs show:

E0923 22:32:19.906429       1 controller.go:317] controller/openstackcluster "msg"="Reconciler error" "error"="failed to create volume service client: No suitable endpoint could be found in the service catalog." "name"="..." "namespace"="..." "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="OpenStackCluster"

It looks like https://github.com/kubernetes-sigs/cluster-api-provider-openstack/commit/f5b623047780c0dd0a3c509896ef6737e3fa8970 added OpenStack volume support, but in doing so made the volume service a hard requirement -- client setup now fails before the compute service is even bound if there is no volume endpoint.
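
For context, the error above is what gophercloud returns when the service catalog has no block-storage endpoint. The snippet below is a minimal sketch of that hard dependency, assuming gophercloud and an already-authenticated provider client; it is an illustration, not the actual CAPO code:

package example

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// newVolumeClient illustrates the hard requirement: if the catalog has no
// volumev3 endpoint, gophercloud's endpoint lookup fails and the whole
// client setup (and with it cluster reconciliation) errors out.
func newVolumeClient(provider *gophercloud.ProviderClient, region string) (*gophercloud.ServiceClient, error) {
	volume, err := openstack.NewBlockStorageV3(provider, gophercloud.EndpointOpts{Region: region})
	if err != nil {
		// On a volume-less cloud this is the
		// "No suitable endpoint could be found in the service catalog." error.
		return nil, fmt.Errorf("failed to create volume service client: %v", err)
	}
	return volume, nil
}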

What did you expect to happen:

I expected to be able to deploy workload clusters using provider version v0.6.3 in the same way as I was able to deploy them with earlier versions :-)

It's clear that the features implemented in https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1030 require volume support. However, it'd be great if CAPO could continue to function without an OpenStack volume endpoint when volumes are not being used.
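
One possible shape for that, sketched below purely as an illustration (the type and function names here are assumptions, not necessarily what an eventual fix does): treat a missing block-storage endpoint as "no volume support" and only raise an error later if a machine actually requests a root volume.

package example

import (
	"errors"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// VolumeService wraps an optional block-storage client; Client stays nil on
// clouds that have no volume endpoint in the catalog.
type VolumeService struct {
	Client *gophercloud.ServiceClient
}

// NewVolumeService tolerates a missing volumev3 endpoint instead of failing
// cluster reconciliation outright.
func NewVolumeService(provider *gophercloud.ProviderClient, region string) *VolumeService {
	client, err := openstack.NewBlockStorageV3(provider, gophercloud.EndpointOpts{Region: region})
	if err != nil {
		// No suitable endpoint in the catalog: run without volume support.
		return &VolumeService{}
	}
	return &VolumeService{Client: client}
}

// EnsureAvailable fails only when a volume is actually needed, e.g. when a
// machine spec requests a root volume.
func (s *VolumeService) EnsureAvailable() error {
	if s.Client == nil {
		return errors.New("a root volume was requested but the cloud has no volume service")
	}
	return nil
}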

Anything else you would like to add:

Environment:

jichenjc commented 1 year ago

Agreed, volume support should be optional so that CAPO can run against a volume-less OpenStack.

/assign

jichenjc commented 1 year ago

I don't have such an environment. Would you like to try https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1347 by building a new image from that code? Thanks @carstenkoester

anaperic commented 1 year ago

@jichenjc thank you.

I built the CAPO image successfully and wanted to trick the system into using that different image. I overrode the infrastructure-openstack component and it picked up the image. Unfortunately, the CAPO manager is not happy, as it expects the newer API version, while we initialized the CAPI management cluster with just the latest v1alpha5 API versions (the v0.6.3 provider release).

Using a provider override to supply the changed image (built from your PR / fork & branch):

clusterctl init --target-namespace blah --core cluster-api:v1.2.1 --bootstrap kubeadm:v1.2.1 --control-plane kubeadm:v1.2.1 --infrastructure openstack:v0.6.3

CAPO comes up, but of course it does not have the v1alpha6 CRD... I will try to see how to test it differently and try to get all v0.6.3 CRDs.

carstenkoester commented 1 year ago

Thank you @jichenjc! Yes, this seems to work fine. Disclaimer, we were not able to test that this still works on an OpenStack that does use volume support, but the "volume-less" path is confirmed to work again now. Thanks again!

jichenjc commented 1 year ago

Yes, this seems to work fine. Disclaimer, we were not able to test that this still works on an OpenStack that does use volume support, but the "volume-less" path is confirmed to work again now.

Yes, we have CI to guarantee the with-volume case, so that should be good. Since you confirm we can tolerate a missing volume endpoint, I think we can merge the PR :)

try to get all v0.6.3 CRDs.

If you go with the latest CAPO, I think the 0.6.x CRDs are the first choice :)

PatrickLaabs commented 1 year ago

I hit the same issue. I made some changes to my Yoga DevStack environment, and after everything was up again I ran 'clusterctl init --infrastructure openstack' again and got the error mentioned above.

Using 'clusterctl init --infrastructure openstack:v0.5.3' solved it for me.