vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum
Apache License 2.0

Can't get this to work on Bobcat - multiple issues #406

Closed · flyersa closed 4 months ago

flyersa commented 4 months ago

Hi all,

fresh OpenStack Bobcat install with Kolla. magnum-cluster-api is already bundled in the magnum containers, but in an old version, so I upgraded it to the latest (didn't help).

Basically nothing works right. I put the kubeconfig from the base cluster into all magnum containers, which is fine. I essentially followed this guide, linked from your docs page:

https://satishdotpatel.github.io/openstack-magnum-capi/
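
Roughly what I did to upgrade and wire in the kubeconfig (a sketch; the container names and the in-container kubeconfig path are assumptions based on a default Kolla layout):

```shell
# Assumptions: default kolla container names, and that the driver reads
# the kubeconfig from the magnum user's home inside the container.
for c in magnum_api magnum_conductor; do
  docker exec -u root "$c" pip install --upgrade magnum-cluster-api
  docker exec -u root "$c" mkdir -p /var/lib/magnum/.kube
  docker cp ~/.kube/config "$c":/var/lib/magnum/.kube/config
done
```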

Issues:

When trying to spawn a cluster, Magnum complains that it can't find the Helm release or the CRDs from the StackHPC provider, so I applied them manually with:

```shell
helm repo add capi-addons https://stackhpc.github.io/cluster-api-addon-provider

# Use the latest version from the main branch
helm upgrade \
  cluster-api-addon-provider \
  capi-addons/cluster-api-addon-provider \
  --install \
  --version ">=0.1.0-dev.0.main.0,<0.1.0-dev.0.main.9999999999"
```
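
To confirm the release actually landed (a generic check, nothing driver-specific):

```shell
# List Helm releases across all namespaces and look for the addon provider
helm list -A | grep cluster-api-addon-provider
```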

Then on cluster creation the load balancer, control plane, and workers will spawn, but they are stuck. The nodes never reach the READY state because no CNI is available. All nodes have the "uninitialized" NoSchedule taint, and CoreDNS will not even deploy (it complains about these taints).

Removing the taints will at least get CoreDNS deployed and the nodes into the READY state, but that's it: the clusters never complete, no CNI gets installed, and they are stuck in create status. I even tried a different Kubernetes cluster for the clusterctl one.
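
For reference, this is how I inspected and cleared the taint (a sketch, assuming it's the standard external cloud-provider taint):

```shell
# Show each node's taint keys to confirm what is blocking scheduling
kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'

# Remove it everywhere; the trailing "-" removes the taint
# (assumption: it is the standard cloud-provider "uninitialized" taint)
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
```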

The Flatcar images do not work at all; they go straight into a reboot loop with network issues. I tried v1.27.4 and v1.27.8. The Ubuntu 22.04 ones at least do something.

Does anyone have an idea? It's still a pain that there is no real documentation, only outdated instructions. I also tried this on a different OpenStack setup (also on Bobcat) to rule out something being wrong with my Magnum setup. Same problems.

mnaser commented 4 months ago

@flyersa I think you are trying to use another Cluster API driver.

> When trying to spawn a cluster, Magnum complains that it can't find the Helm release or the CRDs from the StackHPC provider, so I applied them manually with:

This tells me you're using the StackHPC driver, not the one provided by us. If you install our driver, I suggest you look at this:

https://vexxhost.github.io/magnum-cluster-api/user/getting-started/#creating
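
That page boils down to creating a template and a cluster along these lines (a sketch; the image, flavor, and network names below are placeholders for your environment):

```shell
# Create a cluster template pointing at a Kubernetes image
# (image/flavor/network names are placeholders, not canonical values)
openstack coe cluster template create \
  --image ubuntu-2204-kube-v1.27.8 \
  --external-network public \
  --master-flavor m1.large \
  --flavor m1.large \
  --network-driver calico \
  --coe kubernetes \
  k8s-v1.27.8

# Spawn a cluster from the template
openstack coe cluster create \
  --cluster-template k8s-v1.27.8 \
  --node-count 2 \
  my-cluster
```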

Thanks, Mohammed

flyersa commented 4 months ago

I have no idea which one it is. According to the docs you reference, it's yours that is bundled. I'm not even aware of other Magnum CAPI integrations.

```
magnum-cluster-api 0.21.0
```
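
That's from inside the container; a quick way to check, assuming Kolla's default container name:

```shell
# Assumption: kolla's default name for the conductor container
docker exec magnum_conductor pip show magnum-cluster-api
```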

It pretty much looks like yours. As I said, I followed the docs that you linked on your own docs page. As far as I can tell, Bobcat comes bundled with 0.15 of your driver. There is no documentation explaining any of this directly. It is also spinning up your images. Why it references or wants StackHPC CRDs, I don't know.

What would be the correct way to install your driver manually? As I said, I already followed both docs from your docs page on Bobcat, and both yield exactly the results I reported above.

So it seems this is your driver. Did anyone test this with Bobcat out of the box? The docs are lacking, as important steps that seem to be necessary are not mentioned (for example, image metadata has nothing to do with the driver). I would really look forward to using Magnum with your driver, but in the current state of Magnum, this driver, and the documentation, it is anything but viable :(

The reason I installed the StackHPC CRDs is that Magnum complains about them out of the box when launching a cluster while the clusterctl Kubernetes doesn't have them. The error message from Magnum itself references StackHPC; that's why I looked there, even though the magnum_api and magnum_conductor containers have your CAPI driver installed.

I would be more than happy to showcase it to you :) It's reproducible: as long as the Kubernetes cluster has no StackHPC CRDs, the error message will ask for them. I can't even find any StackHPC-related package in the containers. This is a plain Kolla installation, with nothing done manually except copying the kubeconfig file and upgrading from 0.15 to 0.21.

mnaser commented 4 months ago

It's pretty odd that it's asking for that. We actually run CI jobs against master, so it should work fine with the latest releases.

I can carve out 30 minutes to figure out the issue you're running into and look into it together; feel free to ping me on Slack.

flyersa commented 4 months ago

Hi @mnaser,

Well, I'm sorry, but it was most likely just our fault (aside from the remaining thing with the kube_version on the images). The driver was disabled because the kubeconfig parameter was missing in the Kolla configuration. D'oh! :)
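
For anyone else hitting this, the fix boiled down to making the kubeconfig available to Kolla's Magnum role, roughly like this (a sketch; the exact file name and the variable that wires it up depend on your kolla-ansible release, so check its Magnum role first):

```shell
# Sketch, assuming kolla-ansible's default node_custom_config layout;
# the exact location and enabling variable depend on your kolla release.
mkdir -p /etc/kolla/config/magnum
cp ~/.kube/config /etc/kolla/config/magnum/kubeconfig

# Roll the new config out to the magnum containers
kolla-ansible -i /etc/kolla/inventory reconfigure --tags magnum
```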

Awesome work on this!