kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

tag without release breaks capi-operator #2060

Closed: simonostendorf closed this 1 month ago

simonostendorf commented 2 months ago

/kind bug

What steps did you take and what happened:

1. Install cluster-api-operator (with helm)
2. Create CoreProvider, ControlPlaneProvider and BootstrapProvider manifests
3. Create an InfrastructureProvider manifest for capo version 0.9.0 (a sketch follows below)
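
For reference, a minimal sketch of the manifest from step 3. This assumes the operator's v1alpha2 API; the namespace is illustrative and depends on your installation:

apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: openstack
  namespace: capo-system  # placeholder; use the namespace your setup expects
spec:
  version: v0.9.0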

Error occurred:

failed to create repo from provider url for provider "openstack": error creating the GitHub repository client: failed to get latest release: release not found for version v0.10.2, please retry later or set "GOPROXY=off" to get the current stable release: 404 Not Found

What did you expect to happen: Installation of capo should work :)

Anything else you would like to add: I'm opening the issue here and not in the capi-operator repo because I see the error in the improperly structured repo (the release for the tag is missing), not in the fact that the operator reports an error.
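
Side note on the "GOPROXY=off" hint in the error text: as far as I understand it, version discovery goes through the Go module proxy, which lists git tags whether or not a matching GitHub release exists. Setting GOPROXY=off in the environment of whatever does the fetching (the clusterctl process, or the operator deployment) should force a fallback to the GitHub releases API, which may serve as a temporary workaround; I haven't verified this here.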

Environment:

bartekle commented 2 months ago

This also breaks initialization via clusterctl init --infrastructure openstack[:version]. I tried versions v0.9.0 and v0.10.1, as well as omitting the version, but got the same error each time.

simonostendorf commented 2 months ago

> This also breaks initialization via clusterctl init --infrastructure openstack[:version]. I tried versions v0.9.0 and v0.10.1, as well as omitting the version, but got the same error each time.

To work around this, you could use your own FetchConfiguration (I think clusterctl has an equivalent mechanism) with a custom GitHub URL pointing to a forked repo that contains releases for its tags (don't forget to upload infrastructure-components.yaml to the release, because capi needs this file).
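
A hedged sketch of what that FetchConfiguration could look like with the operator (assuming its v1alpha2 API; <your-fork> is a placeholder, and the fork's releases must contain infrastructure-components.yaml as an asset):

apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: openstack
  namespace: capo-system  # placeholder namespace
spec:
  version: v0.9.0
  fetchConfig:
    url: https://github.com/<your-fork>/cluster-api-provider-openstack/releases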

bartekle commented 2 months ago

I think this also works with a custom provider entry in clusterctl's configuration, with no need to fork:

providers:
  - name: "fix-openstack"
    url: "https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/v0.9.0/infrastructure-components.yaml"
    type: "InfrastructureProvider"

mdbooth commented 1 month ago

This is a long-standing bug in clusterctl. I think it may have been fixed in kubernetes-sigs/cluster-api#10220, although I haven't confirmed this through testing. It looks like this was backported to CAPI 1.7.

To the best of my understanding this is a CAPI bug which can't be fixed in CAPO.

cwrau commented 1 month ago

The same thing is happening to us. https://github.com/kubernetes-sigs/cluster-api/pull/10220 doesn't fix this; I've been running from main to debug it.

> To the best of my understanding this is a CAPI bug

Are you sure? Maybe CAPI requires valid providers to always have releases, but I don't know the "interface" spec

> which can't be fixed in CAPO.

Sure, you could add a corresponding GitHub release 😁

mdbooth commented 1 month ago

> Sure, you could add a corresponding GitHub release 😁

Creating a tag and creating a GitHub release is not an atomic process. The minimum time between them is about 30 minutes, so while the gap can be minimised, it is unavoidable without fixing the bug in clusterctl. In practice it can be anything up to a day, given that it requires another reviewer.

The release process is documented here: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/RELEASE.md

The tl;dr is that we can't publish a release until the corresponding release image is both built and published. Building the release image requires pushing a tag. Publishing the release image requires pushing a change to kubernetes/k8s.io, which must also be reviewed.
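
For illustration, a sketch of what the promotion change amounts to: an entry in the kubernetes/k8s.io repo (under registry.k8s.io/images/k8s-staging-capi-openstack/images.yaml, if I have the path right) mapping the staging image digest to the released tag. The digest here is a placeholder:

- name: capi-openstack-controller
  dmap:
    "sha256:<digest-of-the-staging-image>": ["v0.10.2"]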

There will always be a period between pushing a tag and pushing a release. If you're aware of any other provider who has discovered a workaround I'd be interested to know about it.

mdbooth commented 1 month ago

Some more fun context if you're in the mood: clusterctl uses the go proxy to fetch releases in a manner which isn't correct: https://kubernetes.slack.com/archives/C8TSNPY4T/p1714748617207099
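
Concretely, my understanding of the failure mode: the module proxy (e.g. https://proxy.golang.org/sigs.k8s.io/cluster-api-provider-openstack/@v/list) derives its version list from git tags, so a freshly pushed tag like v0.10.2 shows up as the latest version even though no GitHub release exists yet, and the subsequent fetch of the release assets 404s.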

clusterctl's release version detection code could do with some love in general. I'm sure they'd be grateful for the help!

simonostendorf commented 1 month ago

> Creating a tag and creating a GitHub release is not an atomic process. The minimum time between them is about 30 minutes, so while the gap can be minimised, it is unavoidable without fixing the bug in clusterctl.

> There will always be a period between pushing a tag and pushing a release. If you're aware of any other provider who has discovered a workaround I'd be interested to know about it.

Yes, this should be fixed in clusterctl / capi-operator (if possible), but right now a tag exists without its corresponding release and that has broken capo init for 5 days. That is the context in which I opened the issue, not a request that you fix this problem forever (that should be done in clusterctl)...

mdbooth commented 1 month ago

Do you think there's any action we need to take on this currently?

simonklb commented 1 month ago

Looks like registry.k8s.io/capi-openstack/capi-openstack-controller:v0.10.2 is published; are there any more steps remaining before you can create a release?

simonklb commented 1 month ago

If I read the release instructions correctly, it is safe to create a draft release while waiting for the promotion to finish. Would that eliminate this issue?

simonklb commented 1 month ago

Can confirm that clusterctl init is fixed on v1.7.

mdbooth commented 1 month ago

> Looks like registry.k8s.io/capi-openstack/capi-openstack-controller:v0.10.2 is published; are there any more steps remaining before you can create a release?

Ah, ha! Thanks. I had thought this was already published. The release is out now.

cwrau commented 1 month ago

> > Sure, you could add a corresponding GitHub release 😁

> Creating a tag and creating a GitHub release is not an atomic process. The minimum time between them is about 30 minutes, so while the gap can be minimised, it is unavoidable without fixing the bug in clusterctl. In practice it can be anything up to a day, given that it requires another reviewer.
>
> The release process is documented here: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/RELEASE.md
>
> The tl;dr is that we can't publish a release until the corresponding release image is both built and published. Building the release image requires pushing a tag. Publishing the release image requires pushing a change to kubernetes/k8s.io, which must also be reviewed.
>
> There will always be a period between pushing a tag and pushing a release. If you're aware of any other provider who has discovered a workaround I'd be interested to know about it.

Ah, I didn't know it was this complicated, hm 🤔

mdbooth commented 1 month ago

Sounds like we got to the bottom of this.