kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

No release images available (`gcr.io/cloud-provider-vsphere` appears to be deleted) #3053

Closed: embik closed this issue 1 month ago

embik commented 1 month ago

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened:

It appears that there are no release images available for this CSI driver anymore. gcr.io/cloud-provider-vsphere seems to be deleted, and with it the latest CSI driver image (3.3.1):

$ skopeo inspect docker://gcr.io/cloud-provider-vsphere/csi/release/driver:v3.3.1
FATA[0000] Error parsing image name "docker://gcr.io/cloud-provider-vsphere/csi/release/driver:v3.3.1": unable to retrieve auth token: invalid username/password: denied: Project cloud-provider-vsphere has been deleted.

We are seeing image pulls failing because of that.

What you expected to happen:

Release images to be available.

How to reproduce it (as minimally and precisely as possible):

Attempt to install from the latest release manifest, i.e. https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml.
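
Concretely (a sketch; the raw URL is my mapping of the blob link above):

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml
kubectl -n vmware-system-csi get pods   # pods stay in ImagePullBackOff / ErrImagePull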

Anything else we need to know?:

I've also asked about this on #provider-vsphere, but it seems that this could use visibility: https://kubernetes.slack.com/archives/C9PGCDKV5/p1726572676507989.


scheeles commented 1 month ago

If you are searching for images: we created a mirror for some of the versions. Please test it before you deploy to production:

quay.io/kubermatic/mirror/cloud-provider-vsphere/ccm
quay.io/kubermatic/mirror/cloud-provider-vsphere/csi/release/driver
quay.io/kubermatic/mirror/cloud-provider-vsphere/csi/release/syncer

These images are self-built on our side, so they won't match the checksums of the official images.
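
A minimal sketch for pointing an existing install at the mirror; the container names are taken from the vanilla manifest and the v3.3.1 tag is the latest mentioned in this thread, so verify both against your deployment before use:

kubectl -n vmware-system-csi set image deployment/vsphere-csi-controller \
  vsphere-csi-controller=quay.io/kubermatic/mirror/cloud-provider-vsphere/csi/release/driver:v3.3.1 \
  vsphere-syncer=quay.io/kubermatic/mirror/cloud-provider-vsphere/csi/release/syncer:v3.3.1
kubectl -n vmware-system-csi set image daemonset/vsphere-csi-node \
  vsphere-csi-node=quay.io/kubermatic/mirror/cloud-provider-vsphere/csi/release/driver:v3.3.1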

Philip-A-Fry commented 1 month ago

What a tremendous headache. I just used the staging images, but patching the deployment to do so was quite painful (for me, anyhow):

https://console.cloud.google.com/artifacts/docker/k8s-staging-images/us-central1/csi-vsphere

bork91 commented 1 month ago

A workaround for me was to remove imagePullPolicy: Always, and then K8S used the images already on the node.

This only works on systems where the images are already pulled.
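
To check whether a node still has the images cached before relying on this, something like the following works on containerd-based nodes with crictl installed (a sketch):

crictl images | grep cloud-provider-vsphere   # lists cached vSphere CSI images, if any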

0xdnL commented 1 month ago

Bump. Same issue after restarting a node yesterday. We worked around it by pushing the locally cached image to a private repo for further use, keeping imagePullPolicy: Always, as we're still using an older version.
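
A sketch of that retag-and-push step with ctr, run on a node that still has the image cached (the target registry is a placeholder; push may need -u user:pass for credentials):

ctr -n=k8s.io image tag gcr.io/cloud-provider-vsphere/csi/release/driver:v3.3.1 registry.example.internal/csi/driver:v3.3.1
ctr -n=k8s.io image push registry.example.internal/csi/driver:v3.3.1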

RnkeZ commented 1 month ago

We still had the images on our nodes, so we uploaded them to our private registry so they can be used in new deployments. However, it would be nice to know where the new images will be released. I couldn't find anything vSphere CSI related on registry.k8s.io: https://explore.ggcr.dev/?repo=registry.k8s.io

upodroid commented 1 month ago

https://console.cloud.google.com/artifacts/docker/k8s-staging-images/us-central1/csi-vsphere

Please don't do this; the images here live for only a short period of time.

We are working on promoting the images and serving them from registry.k8s.io/csi-vsphere/*

juliohm1978 commented 1 month ago

Cheers and all the best for the community!

Hoping this gets sorted out soon. Quite unexpected to see the images disappear from a public repo.

To be safe, we managed to recover the images we had in cache on the k8s nodes and pushed them to a local cache registry in our infrastructure.

BenTheElder commented 1 month ago

Just clarifying: gcr.io/cloud-provider-vsphere was not (and is not) a Kubernetes SIG K8s Infra provided host; I'm not sure who owned it, maybe VMware.

These images should be on registry.k8s.io shortly (it's in progress https://github.com/kubernetes/k8s.io/pull/7230) and that will be the best place to pull them.

Also, as mentioned above, the other GCRs / endpoints (such as gcr.io/k8s-staging.*, pkg.dev, etc.) are not supported for direct consumption; these are intermediate locations we use to avoid granting humans and subprojects direct access to our "production" hosting at registry.k8s.io, which is locked down by policies and automation. These staging locations are subject to change at any time, and in fact we will be moving them around in the immediate future.

https://registry.k8s.io is the intended public host for Kubernetes organization hosted images in community / project-owned accounts. registry.k8s.io serves immutable images/tags and is intended for end user consumption.

Using registry.k8s.io allows us to provide a more sustainable and cost-effective solution for serving the public internet than having you pull directly from a particular cloud region, which may not be near your cluster, may increase our egress costs, and subjects you to the risk that we switch backend hosts. The current multi-vendor architecture is described in the docs at registry.k8s.io.

Keep in mind though: registry.k8s.io is volunteer run, depends on vendor donations, and is therefore unable to make any strong guarantees (see https://registry.k8s.io#stability).

If you need guaranteed uptime you should always mirror to a location you control or use a vendor-supported certified conformant distribution that provides their own hosting.

We have a guide at https://github.com/kubernetes/registry.k8s.io/blob/main/docs/mirroring/README.md
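
For example, a sketch using crane to copy the promoted images into a registry you control (the destination is a placeholder; run this once the images are live on registry.k8s.io):

crane copy registry.k8s.io/csi-vsphere/driver:v3.3.1 my-registry.example.com/csi-vsphere/driver:v3.3.1
crane copy registry.k8s.io/csi-vsphere/syncer:v3.3.1 my-registry.example.com/csi-vsphere/syncer:v3.3.1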

Philip-A-Fry commented 1 month ago

A workaround for me was to remove imagePullPolicy: Always, and then K8S used the images already on the node.

This only works on systems where the images are already pulled.

Good point, thanks! I did this:

kubectl patch deployment vsphere-csi-controller -n vmware-system-csi --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "IfNotPresent"}]'
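
Note that this patches only container index 0; if other containers in the Deployment also pull from the deleted registry, a loop like this patches them all (a sketch; verify the container list against your manifest first):

n=$(kubectl -n vmware-system-csi get deployment vsphere-csi-controller -o jsonpath='{.spec.template.spec.containers[*].name}' | wc -w)
for i in $(seq 0 $((n - 1))); do
  kubectl -n vmware-system-csi patch deployment vsphere-csi-controller --type=json \
    -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/$i/imagePullPolicy\", \"value\": \"IfNotPresent\"}]"
done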

msschl commented 1 month ago

Please upload the images to https://registry.k8s.io/ as soon as possible. Our cluster is broken at the moment, and I imagine many others are as well, since the official vanilla manifest uses the gcr.io image repository. See https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/release-3.3/manifests/vanilla/vsphere-csi-driver.yaml#L280

nip1904 commented 1 month ago

As a workaround, I exported the images from my test cluster and imported them on every worker in production. With imagePullPolicy: IfNotPresent I can continue upgrading my prod cluster. Hope that helps if you don't have a local registry:


# pick the image refs still cached on the node (versions from this thread; adjust to yours)
imageList="gcr.io/cloud-provider-vsphere/csi/release/driver:v3.3.1 gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.3.1"
ctr -n=k8s.io image export xxx.tar $imageList
# copy xxx.tar to each target worker, then:
ctr -n=k8s.io image import xxx.tar

tmull360 commented 1 month ago

to every worker in production. With imagePullPolicy: IfNotPresent I can continue upgrading my prod cluster

@nip1904, how did you export the images? I haven't found a way to export them from our current Kubernetes nodes, but this would be helpful info for us to use as a workaround. Thank you!

upodroid commented 1 month ago

We have published the images for the last two releases at the following locations:

registry.k8s.io/csi-vsphere/driver:v3.3.1
registry.k8s.io/csi-vsphere/driver:v3.3.0
registry.k8s.io/csi-vsphere/syncer:v3.3.0
registry.k8s.io/csi-vsphere/syncer:v3.3.1

Please update your manifests to use these images.
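
For instance, a sketch that rewrites the image paths in the published manifest before applying it (the raw URL is my mapping of the release manifest link above; inspect the output before applying):

curl -sL https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml \
  | sed 's|gcr.io/cloud-provider-vsphere/csi/release|registry.k8s.io/csi-vsphere|g' \
  | kubectl apply -f -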

sbueringer commented 1 month ago

@upodroid Thank you for helping with this!

webberb commented 1 month ago

There is also this image, which was hosted on gcr.io before:

gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.31.0

Does this also need to be on registry.k8s.io?

upodroid commented 1 month ago

gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.31.0

Please read https://github.com/kubernetes/cloud-provider-vsphere?tab=readme-ov-file#warning-kubernetes-image-registry-migration-for-cloud-provider-vsphere

Also, the image doesn't have the best name, but I'll leave renaming it to the vSphere maintainers:

registry.k8s.io/cloud-pv-vsphere/cloud-provider-vsphere:v1.31.0 vs. registry.k8s.io/cloud-pv-vsphere/manager:v1.31.0

webberb commented 1 month ago

Thanks so much @upodroid

rizlas commented 1 month ago

You can also use the Rancher mirror: rancher/mirrored-cloud-provider-vsphere-csi-release-driver:vX.Y.Z