kubernetes / kubeadm

Aggregator for issues filed against kubeadm

switch the default kubeadm image registry to registry.k8s.io #2671

Closed. neolit123 closed this issue 1 year ago.

neolit123 commented 2 years ago

the k8s project is moving away from k8s.gcr.io to registry.k8s.io.

https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM/m/7qywJeUTBQAJ xref https://github.com/kubernetes/k8s.io/issues/1834

Target releases:

- 1.25
- 1.26

SataQiu commented 2 years ago

/cc

neolit123 commented 2 years ago

PR for this in 1.25: https://github.com/kubernetes/kubernetes/pull/109938

neolit123 commented 2 years ago

https://github.com/kubernetes/kubernetes/pull/109938 broke our e2e tests for 'latest' https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm

I think what's happening is the following. A cluster is created with the older version of kubeadm and the ClusterConfiguration.imageRepository is defaulted to "k8s.gcr.io". Then when the upgrade happens, the new kubeadm binary thinks that "k8s.gcr.io" is a custom repository. Kinder is not pre-pulling the right images.

For kubeadm we need to mutate the image repo field to registry.k8s.io (see the sketch below).

For kinder we need to ensure it uses the output of "kubeadm config images" for prepull (might be the case already).
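
A minimal sketch of that kubeadm-side mutation, assuming a trimmed-down stand-in for the real ClusterConfiguration API type; the function name is illustrative, not the actual kubeadm code:

```go
package main

import "fmt"

// ClusterConfiguration is a trimmed stand-in for the kubeadm API type;
// only the field relevant to this issue is included.
type ClusterConfiguration struct {
	ImageRepository string
}

const (
	legacyDefaultRepository = "k8s.gcr.io"
	newDefaultRepository    = "registry.k8s.io"
)

// mutateImageRepository rewrites the legacy default so that an upgraded
// kubeadm does not mistake "k8s.gcr.io" for a user-chosen custom repository.
func mutateImageRepository(cfg *ClusterConfiguration) {
	if cfg.ImageRepository == legacyDefaultRepository {
		cfg.ImageRepository = newDefaultRepository
	}
}

func main() {
	cfg := &ClusterConfiguration{ImageRepository: legacyDefaultRepository}
	mutateImageRepository(cfg)
	fmt.Println(cfg.ImageRepository) // registry.k8s.io
}
```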

neolit123 commented 2 years ago

the first thing we have to do is here: https://github.com/kubernetes/kubernetes/pull/110343

once it merges i can try debugging the kinder workflows. EDIT: but looking at the logs we might be OK without kinder changes, since it prepares registry.k8s.io images for the upgrade step; so as long as the ClusterConfiguration.imageRepository has registry.k8s.io it should be good.

neolit123 commented 2 years ago

https://github.com/kubernetes/kubernetes/pull/110343 merged and i was expecting it to be a sufficient fix, but it seems like https://storage.googleapis.com/k8s-release-dev/ci/latest.txt is not pointing to the latest CI build, thus the jobs are continuing to fail.

notified #release-management and #release-ci-signal on slack: https://kubernetes.slack.com/archives/CN0K3TE2C/p1654258115320499 https://kubernetes.slack.com/archives/CJH2GBF7Y/p1654257907646369

pacoxu commented 2 years ago

time="11:29:00" level=info msg="Downloading https://storage.googleapis.com/k8s-release-dev/ci/v1.24.2-rc.0.10+a6b031e314ef2e/bin/linux/amd64/kube-apiserver.tar\n" time="11:33:13" level=info msg="fixing /private/var/folders/g7/ywncky4109zfc_6v5ww1fc2w0000gn/T/kinder-alter-image3586133515/bits/init/kube-apiserver.tar" fixed: k8s.gcr.io/kube-apiserver-amd64 -> k8s.gcr.io/kube-apiserver fixed: k8s.gcr.io/kube-apiserver-amd64 -> k8s.gcr.io/kube-apiserver

During kubeadm init with the latest kubeadm, it will try registry.k8s.io as the image repo:

[root@paco ~]# ./kubeadm-v1.25 config images list --kubernetes-version=v1.24.2-rc.0.10+a6b031e314ef2e
registry.k8s.io/kube-apiserver:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-controller-manager:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-scheduler:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-proxy:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/pause:3.7
registry.k8s.io/etcd:3.5.3-0
registry.k8s.io/coredns/coredns:v1.8.6
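
(Side note: the `+` in the build metadata becomes `_` in the tags above because image tags may not contain `+`. A minimal sketch of that conversion, with an illustrative helper name rather than the exact kubeadm function:)

```go
package main

import (
	"fmt"
	"strings"
)

// tagFromVersion mirrors the convention visible above: image tags may not
// contain "+", so the semver build-metadata separator becomes "_".
// Illustrative helper, not the exact kubeadm function.
func tagFromVersion(kubernetesVersion string) string {
	return strings.ReplaceAll(kubernetesVersion, "+", "_")
}

func main() {
	fmt.Println(tagFromVersion("v1.24.2-rc.0.10+a6b031e314ef2e"))
	// Output: v1.24.2-rc.0.10_a6b031e314ef2e
}
```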

neolit123 commented 2 years ago

that's actually tricky to fix because the images are created from tars. given kubeadm was downloaded with the tars and is already on the node, we could execute "kubeadm config images" and check what the default repository is for the "kubeadm init" binary. based on that, "fix" the repo to be "registry.k8s.io" and not "k8s.gcr.io": https://github.com/kubernetes/kubeadm/blob/5f13a39e3759c898ae94b56e828487740570dfac/kinder/pkg/build/alter/alter.go#L225

alternatively, we could just add a workaround for the ci-kubernetes-e2e-kubeadm-kinder-latest-on-1-24 job, such that we introduce a new "task" in the kinder workflow to execute on the nodes and re-tag / alias the images from k8s.gcr.io -> registry.k8s.io.

pacoxu commented 2 years ago

> alternatively, we could just add a workaround for the ci-kubernetes-e2e-kubeadm-kinder-latest-on-1-24 job, such that we introduce a new "task" in the kinder workflow to execute on the nodes and re-tag / alias the images from k8s.gcr.io -> registry.k8s.io.

re-tag seems to be the simplest way here. (Or a trick change in fixImageTar to tag it to both registries)
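
A rough sketch of what that double-tagging could look like when rewriting the RepoTags list of a docker-save tar (illustrative names; the real logic would live in kinder's fixImageTar):

```go
package main

import (
	"fmt"
	"strings"
)

// addRegistryAlias keeps the original k8s.gcr.io tag and appends a
// registry.k8s.io alias, so the loaded image resolves under either
// default repository. Sketch only; not the actual kinder code.
func addRegistryAlias(repoTags []string) []string {
	out := make([]string, 0, 2*len(repoTags))
	for _, tag := range repoTags {
		out = append(out, tag)
		if strings.HasPrefix(tag, "k8s.gcr.io/") {
			out = append(out, "registry.k8s.io/"+strings.TrimPrefix(tag, "k8s.gcr.io/"))
		}
	}
	return out
}

func main() {
	fmt.Println(addRegistryAlias([]string{"k8s.gcr.io/kube-apiserver:v1.24.2"}))
	// [k8s.gcr.io/kube-apiserver:v1.24.2 registry.k8s.io/kube-apiserver:v1.24.2]
}
```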

neolit123 commented 2 years ago

> (Or a trick change in fixImageTar to tag it to both registries)

i'm testing a hack in fixImageTar right now. it's not great, but it can be removed once we no longer test the 1.24 k8s/1.25 kubeadm skew.

neolit123 commented 2 years ago

@pacoxu PR is here https://github.com/kubernetes/kubeadm/pull/2705

neolit123 commented 2 years ago

https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-latest-on-1-24 fixed

neolit123 commented 2 years ago

@dims @ameukam @BenTheElder

the old path for CI images used to be gcr.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.1.101_df0af6f7b8f9b7 but registry.k8s.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.1.101_df0af6f7b8f9b7 does not work.

this failed redirect seems like a problem for all external consumers of CI images. what is the new valid path?

dims commented 2 years ago

@neolit123 the CI images didn't move, since they are on gcr.io (NOT k8s.gcr.io). please use the old path for CI images.

neolit123 commented 2 years ago

ok, in that case what is in kubeadm right now is valid for the CI path:

lubo@lubo-it:~/go/src/k8s.io/kubernetes/cmd/kubeadm$ grep "gcr.io" * -rnI
app/phases/uploadconfig/uploadconfig.go:122:// to 'registry.k8s.io' in case it was the legacy default 'k8s.gcr.io'
app/phases/uploadconfig/uploadconfig.go:126:    if cfg.ImageRepository != "k8s.gcr.io" {
app/phases/uploadconfig/uploadconfig.go:135:        "ConfigMap to be 'registry.k8s.io' instead of the legacy default of 'k8s.gcr.io'")
app/constants/constants.go:338: DefaultCIImageRepository = "gcr.io/k8s-staging-ci-images"
app/util/staticpod/utils_test.go:627:  - image: gcr.io/google_containers/etcd-amd64:3.1.11
app/cmd/upgrade/node.go:150:    // to be 'registry.k8s.io', if it was 'k8s.gcr.io'. Don't mutate the in-cluster value by passing
app/cmd/upgrade/common.go:77:       // to be 'registry.k8s.io', if it was 'k8s.gcr.io'.
app/cmd/upgrade/diff.go:124:        // to be 'registry.k8s.io', if it was 'k8s.gcr.io'. Don't mutate the in-cluster value by passing
app/apis/kubeadm/v1beta2/defaults.go:43:    // (previously this defaulted to k8s.gcr.io)
app/apis/kubeadm/v1beta2/types.go:100:  // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`
app/apis/kubeadm/types.go:120:  // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`
app/apis/kubeadm/v1beta3/defaults.go:45:    // (previously this defaulted to k8s.gcr.io)
app/apis/kubeadm/v1beta3/types.go:126:  // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`

i.e. DefaultCIImageRepository = "gcr.io/k8s-staging-ci-images"
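
In other words, only released images moved; a simplified sketch of the split, with hypothetical helper names:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	defaultImageRepository   = "registry.k8s.io"
	defaultCIImageRepository = "gcr.io/k8s-staging-ci-images"
)

// repositoryForVersion is a hypothetical helper: CI builds (resolved from
// "ci/..." version labels) keep pulling from the staging registry, while
// released versions use the new default repository.
func repositoryForVersion(versionLabel string) string {
	if strings.HasPrefix(versionLabel, "ci/") {
		return defaultCIImageRepository
	}
	return defaultImageRepository
}

func main() {
	fmt.Println(repositoryForVersion("ci/latest")) // gcr.io/k8s-staging-ci-images
	fmt.Println(repositoryForVersion("v1.25.0"))   // registry.k8s.io
}
```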

ameukam commented 2 years ago

@neolit123 is there anything that needs to be done here? happy to provide some help

neolit123 commented 2 years ago

@ameukam in the OP we have a task to do some cleanups in 1.26. this can happen after k/k master opens post code freeze (CF).

afbjorklund commented 2 years ago

Does something need to be done for kubeadm < 1.25 too, or can they just rely on the image redirect?

This also goes for the old tarballs that were generated with the content from kubeadm config images.

Like: kubeadm config images list | xargs docker save

They might need some retagging after image loading, if so.

BenTheElder commented 2 years ago

> Does something need to be done for kubeadm < 1.25 too, or can they just rely on the image redirect?

k8s.gcr.io will continue to exist as a source of truth, and is itself an alias to the actual registries (unadvertised).

It would be a breaking change for older releases to switch the primary registry alias, e.g. for the reasons you mentioned, so they will not be updated.

However, in the near future k8s.gcr.io may begin to redirect to registry.k8s.io, which will again contain all the same images. The only change older releases would need backported is allowing the new domain through firewalls; the image names will not change.

There was just a notice sent about this to the dev mailing list, but it needs wider circulation.

neolit123 commented 2 years ago

the flip was done on oct 3rd

https://groups.google.com/a/kubernetes.io/d/msgid/dev/CAOZRXm_h9CNpnwWe%3Dv07VdtbU60biUzED-2V94FsNmYjfVGQLw%40mail.gmail.com?utm_medium=email&utm_source=footer

kubeadm ci has been green thus far, let's reopen this if we need to change something else.

neolit123 commented 2 years ago

after discussion with @BenTheElder we should actually backport this to the >=1.23 releases (1.23, 1.24); >=1.25 already has the changes. this is a matter of allowing the k8s project to not run out of funds ($$$).

note: this is an exception as we only backport bugfixes, but in this case it has to be done.

looks like we need to backport these PRs: https://github.com/kubernetes/kubernetes/pull/109938 https://github.com/kubernetes/kubernetes/pull/110343

we could leave the backports without doing the cleanups that we did here: https://github.com/kubernetes/kubernetes/pull/112006

cc @SataQiu @pacoxu for comments

neolit123 commented 2 years ago

cc @fabriziopandini @sbueringer in case this affects CAPI.

afbjorklund commented 2 years ago

Need to do some retagging in minikube, or it would break the old preloads and caches (that only have the old registry).

Alternatively, re-generate all the preloads, but that could already be "too late" if they are being cached and have been downloaded.

It would only break air-gapped installs (and China?); the others would just be able to fetch the "new" image.

BenTheElder commented 2 years ago

Only if using the new patch release for which the images don't exist yet. Also, for older releases we still currently plan to publish the tags to k8s.gcr.io; it just won't be the default. Upgrades should always be taken carefully, and we'll certainly need a prominent release note.

afbjorklund commented 2 years ago

Yeah, on second thought this would just affect new (minor) releases of kubeadm.

So all that is needed is to mirror the version selection logic, as it is today (>= 1.25.0-alpha.1).
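
A minimal sketch of that version gate, using k8s.io/apimachinery's version helpers; the cutover value is the one mentioned above, as the logic stood at the time of this comment (before the later backports changed the defaults in older patch releases):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/version"
)

// cutover is the first version that defaults to registry.k8s.io,
// per the discussion above.
var cutover = version.MustParseSemantic("1.25.0-alpha.1")

// defaultImageRepository mirrors the selection logic: registry.k8s.io from
// the cutover onward, k8s.gcr.io before it. Sketch only.
func defaultImageRepository(kubernetesVersion string) (string, error) {
	v, err := version.ParseSemantic(kubernetesVersion)
	if err != nil {
		return "", err
	}
	if v.AtLeast(cutover) {
		return "registry.k8s.io", nil
	}
	return "k8s.gcr.io", nil
}

func main() {
	repo, _ := defaultImageRepository("v1.24.2")
	fmt.Println(repo) // k8s.gcr.io
}
```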

neolit123 commented 2 years ago

cherrypicks for 1.22, 1.23, 1.24: https://github.com/kubernetes/kubernetes/pull/113388 https://github.com/kubernetes/kubernetes/pull/113393 https://github.com/kubernetes/kubernetes/pull/113395

pacoxu commented 1 year ago

It seems the work is done in v1.26.

k8s-ci-robot commented 1 year ago

@pacoxu: Closing this issue.

In response to [this](https://github.com/kubernetes/kubeadm/issues/2671#issuecomment-1363627131):

> It seems the work is done in v1.26.
> - reopen if there are still remaining tasks.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

afbjorklund commented 1 year ago

I added the backports to minikube. For most users, the cache is not used anyway.

By default, minikube will fetch the preload from GCS instead of using the cache:

https://storage.googleapis.com/minikube-preloaded-volume-tarballs/

But with --preload=false and without a mirror, the cache will still be used.