/cc
PR for this in 1.25: https://github.com/kubernetes/kubernetes/pull/109938
https://github.com/kubernetes/kubernetes/pull/109938 broke our e2e tests for 'latest': https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm
I think what's happening is the following. A cluster is created with the older version of kubeadm and the clusterconfiguration.imageRepository is defaulted to "k8s.gcr.io". Then when the upgrade happens, the new kubeadm binary thinks that "k8s.gcr.io" is a custom repository. Kinder is not pre-pulling the right images.
For kubeadm we need to mutate the image repo field to registry.k8s.io.
For kinder we need to ensure it uses the output of "kubeadm config images" for prepull (might be the case already).
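For illustration, the kubeadm-side mutation amounts to treating the legacy default as "still the default" and rewriting it, while leaving a genuinely custom repository alone. A minimal sketch of the idea (not the actual kubeadm code; the real change is in the PRs linked in this thread):

```go
// Sketch only: illustrates the intended mutation, not kubeadm's real implementation.
package main

import "fmt"

const (
	oldDefaultImageRepository = "k8s.gcr.io"      // legacy default
	newDefaultImageRepository = "registry.k8s.io" // new default
)

// fixImageRepository rewrites the legacy default registry to the new one,
// while leaving a user-configured custom repository untouched.
func fixImageRepository(imageRepository string) string {
	if imageRepository == oldDefaultImageRepository {
		return newDefaultImageRepository
	}
	return imageRepository
}

func main() {
	fmt.Println(fixImageRepository("k8s.gcr.io"))      // -> registry.k8s.io
	fmt.Println(fixImageRepository("my.mirror.local")) // -> my.mirror.local (unchanged)
}
```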
The first thing we have to do is here: https://github.com/kubernetes/kubernetes/pull/110343
Once it merges I can try debugging the kinder workflows. EDIT: but looking at the logs we might be OK without kinder changes, since it prepares registry.k8s.io images for the upgrade step, so as long as the clusterconfiguration.imageRepository has registry.k8s.io it should be good.
https://github.com/kubernetes/kubernetes/pull/110343 merged and I was expecting it to be a sufficient fix, but it seems like https://storage.googleapis.com/k8s-release-dev/ci/latest.txt is not pointing to the latest CI build, thus the jobs are continuing to fail.
notified #release-management and #release-ci-signal on slack: https://kubernetes.slack.com/archives/CN0K3TE2C/p1654258115320499 https://kubernetes.slack.com/archives/CJH2GBF7Y/p1654257907646369
time="11:29:00" level=info msg="Downloading https://storage.googleapis.com/k8s-release-dev/ci/v1.24.2-rc.0.10+a6b031e314ef2e/bin/linux/amd64/kube-apiserver.tar\n" time="11:33:13" level=info msg="fixing /private/var/folders/g7/ywncky4109zfc_6v5ww1fc2w0000gn/T/kinder-alter-image3586133515/bits/init/kube-apiserver.tar" fixed: k8s.gcr.io/kube-apiserver-amd64 -> k8s.gcr.io/kube-apiserver fixed: k8s.gcr.io/kube-apiserver-amd64 -> k8s.gcr.io/kube-apiserver
During kubeadm init with latest kubeadm, it will try registry.k8s.io as the image repo.
[root@paco ~]# ./kubeadm-v1.25 config images list --kubernetes-version=v1.24.2-rc.0.10+a6b031e314ef2e
registry.k8s.io/kube-apiserver:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-controller-manager:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-scheduler:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/kube-proxy:v1.24.2-rc.0.10_a6b031e314ef2e
registry.k8s.io/pause:3.7
registry.k8s.io/etcd:3.5.3-0
registry.k8s.io/coredns/coredns:v1.8.6
That's actually tricky to fix because the images are created from tars. Given kubeadm was downloaded with the tars and is already on the node, we could execute "kubeadm config images" and check what the default repository is for the "kubeadm init" binary. Based on that, "fix" the repo to be "registry.k8s.io" and not "k8s.gcr.io": https://github.com/kubernetes/kubeadm/blob/5f13a39e3759c898ae94b56e828487740570dfac/kinder/pkg/build/alter/alter.go#L225
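A rough sketch of that "ask the kubeadm binary" idea, assuming we just shell out to the kubeadm already on the node and look at the registry of the first image it lists (hypothetical helper, not the actual kinder code):

```go
// Hypothetical helper, not the kinder implementation: ask the kubeadm binary
// on the node whether it defaults to the new registry.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// usesNewRegistry runs "kubeadm config images list" and reports whether the
// first image it prints comes from registry.k8s.io.
func usesNewRegistry(kubeadmPath string) (bool, error) {
	out, err := exec.Command(kubeadmPath, "config", "images", "list").Output()
	if err != nil {
		return false, err
	}
	return strings.HasPrefix(strings.TrimSpace(string(out)), "registry.k8s.io/"), nil
}

func main() {
	newRegistry, err := usesNewRegistry("/usr/bin/kubeadm")
	if err != nil {
		panic(err)
	}
	fmt.Println("kubeadm defaults to registry.k8s.io:", newRegistry)
}
```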
Alternatively, we could just add a workaround for the ci-kubernetes-e2e-kubeadm-kinder-latest-on-1-24 job, such that we introduce a new "task" in the kinder workflow to execute on the nodes and re-tag / alias the images from k8s.gcr.io -> registry.k8s.io.
re-tag seems to be the simplest way here. (Or a trick change in fixImageTar to tag it to both registries)
I'm testing a hack in fixImageTar right now. It's not great, but it can be removed once we no longer test the 1.24 k8s / 1.25 kubeadm skew.
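For illustration, the double-tag trick could look roughly like this, assuming the usual docker-save tar layout where manifest.json carries a RepoTags list per image (sketch only; the actual change is in the PR linked in the next comment):

```go
// Sketch only, assuming a docker-save style tar whose manifest.json lists
// RepoTags per image; the real kinder change is in the PR linked below.
package main

import (
	"fmt"
	"strings"
)

// aliasRepoTags returns the original tags plus a registry.k8s.io alias for
// every k8s.gcr.io tag, so the loaded image satisfies both repositories.
func aliasRepoTags(repoTags []string) []string {
	out := append([]string{}, repoTags...)
	for _, tag := range repoTags {
		if strings.HasPrefix(tag, "k8s.gcr.io/") {
			out = append(out, "registry.k8s.io/"+strings.TrimPrefix(tag, "k8s.gcr.io/"))
		}
	}
	return out
}

func main() {
	fmt.Println(aliasRepoTags([]string{"k8s.gcr.io/kube-apiserver:v1.24.2"}))
	// [k8s.gcr.io/kube-apiserver:v1.24.2 registry.k8s.io/kube-apiserver:v1.24.2]
}
```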
@pacoxu PR is here https://github.com/kubernetes/kubeadm/pull/2705
@dims @ameukam @BenTheElder
The old path for CI images used to be gcr.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.1.101_df0af6f7b8f9b7, but registry.k8s.io/k8s-staging-ci-images/kube-apiserver:v1.25.0-alpha.1.101_df0af6f7b8f9b7 does not work.
This failed redirect seems like a problem for all external consumers of CI images. What is the new valid path?
@neolit123 the CI images didn't move, since they are on gcr.io (NOT k8s.gcr.io). Please use the old path for CI images.
ok, in that case what is in kubeadm right now is valid for the CI path:
lubo@lubo-it:~/go/src/k8s.io/kubernetes/cmd/kubeadm$ grep "gcr.io" * -rnI
app/phases/uploadconfig/uploadconfig.go:122:// to 'registry.k8s.io' in case it was the legacy default 'k8s.gcr.io'
app/phases/uploadconfig/uploadconfig.go:126: if cfg.ImageRepository != "k8s.gcr.io" {
app/phases/uploadconfig/uploadconfig.go:135: "ConfigMap to be 'registry.k8s.io' instead of the legacy default of 'k8s.gcr.io'")
app/constants/constants.go:338: DefaultCIImageRepository = "gcr.io/k8s-staging-ci-images"
app/util/staticpod/utils_test.go:627: - image: gcr.io/google_containers/etcd-amd64:3.1.11
app/cmd/upgrade/node.go:150: // to be 'registry.k8s.io', if it was 'k8s.gcr.io'. Don't mutate the in-cluster value by passing
app/cmd/upgrade/common.go:77: // to be 'registry.k8s.io', if it was 'k8s.gcr.io'.
app/cmd/upgrade/diff.go:124: // to be 'registry.k8s.io', if it was 'k8s.gcr.io'. Don't mutate the in-cluster value by passing
app/apis/kubeadm/v1beta2/defaults.go:43: // (previously this defaulted to k8s.gcr.io)
app/apis/kubeadm/v1beta2/types.go:100: // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`
app/apis/kubeadm/types.go:120: // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`
app/apis/kubeadm/v1beta3/defaults.go:45: // (previously this defaulted to k8s.gcr.io)
app/apis/kubeadm/v1beta3/types.go:126: // `gcr.io/k8s-staging-ci-images` will be used as a default for control plane components and for kube-proxy, while `registry.k8s.io`
i.e. DefaultCIImageRepository = "gcr.io/k8s-staging-ci-images"
@neolit123 is there anything that needs to be done here? Happy to provide some help.
@ameukam in the OP we have a task to do some cleanups in 1.26. This can happen after k/k master opens post code freeze (CF).
Does something need to be done for kubeadm < 1.25 too, or can they just rely on the image redirect?
This also goes for the old tarballs that were generated with the content from kubeadm config images, like: kubeadm config images list | xargs docker save
They could need some retagging after image loading, if so.
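If retagging is needed, it would be a one-liner per image after loading; a hypothetical helper, assuming docker is the runtime and the tarball was loaded with "docker load":

```go
// Hypothetical helper, assuming docker as the runtime: after "docker load",
// add a registry.k8s.io alias for an image that only carries the k8s.gcr.io tag.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func retagToNewRegistry(oldRef string) (string, error) {
	newRef := strings.Replace(oldRef, "k8s.gcr.io/", "registry.k8s.io/", 1)
	if err := exec.Command("docker", "tag", oldRef, newRef).Run(); err != nil {
		return "", err
	}
	return newRef, nil
}

func main() {
	newRef, err := retagToNewRegistry("k8s.gcr.io/kube-apiserver:v1.24.2")
	if err != nil {
		panic(err)
	}
	fmt.Println("added alias:", newRef)
}
```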
k8s.gcr.io will continue to exist as a source of truth, and is itself an alias to the actual registries (unadvertised).
It would be a breaking change for older releases to switch the primary registry alias, e.g. for the reasons you mentioned, so they will not be updated.
However, in the near future k8s.gcr.io may begin to redirect to registry.k8s.io, which will again contain all the same images; the only thing the backport brings for older releases is the need to allow the new host through the firewall, the image names will not change.
There was just a notice sent about this to the dev mailing list, but it needs wider circulation.
the flip was done on oct 3rd
kubeadm CI has been green thus far, let's reopen this if we need to change something else.
After discussion with @BenTheElder we should actually backport this to the >=1.23 releases (1.23, 1.24); >=1.25 already has the changes. This is a matter of allowing the k8s project to not run out of funds ($$$).
note: this is an exception as we only backport bugfixes, but in this case it has to be done.
looks like we need to backport these PRs: https://github.com/kubernetes/kubernetes/pull/109938 https://github.com/kubernetes/kubernetes/pull/110343
we could leave the backports without doing the cleanups that we did here: https://github.com/kubernetes/kubernetes/pull/112006
cc @SataQiu @pacoxu for comments
cc @fabriziopandini @sbueringer in case this affects CAPI.
Need to do some retagging effort in minikube, or it would break the old preloads and caches (that only have the old registry)
Alternatively re-generate all the preloads, but that could already be "too late" if they are being cached and have been downloaded
it would only break air-gapped installs (and China?)
the others would just be able to fetch the "new" image
Only if using the new patch release for which the images don't exist yet. Also, for older releases we still currently plan to publish the tags to k8s.gcr.io; it just won't be the default. Upgrades should always be taken carefully and we'll certainly need a prominent release note.
Yeah, on second thought this would just affect new (minor) releases of kubeadm
So all that is needed is to mirror the version selection logic, like today (>= 1.25.0-alpha.1)
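As a sketch of that version gate (simplified: only major.minor is compared, and pre-release ordering plus the later backports discussed above are ignored):

```go
// Minimal sketch of the ">= 1.25.0-alpha.1 uses registry.k8s.io" gate,
// using only major.minor and ignoring pre-release ordering for brevity.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultRegistryFor picks the default image registry for a Kubernetes
// version string such as "v1.24.8" or "v1.25.0-alpha.1".
func defaultRegistryFor(kubeVersion string) string {
	parts := strings.SplitN(strings.TrimPrefix(kubeVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return "registry.k8s.io" // unknown format: fall back to the new default
	}
	major, _ := strconv.Atoi(parts[0])
	minor, _ := strconv.Atoi(parts[1])
	if major > 1 || (major == 1 && minor >= 25) {
		return "registry.k8s.io"
	}
	return "k8s.gcr.io"
}

func main() {
	fmt.Println(defaultRegistryFor("v1.24.8"))         // k8s.gcr.io
	fmt.Println(defaultRegistryFor("v1.25.0-alpha.1")) // registry.k8s.io
}
```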
It seems the work is done in v1.26.
@pacoxu: Closing this issue.
I added the backports to minikube. For most users the cache is not used anyway: by default, minikube will fetch the preload from GCS instead of using the cache.
https://storage.googleapis.com/minikube-preloaded-volume-tarballs/
But with --preload=false and without a mirror, it will still use the cache.
the k8s project is moving away from k8s.gcr.io to registry.k8s.io.
https://groups.google.com/g/kubernetes-sig-testing/c/U7b_im9vRrM/m/7qywJeUTBQAJ xref https://github.com/kubernetes/k8s.io/issues/1834
1.25:
1.26:
[x] cleanup TODOs from 1.25 https://github.com/kubernetes/kubernetes/pull/112006
cherrypicks for 1.22, 1.23, 1.24: https://github.com/kubernetes/kubernetes/pull/113388 https://github.com/kubernetes/kubernetes/pull/113393 https://github.com/kubernetes/kubernetes/pull/113395 (minus cleanups)