kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

Migrate away from google.com gcp project k8s-testimages #1523

Open spiffxp opened 3 years ago

spiffxp commented 3 years ago

Part of umbrella issue to migrate away from google.com gcp projects: https://github.com/kubernetes/k8s.io/issues/1469

At least some of this is part of the umbrella to migrate kubernetes e2e test images/registries to community-owned infrastructure: https://github.com/kubernetes/k8s.io/issues/1458

We should migrate away from the google.com-owned gcr.io/k8s-testimages repository and instead use a community-owned repository.

k8s-testimages hosts a variety of images that are built from source in kubernetes/test-infra. They fall broadly into two classes:

/wg k8s-infra
/sig release
/sig testing
/area release-eng


EDIT(spiffxp): Went through and exhaustively identified images that need to be migrated from the repo, or can be left behind.

Images that are used and need migration, have already been migrated, or are unused and need their source deleted:

Images that appear to be unused:

spiffxp commented 3 years ago

/milestone v1.21

spiffxp commented 3 years ago

I am open to suggestions on where we should move these images.

I was thinking instead of a straight rename, take the time to break up the two classes:

Or, if we're just doing a lift-and-shift... then k8s-staging-test-infra

justaugustus commented 3 years ago

@spiffxp -- Maybe a few different ownership levels to consider here:

In the shared access "tier" would be kubekins-e2e and maybe krte. https://github.com/kubernetes/test-infra/blob/master/images/kubekins-e2e/OWNERS

I have approver on kubekins-e2e and I think this IAM group covers that case: https://github.com/kubernetes/k8s.io/blob/35c51085f1096c01516021a8bcb34b9cfda656bd/groups/sig-release/groups.yaml#L20-L34

I think the closest existing staging project would be k8s-infra-staging-build-image and I'd be happy to have y'all have access to that one.

spiffxp commented 3 years ago

> @spiffxp -- Maybe a few different ownership levels to consider here:
>
>   • common test-infra images
>
>   • restricted access test-infra images (this may not be a required category)
>
>   • shared access
>
> In the shared access "tier" would be kubekins-e2e and maybe krte.
>
> https://github.com/kubernetes/test-infra/blob/master/images/kubekins-e2e/OWNERS
>
> I have approver on kubekins-e2e and I think this IAM group covers that case:
>
> https://github.com/kubernetes/k8s.io/blob/35c51085f1096c01516021a8bcb34b9cfda656bd/groups/sig-release/groups.yaml#L20-L34
>
> I think the closest existing staging project would be k8s-infra-staging-build-image and I'd be happy to have y'all have access to that one.

I agree with putting kubekins-e2e somewhere shared / under releng purview.

I don't think it should be build-images though. I'm thinking ahead to applying policies for build provenance and security auditing of the build chain for k8s releases. The kubekins-e2e image is an organically evolved mess; it's easier to keep it out of that repo than to attempt to filter policy on certain image names.

I'm thinking of:

My preference for the shared repo would be releng; you've started moving some kubernetes job images there already, IIRC.

But ci-images also sounds like a better name, so not a strong preference

Hold on migrating kubekins until after code freeze, but move ahead with everything else, and see what else makes sense to move to shared from there.

WDYT?

spiffxp commented 3 years ago

Ping @BenTheElder and @kubernetes/release-managers for comment

spiffxp commented 3 years ago

Another bit of followup to consider is setting up auto-bumping of the images used in jobs: https://github.com/kubernetes/test-infra/issues/21137

BenTheElder commented 3 years ago

I think we should continue to have CI images in a dedicated registry. They're not the same as, say, GCB images (e.g. docker in docker setup, bootstrap.py, you name it). I also think staging registries should map to a single git repo so it's easier to locate the git source for any given image. (And that's the pattern we have right now in k8s.io pretty consistently).

CI images should continue to be pushed by automation, which is working well. We don't need to and should not grant humans push access (far less auditable than automation pushing from image sources in public git).

Definitely do not move anything in the middle of code freeze, please. This is not a worthwhile diversion from reviewing code changes before freeze.

ameukam commented 3 years ago

Opened https://github.com/kubernetes/k8s.io/pull/1908. We can start with common test-infra images.

/milestone v1.22

ameukam commented 3 years ago

Followup to #1908: add a canary ProwJob that pushes a test-infra image. Kettle?

spiffxp commented 3 years ago

/milestone v1.23

So, lift-and-shift to k8s-staging-test-infra.

I think it's perfectly acceptable to start pushing images to the staging project @ameukam set up and keep pushing to k8s-testimages for now. Maybe even start switching over some of the non-kubernetes/kubernetes jobs.

But let's wait until after v1.22 releases to change images on the high traffic release-blocking / merge-blocking jobs.
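For illustration, "pushing to both" during the transition amounts to roughly this for any given image (a sketch only; in practice the existing image-push jobs do the pushing, and the tag below is made up):

$ TAG=v20210908-abcdef0-master   # hypothetical tag, for illustration only
$ docker tag gcr.io/k8s-staging-test-infra/kubekins-e2e:$TAG gcr.io/k8s-testimages/kubekins-e2e:$TAG
$ docker push gcr.io/k8s-staging-test-infra/kubekins-e2e:$TAG
$ docker push gcr.io/k8s-testimages/kubekins-e2e:$TAG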

spiffxp commented 3 years ago

Going to take a stab at kubekins over the next few days:

spiffxp commented 3 years ago

/assign

spiffxp commented 3 years ago

Pushing some of the existing kubekins images over just in case. This would be to allow us to stick to an existing tag of a kubekins image if there's a good reason for it, or roll back to it if we tried bumping and it broke something.

I abused a poor cloud-shell with a one-liner to repeatedly pull, re-tag, and push a few docker images from recent history.
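Roughly, the shape of that one-liner (a sketch; the real invocation covered whatever tags showed up in recent history):

$ for tag in v20210902-e4567b8-master v20210825-f1955d1-master; do
    docker pull gcr.io/k8s-testimages/kubekins-e2e:$tag
    docker tag gcr.io/k8s-testimages/kubekins-e2e:$tag gcr.io/k8s-staging-test-infra/kubekins-e2e:$tag
    docker push gcr.io/k8s-staging-test-infra/kubekins-e2e:$tag
  done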

$ gcloud container images list-tags gcr.io/k8s-staging-test-infra/kubekins-e2e
DIGEST        TAGS                                                TIMESTAMP
4247e67a0ee9  latest-go-canary,v20210902-e4567b8-go-canary        2021-09-02T14:43:25
cb6db51c35e8  latest-test-infra,v20210902-e4567b8-test-infra      2021-09-02T14:42:53
ac3b4db0777e  latest-1.22,v20210902-e4567b8-1.22                  2021-09-02T14:42:49
5592406af509  latest-1.21,v20210902-e4567b8-1.21                  2021-09-02T14:42:35
adb811a33231  latest-experimental,v20210902-e4567b8-experimental  2021-09-02T14:42:35
7dceb0d07d89  v20210902-e4567b8e9c-experimental                   2021-09-02T14:42:12
f72e2dd7edbd  latest-master,v20210902-e4567b8-master              2021-09-02T14:42:08
0fc1420b216c  latest-1.18,v20210902-e4567b8-1.18                  2021-09-02T14:41:59
20e8d20e1645  v20210902-e4567b8e9c-master                         2021-09-02T14:41:53
da87c2bb29ec  v20210902-e4567b8e9c-1.18                           2021-09-02T14:41:53
dd008e1e2e61  v20210902-e4567b8e9c-1.21                           2021-09-02T14:41:30
689d04389a2c  latest-1.20,v20210902-e4567b8-1.20                  2021-09-02T14:41:25
21aaf3fa4d33  v20210902-e4567b8e9c-go-canary                      2021-09-02T14:41:24
36bd2ef17427  latest-1.19,v20210902-e4567b8-1.19                  2021-09-02T14:41:20
7d3b2f85613d  v20210902-e4567b8e9c-test-infra                     2021-09-02T14:41:11
73e0afeee319  v20210902-e4567b8e9c-1.20                           2021-09-02T14:41:10
b5a32c166ed8  v20210902-e4567b8e9c-1.22                           2021-09-02T14:41:01
64df8530535c  v20210902-e4567b8e9c-1.19                           2021-09-02T14:39:59
f7207441b8f8  v20210825-f1955d1-go-canary                         2021-08-25T12:18:52
c8248cefed35  v20210825-f1955d1-1.22                              2021-08-25T12:18:46
f4aae9f8c117  v20210825-f1955d1-1.21                              2021-08-25T12:18:35
23d0682ad069  v20210825-f1955d1-experimental                      2021-08-25T12:18:29
bc1f7c80dafd  v20210825-f1955d1-master                            2021-08-25T12:18:15
b65931e93f94  v20210825-f1955d1-1.18                              2021-08-25T12:18:11
231288b32c8f  v20210825-f1955d1-test-infra                        2021-08-25T12:18:06
fc5ac61beaca  v20210825-f1955d1-1.19                              2021-08-25T12:17:31
bdf4bf41b608  v20210825-f1955d1-1.20                              2021-08-25T12:17:28
4e5eb5412b3a  v20210825-bb6d84a-1.21                              2021-08-25T12:04:55
b5cf6147bfad  v20210825-bb6d84a-go-canary                         2021-08-25T12:04:20
51f2abae4c31  v20210825-bb6d84a-master                            2021-08-25T12:03:48
7a6b0e0018a2  v20210825-bb6d84a-experimental                      2021-08-25T12:03:27
d9decfceae55  v20210825-bb6d84a-test-infra                        2021-08-25T12:03:25
f473ef19724f  v20210825-bb6d84a-1.18                              2021-08-25T12:03:18
e9d2c991a0c1  v20210825-bb6d84a-1.22                              2021-08-25T12:03:16
6c62704f04a0  v20210825-bb6d84a-1.19                              2021-08-25T12:02:51
52d53031e202  v20210825-bb6d84a-1.20                              2021-08-25T12:02:29

I surveyed what actually shows up in kubernetes/test-infra:

$ ag kubekins-e2e: | sed -e 's|.*gcr.io|gcr.io|' | cut -d: -f2 | sort | uniq -c
   1         "//images/kubekins-e2e
   1     """Return full path to kubekins-e2e
   3 $_GIT_TAG-$_CONFIG
   1 $_GIT_TAG-$_CONFIG'
   1 %s' % tag
   3 blahblahblah-1.15",
   3 blahblahblah-master",
   1 latest-$_CONFIG
   1 latest-$_CONFIG'
   3 latest-experimental
   8 latest-master
   1 v20200428-06f6e3b-1.15
  18 v20200428-06f6e3b-master",
   2 v20200428-06f6e3b-master". Check your images. Full message
   1 v20210226-c001921-1.17
   1 v20210302-a6bf478-1.19
   1 v20210302-a6bf478-1.20
   1 v20210302-a6bf478-1.21
   1 v20210302-a6bf478-1.22
   1 v20210302-a6bf478-master
 104 v20210825-f1955d1-master
   1 v20210825-f1955d1-master"
  32 v20210902-e4567b8-1.18
  47 v20210902-e4567b8-1.19
  50 v20210902-e4567b8-1.20
 132 v20210902-e4567b8-1.21
  53 v20210902-e4567b8-1.22
  18 v20210902-e4567b8-experimental
   2 v20210902-e4567b8-go-canary
1398 v20210902-e4567b8-master
   3 v20210902-e4567b8-master"
   1 v20210902-e4567b8-test-infra
   2 v20210902-e4567b8e9c-go-canary
   1 v20210902-e4567b8e9c-master

Using my human eyes to filter out stuff that I already pushed, and stuff that looks like tests, leaves us with the following tags to look at:

spiffxp commented 3 years ago

Opened https://github.com/kubernetes/test-infra/pull/23543 to address the stale tags listed above in https://github.com/kubernetes/k8s.io/issues/1523#issuecomment-916504977

Everything is now using either latest or the same tag:

$ ag kubekins-e2e: | sed -e 's|.*gcr.io|gcr.io|' | cut -d: -f2 | sort | uniq -c
   1         "//images/kubekins-e2e
   1     """Return full path to kubekins-e2e
   3 $_GIT_TAG-$_CONFIG
   1 $_GIT_TAG-$_CONFIG'
   1 %s' % tag
   3 blahblahblah-1.15",
   3 blahblahblah-master",
   1 latest-$_CONFIG
   1 latest-$_CONFIG'
   3 latest-experimental
   8 latest-master
  18 latest-master",
   2 latest-master". Check your images. Full message
  32 v20210902-e4567b8-1.18
  48 v20210902-e4567b8-1.19
  51 v20210902-e4567b8-1.20
 133 v20210902-e4567b8-1.21
  54 v20210902-e4567b8-1.22
  18 v20210902-e4567b8-experimental
   4 v20210902-e4567b8-go-canary
1506 v20210902-e4567b8-master
   4 v20210902-e4567b8-master"
   1 v20210902-e4567b8-test-infra
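
(For reference, the bulk tag bump in https://github.com/kubernetes/test-infra/pull/23543 boils down to a find-and-replace over the job configs; roughly, with illustrative tags, and the PR itself is authoritative:)

$ cd test-infra
$ grep -rl 'kubekins-e2e:v20210825-f1955d1-master' config/jobs \
    | xargs sed -i 's/kubekins-e2e:v20210825-f1955d1-master/kubekins-e2e:v20210902-e4567b8-master/g'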

spiffxp commented 3 years ago

While waiting for more of the kubekins stuff to land, let's look at what other images we need to migrate:

spiffxp@spiffxp-macbookpro:test-infra (master)$ ./experiment/print-job-image-summary.sh
# images used by prowjobs on prow.k8s.io
- total /                      2385 total, 56 unique
  - not gcr.io /               136 total, 13 unique
    - dockerhub /              132 total, 9 unique
    - quay.io /                4 total, 4 unique
  - gcr.io /                   2249 total, 43 unique
    - kubernetes.io gcp org
      - k8s-staging            55 total, 9 unique
      - k8s.gcr.io             14 total, 5 unique
    - google.com gcp org
      - k8s-prow               17 total, 7 unique
      - k8s-testimages         2141 total, 12 unique
        - kubekins-e2e         1844 total, 1 unique
        - image-builder        162 total, 1 unique
        - krte                 68 total, 1 unique
        - other                229 total, 10 unique
          - bazelbuild         5 total, 1 unique
          - benchmarkjunit     4 total, 1 unique
          - bigquery           2 total, 1 unique
          - bootstrap          9 total, 1 unique
          - e2e-kubemci        1 total, 1 unique
          - gcloud-bazel       22 total, 1 unique
          - gcloud-in-go       6 total, 1 unique
          - gubernator         1 total, 1 unique
          - image-builder      162 total, 1 unique
          - krte               68 total, 1 unique
          - bazel              5 total, 1 unique
    - other (unsure which org) 22 total, 10 unique
      - bazel                  21 total, 3 unique
      - ci                     2 total, 1 unique
      - mdlint                 3 total, 1 unique
      - shellcheck             3 total, 1 unique
      - minikube-e2e           1 total, 1 unique
      - prow-test              1 total, 1 unique
      - gcloud-bazel           23 total, 2 unique
      - bazel                  21 total, 3 unique
      - cip                    5 total, 1 unique
      - octodns                2 total, 1 unique

Snipping out from that...

spiffxp commented 3 years ago

I've started working on migrating over the higher-job-count images. I'm holding off on the bazel images because I think some of them may be replaceable, and I'm holding off on the other low-count images because I am not as familiar with where they're used.

spiffxp commented 3 years ago

Migrating gcb-docker-gcloud is going to require PRs to a lot of repos' cloudbuild.yaml files

spiffxp commented 3 years ago

Two more job images

jimdaga commented 3 years ago

@spiffxp can you confirm I'm addressing what you were looking for via these PRs? I'll try to chug through more batches as time allows if they look good.

Update: found that https://github.com/kubernetes-csi/csi-release-tools/pull/175 is the only needed PR for the CSI repos

spiffxp commented 3 years ago

@jimdaga yeah that looks right, you may end up having to sync release-tools to the other repos in followup PRs

spiffxp commented 3 years ago

Scripted creation of the remaining PRs for migration of the gcr.io/k8s-testimages/gcb-docker-gcloud image: https://gist.github.com/spiffxp/bc8af986ce8439d49b919865ed68af9f
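
(The gist is the authoritative version; per target repo, the general shape is roughly the following, with a placeholder repo name:)

$ gh repo clone SOME_ORG/SOME_REPO && cd SOME_REPO    # placeholder, not a real target repo
$ git checkout -b migrate-gcb-docker-gcloud
$ sed -i 's|gcr.io/k8s-testimages/gcb-docker-gcloud|gcr.io/k8s-staging-test-infra/gcb-docker-gcloud|g' cloudbuild.yaml
$ git commit -am "cloudbuild: use community-owned gcb-docker-gcloud image"
$ gh pr create --fill    # after pushing the branch to a fork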

spiffxp commented 3 years ago

Noticed a few more job configs popped up using the old google.com-owned kubekins-e2e image, PR to migrate: https://github.com/kubernetes/test-infra/pull/24504

spiffxp commented 3 years ago

Updated output from the reporting script (found a bug, opened https://github.com/kubernetes/test-infra/pull/24505)

# images used by prowjobs on prow.k8s.io
- total /                                             2554 total, 54 unique
  - gcr.io /                                          2411 total, 41 unique
    - kubernetes.io gcp org
      - k8s-staging-test-infra                        2270 total, 9 unique
        - kubekins-e2e                                2000 total, 1 unique
        - image-builder                               173 total, 1 unique
        - krte                                        73 total, 1 unique
        - gcloud-in-go                                6 total, 1 unique
        - bootstrap                                   6 total, 1 unique
        - bazelbuild                                  5 total, 1 unique
        - benchmarkjunit                              4 total, 1 unique
        - bigquery                                    2 total, 1 unique
        - triage                                      1 total, 1 unique
      - k8s-staging-the_rest                          59 total, 7 unique
      - k8s.gcr.io                                    17 total, 3 unique
    - google.com gcp org
      - k8s-prow                                      19 total, 7 unique
      - k8s-testimages                                30 total, 6 unique
        - gcloud-bazel                                12 total, 1 unique
        - kubekins-e2e                                5 total, 1 unique
        - ci_fuzz                                     3 total, 1 unique
        - gubernator                                  1 total, 1 unique
        - e2e-kubemci                                 1 total, 1 unique
        - bazel                                       0 total, 0 unique
    - other (unsure which org)                        16 total, 9 unique
      - cluster-api-provider-vsphere/extra/shellcheck 3 total, 1 unique
      - cluster-api-provider-vsphere/extra/mdlint     3 total, 1 unique
      - k8s-artifacts-prod/infra-tools/octodns        2 total, 1 unique
      - cloud-provider-vsphere/ci                     2 total, 1 unique
      - cloud-builders/bazel                          2 total, 1 unique
      - rules-k8s/gcloud-bazel                        1 total, 1 unique
      - k8s-minikube/prow-test                        1 total, 1 unique
      - k8s-minikube/minikube-e2e                     1 total, 1 unique
      - k8s-ingress-image-push/ingress-gce-glbc-amd64 1 total, 1 unique
  - not gcr.io /                                      143 total, 13 unique
    - dockerhub /                                     139 total, 9 unique
    - quay.io /                                       4 total, 4 unique

spiffxp commented 2 years ago

I copy-pasted the wrong tag for gcr.io/k8s-staging-test-infra/gcb-docker-gcloud in https://github.com/kubernetes/k8s.io/issues/1523#issuecomment-982951168. Updated in-flight PRs, and opened followup PRs for those that had already merged

For reference:

ameukam commented 2 years ago

/milestone v1.24

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 2 years ago

/remove-lifecycle stale
/milestone v1.25

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 2 years ago

/remove-lifecycle stale

ameukam commented 2 years ago

We need to come back to this and do a migration to Artifact Registry. This needs a dedicated issue.

/milestone clear
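
(For whoever picks that up: mechanically, copying an image from GCR to Artifact Registry can be done with something like crane; the destination below is purely a placeholder, and the real AR location would need to be decided in that issue:)

$ crane copy \
    gcr.io/k8s-staging-test-infra/kubekins-e2e:latest-master \
    us-central1-docker.pkg.dev/k8s-staging-test-infra/images/kubekins-e2e:latest-master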

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

riaankleinhans commented 2 years ago

/remove-lifecycle stale

Trusting we will get to this in 2023.

cpanato commented 2 years ago

I can help with that. @ameukam, can you give me a quick status on what is missing or pending?

ameukam commented 2 years ago

What's remaining, across all the projects:

@cpanato: ensure all those repos are using the community-owned registry.

cpanato commented 1 year ago

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

cpanato commented 1 year ago

/remove-lifecycle rotten

will continue this week

cpanato commented 1 year ago

All the missing repos are done now, waiting for approval. cc @ameukam

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 10 months ago

/remove-lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

BenTheElder commented 6 months ago

/remove-lifecycle rotten

BenTheElder commented 5 months ago

All of these images will go away when GCR shuts down ... there's currently no owner or intention to migrate to AR, and I'm not sure we should.

I think we should use this code search for the current list and burn it down until there are no results: https://cs.k8s.io/?q=gcr.io%2Fk8s-testimages&i=nope&files=&excludeFiles=&repos=

It looks like two images still publish to k8s-testimages but I'm not sure we're actually using them at all.
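
(Roughly the same check can be run locally against a repo checkout, e.g.:)

$ grep -rIl 'gcr.io/k8s-testimages' . | grep -v vendor/
$ grep -rIn 'gcr.io/k8s-testimages' . | grep -v vendor/ | wc -l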

puerco commented 5 months ago

I suspect some of those are the image promoter's test images. I did a quick search and it still has references to gcr.io in the documentation and hardcoded everywhere :cry:

I can check what is still current after Cloud Native SecurityCon (in about a week).

BenTheElder commented 3 months ago

We need to finish this ASAP: GCR shutdown => google.com projects turning down earlier => these images are at risk.

I'm in a thread about whether we can keep them, but I have some OOO coming up and these remain a liability anyhow.

michelle192837 commented 3 months ago

List of images (or 'images') used in https://cs.k8s.io/?q=gcr.io%2Fk8s-testimages&i=nope&files=&excludeFiles=&repos=:

Image | Uses | Handled?
gcr.io/k8s-testimages/perf-tests-util/containerd | Uses | N
gcr.io/k8s-testimages/netperfbenchmark | Uses | N
gcr.io/k8s-testimages/probes | Uses | N
gcr.io/k8s-testimages/quay.io/prometheus-operator/prometheus-config-reloader | Uses | N
gcr.io/k8s-testimages/quay.io/prometheus-operator/prometheus-operator | Uses | N
gcr.io/k8s-testimages/quay.io/prometheus/node-exporter | Uses | N
gcr.io/k8s-testimages/grafana/grafana | Uses | N
gcr.io/k8s-testimages/quay.io/prometheus/prometheus | Uses | N
gcr.io/k8s-testimages/perf-tests-util/access-tokens | Uses | N
gcr.io/k8s-testimages/perf-tests-util/request-benchmark | Uses | N
gcr.io/k8s-testimages/kube-cross-amd64 | Uses | N
gcr.io/k8s-testimages/launcher.gcr.io/google/bazel | Uses | N
gcr.io/k8s-testimages/gubernator | Uses | N
gcr.io/k8s-testimages/boskos | Uses | Y (no-op)
gcr.io/k8s-testimages/kubekins-e2e-prow | Uses | N
gcr.io/k8s-testimages/logexporter | Uses | N
gcr.io/k8s-testimages/krte | Uses | N
gcr.io/k8s-testimages/admission | Uses | N
gcr.io/k8s-testimages/branchprotector | Uses | N
gcr.io/k8s-testimages/peribolos | Uses | N
gcr.io/k8s-testimages/pipeline | Uses | N
gcr.io/k8s-testimages/gcb-docker-gcloud | Uses | N
gcr.io/k8s-testimages/kubekins-e2e | Uses | N
gcr.io/k8s-testimages/image-builder | Uses | Y (no-op)
gcr.io/k8s-testimages/bootstrap | Uses | Y (no-op)

There are also some examples, tests, or docs mentions that aren't as relevant:

michelle192837 commented 3 months ago

For comparison, the actual list of images in k8s-testimages:

A lot of these were migrated previously or as part of the move off the default build cluster in Prow and shouldn't be in use anymore.
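
(For anyone double-checking against the registry itself, the top-level contents can be listed directly; nested paths like perf-tests-util/ have to be listed per sub-repository:)

$ gcloud container images list --repository=gcr.io/k8s-testimages
$ gcloud container images list --repository=gcr.io/k8s-testimages/perf-tests-util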

ameukam commented 3 months ago

/milestone v1.32