One thing affecting the length of execution is the number of container registries used to serve the promoted images: the promoter went from 3 GCR registries to more than 10 Artifact Registry (AR) repositories.
/assign @meganwolf0
To try to address this issue, we grabbed a sampling of image promo jobs created for various image promotions (see the file name and date for a reference back to the source). The idea was to see what share of the total time was spent in each step of the promotion; a minimal sketch of the per-step share calculation follows the table.
File | Get promotion edges | Validate signatures | Promote images | Sign images | Total time (h:mm:ss) | # promos |
---|---|---|---|---|---|---|
gcp-filestore-csi-driver-01-23.txt | 9.6% | 1.1% | 24.2% | 64.8% | 0:04:57.240000 | 20 |
kueue-01-18.txt | 14.3% | 1.8% | 17.2% | 66.5% | 0:03:00.802000 | 20 |
cluster-api-azure-controller-01-18.txt | 13.7% | 1.3% | 23.0% | 61.6% | 0:03:44.085000 | 20 |
metrics-server-01-23.txt | 8.8% | 1.0% | 16.0% | 74.1% | 0:04:52.458000 | 20 |
ibm-powervs-block-csi-driver-01-26.txt | 11.4% | 1.1% | 18.6% | 68.4% | 0:04:09.189000 | 22 |
ingress-nginx-controller-01-23.txt | 10.7% | 1.3% | 33.0% | 54.8% | 0:07:04.906000 | 40 |
ingress-nginx-controller-01-26.txt | 13.1% | 1.3% | 34.6% | 50.8% | 0:06:35.702000 | 44 |
kubecross-01-19.txt | 10.7% | 1.4% | 60.5% | 27.3% | 0:15:03.641000 | 100 |
kube-cross-01-26.txt | 15.6% | 2.3% | 56.2% | 25.6% | 0:11:31.794000 | 110 |
go-runner-01-19.txt | 20.6% | 3.1% | 20.1% | 56.0% | 0:08:09.598000 | 120 |
debian-base-01-27.txt | 21.8% | 3.3% | 23.8% | 50.9% | 0:07:38.530000 | 132 |
provider-os-01-18.txt | 14.6% | 2.5% | 52.1% | 30.7% | 0:12:18.403000 | 140 |
kube-cross-01-10.txt | 5.0% | 3.7% | 75.5% | 15.9% | 0:31:11.250000 | 500 |
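To make the sampling concrete, here is a minimal sketch of the per-step share calculation, assuming step start timestamps have already been extracted from a job log. The timestamps below are invented placeholders; only the step names mirror the table columns.

```go
package main

import (
	"fmt"
	"time"
)

type step struct {
	name  string
	start time.Time
}

func main() {
	t := func(s string) time.Time {
		v, _ := time.Parse(time.RFC3339, s)
		return v
	}

	// Step boundaries for one hypothetical job; real runs would parse
	// these out of the job's log lines.
	steps := []step{
		{"get promotion edges", t("2023-01-23T10:00:00Z")},
		{"validate signatures", t("2023-01-23T10:00:29Z")},
		{"promote images", t("2023-01-23T10:00:32Z")},
		{"sign images", t("2023-01-23T10:01:44Z")},
	}
	end := t("2023-01-23T10:04:57Z") // job completion

	// Each step's duration runs until the next step starts (or the job ends).
	total := end.Sub(steps[0].start)
	for i, s := range steps {
		next := end
		if i+1 < len(steps) {
			next = steps[i+1].start
		}
		d := next.Sub(s.start)
		fmt.Printf("%-22s %5.1f%% (%v)\n", s.name, 100*d.Seconds()/total.Seconds(), d)
	}
	fmt.Println("total:", total)
}
```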
For jobs promoting more images, the actual promotion was the greatest share of total time; for jobs with fewer images, the validating/signing/replicating portion took up a larger share of the job.
(Wondering if the variability in the time these promotions take is in part driven by network conditions that vary day to day? Would it make sense to schedule these jobs to optimize for traffic?)
You can see the breakdown of the pieces of the jobs: for a lot of images, it's undoubtedly the "promote images" portion that takes the most time. To parallelize some of this work, you'd need to have different jobs making requests so that rate limiting could be circumvented (a sketch of this sharding idea follows below).
(Does using multiple machines bypass the rate limiting? Is the limit tied to user credentials, the target registry, or solely the origin IP?)
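Below is a hedged sketch of that sharding idea, assuming the rate limit applies per client (which the question above leaves open): the image list is split round-robin across shards, each of which would run as its own job and throttle itself independently. promoteImage is a hypothetical placeholder, not the promoter's real API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// promoteImage is a hypothetical stand-in for the real registry copy call.
func promoteImage(img string) { time.Sleep(50 * time.Millisecond) }

func main() {
	images := []string{"img-a:v1", "img-b:v1", "img-c:v1", "img-d:v1", "img-e:v1", "img-f:v1"}
	const shards = 2 // in practice, each shard would be a separate promo job

	var wg sync.WaitGroup
	for s := 0; s < shards; s++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			// Per-shard throttle, so each shard stays under its own limit
			// (assumption: the limit is per client, not per target registry).
			tick := time.NewTicker(100 * time.Millisecond)
			defer tick.Stop()
			// Round-robin assignment: shard k takes images k, k+shards, ...
			for i := shard; i < len(images); i += shards {
				<-tick.C
				promoteImage(images[i])
				fmt.Printf("shard %d promoted %s\n", shard, images[i])
			}
		}(s)
	}
	wg.Wait()
}
```

If the limit turned out to be keyed to the target registry rather than the caller, sharding clients this way would buy nothing, so answering the question above comes first.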
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Objective
Context and things to think about while working on this task