Open chrischdi opened 6 months ago
I did try to look through the code a bit:
sigs.k8s.io/release-sdk/sign
, to e.g. signAndReplicate
(here), kpromo does not set the transport to add the rate-limiter, because release-sdk does not allow us to.
SignImageInternal
function:github.com/sigstore/cosign/v2/cmd/cosign/cli/sign.SignCmd(...)
SignCmd would allow passing through a Transport (and, with it, a RateLimiter) via signOpts.Registry.RegistryClientOpts
Instead of adding rate-limiting, the other possibility would be to take a look into release-sdk and/or cosign and improve the API calls they make.
This is a known issue and we're planning a larger refactor of the promo-tools code base, see other issues in this repo for more information.
What is the recommended action when our image promotions are failing with this error? I'm wondering how our users will be affected.
If promotion fails with error such as:
run `cip run`: promote images: signing images: replicating signatures: copying signature ...
It's generally safe to ignore it. If it fails with any other error, the job should be restarted. You can ping Release Managers in the #release-management
Slack channel to restart the job for you.
It shouldn't affect the ability to consume images, but signatures might not work properly, or at all, if this error happens. Unfortunately, there's not much we can do at this point, but we hope we'll be able to kick off the promo-tools refactor efforts soon.
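The triage rule described above can be sketched as a small check on the fatal log message. This is only an illustration: `canIgnoreFailure` is a hypothetical helper, not part of promo-tools, and the matched substring is taken from the error quoted in this thread rather than from any documented contract.

```go
package main

import (
	"fmt"
	"strings"
)

// canIgnoreFailure is a hypothetical helper (not part of promo-tools).
// It reports whether a fatal message matches the signature-replication
// failure that this thread describes as generally safe to ignore.
// Anything else is treated as "the job should be restarted".
func canIgnoreFailure(msg string) bool {
	return strings.Contains(msg, "signing images: replicating signatures: copying signature")
}

func main() {
	// Abbreviated messages; the real log lines are much longer.
	signing := "run `cip run`: promote images: signing images: replicating signatures: copying signature ..."
	other := "run `cip run`: promote images: filtering edges: ..."

	fmt.Println(canIgnoreFailure(signing)) // true
	fmt.Println(canIgnoreFailure(other))   // false
}
```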
similar failures in the patch release and minor releases for CAPI today. one patch release failing at the signing stage: https://prow.k8s.io/log?job=post-k8sio-image-promo&id=1780295493562142720
time="18:09:05.150" level=fatal msg="run `cip run`: promote images: signing images: replicating signatures: copying signature us-west2-docker.pkg.dev/k8s-artifacts-prod/images/cluster-api/clusterctl:sha256-e35d576ae8922459d284077fed7b2a49447b4cb835c69312327c52d75dafa8a4.sig to southamerica-west1-docker.pkg.dev/k8s-artifacts-prod/images/cluster-api/clusterctl:sha256-e35d576ae8922459d284077fed7b2a49447b4cb835c69312327c52d75dafa8a4.sig: PUT https://southamerica-west1-docker.pkg.dev/v2/k8s-artifacts-prod/images/cluster-api/clusterctl/manifests/sha256-e35d576ae8922459d284077fed7b2a49447b4cb835c69312327c52d75dafa8a4.sig: TOOMANYREQUESTS: Quota exceeded for quota metric 'Requests per project per user' and limit 'Requests per project per user per minute per user' of service 'artifactregistry.googleapis.com' for consumer 'project_number:388270116193'. (and 1 more errors)" diff=4.378s
{"component":"entrypoint","error":"wrapped process failed: exit status
and the minor release job failing at filtering edges: https://prow.k8s.io/log?job=post-k8sio-image-promo&id=1780297426096099328
time="18:10:24.256" level=fatal msg="run `cip run`: promote images: filtering edges: filtering promotion edges: reading registries: getting tag list: GET https://us-central1-docker.pkg.dev/v2/token?scope=repository%3Ak8s-artifacts-prod%2Fimages%2Fcluster-api%2Fclusterctl%3Apull&service=: TOOMANYREQUESTS: Quota exceeded for quota metric 'Requests per project per user' and limit 'Requests per project per user per minute per user' of service 'artifactregistry.googleapis.com' for consumer 'project_number:388270116193'." diff=28ms
{"component":"entrypoint","error":"wrapped process failed: exit status
The first failure can be ignored, the second job should be restarted. Can you please send a link to the job so that we can restart it?
@xmudrii thanks
sorry, it's this one: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-k8sio-image-promo/1780297426096099328
@cahillsf Restarted the job and now it's green https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-k8sio-image-promo/1780300931636662272
thanks for your help @xmudrii !
possibly related: https://github.com/kubernetes-sigs/promo-tools/issues/842
hit this with v1.30 release https://github.com/kubernetes/kubernetes/issues/126170
also the initial promo job didn't report failure, I think? but we didn't have all regions synced
time="19:28:06.925" level=info msg="Registry: gcr.io/k8s-staging-scheduler-plugins Image: controller Got: gcr.io/k8s-staging-scheduler-plugins/controller" diff=141ms
time="19:28:07.077" level=fatal msg="run `cip run`: promote images: filtering edges: filtering promotion edges: reading registries: getting tag list: GET https://us-west1-docker.pkg.dev/v2/token?scope=repository%3Ak8s-artifacts-prod%2Fimages%2Fsig-storage%2Fsnapshot-controller%3Apull&service=: TOOMANYREQUESTS: Quota exceeded for quota metric 'Requests per project per region' and limit 'Requests per project per region per minute per region' of service 'artifactregistry.googleapis.com' for consumer 'project_number:388270116193'." diff=152ms
I'm guessing there is a gap in using the rate-limit aware client.
What happened:
unexpected status code 429 Too Many Requests
See https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-k8sio-image-promo/1776261613632884736
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
This issue already occurred in the past and was wrongly reported at
Ben pointed out that:
So there may be potential to optimise promo-tools so that it needs fewer API calls and does not exceed the limit.
Environment:
See the prowjob :-)