kubernetes / registry.k8s.io

This project is the repo for registry.k8s.io, the production OCI registry service for Kubernetes' container image artifacts
https://registry.k8s.io
Apache License 2.0

🚨 Sigstore Signature images do not match across different geo-locations 🚨 #187

Open · BenTheElder opened this issue 1 year ago

BenTheElder commented 1 year ago

What did you expect to happen?

Images should have identical digests no matter what region I pull from.

This does not appear to be the case for some of the sigstore images added by the image-promoter

Thread: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1679166550351119

This issue is for tracking; the underlying fix will happen in the backing registries and, if we still have a bug actively causing this, in the image promoter (https://github.com/kubernetes-sigs/promo-tools).

To be clear, this is not a bug in the registry application; however, it will be visible to users of the registry, and more visible on registry.k8s.io than on k8s.gcr.io (because k8s.gcr.io has much broader backing regions: eu, us, asia).

We'll want to fix the underlying issues, if any remain, in promo-tools and then fix up the backing registry contents somehow.

Debugging Information

I have a script that inspects some important high-bandwidth images. It's a bit slow, and currently it only checks k8s.gcr.io / registry.k8s.io: https://github.com/BenTheElder/registry.k8s.io/blob/check-images/hack/tools/check-images.sh

We'll need to check the backing stores. I noticed a difference between my laptop at home and an SSH session to a cloud workstation.
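
A quick way to spot-check that (an illustrative sketch only; `crane` is from go-containerregistry, and the image/tag below are just an example, not the full set the script checks):

```bash
# Compare the digest each registry host serves for the same tag.
# Image/tag are examples; the check-images.sh script covers more images.
for host in k8s.gcr.io registry.k8s.io; do
  echo -n "${host}: "
  crane digest "${host}/kube-proxy:v1.26.3"
done
```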

Anything else?

/sig release

BenTheElder commented 1 year ago

https://github.com/kubernetes-sigs/promo-tools/issues/784 to track resolving any bugs in the image promoter

BenTheElder commented 1 year ago

So far this seems to only be the sigstore images.

Given that clients will generally fetch these with a tag computed from the digest of the adjacent image that was signed, not the digest of the signature "images" themselves, this is probably unlikely to break anyone, but it's worth fixing regardless.
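
For context, cosign locates a signature by a tag derived from the signed image's digest (sha256:&lt;hash&gt; becomes the tag sha256-&lt;hash&gt;.sig, the same convention the crane/sed commands later in this thread rely on). A minimal sketch of resolving it by hand, with an example image:

```bash
# Sketch: derive the cosign signature tag for an image from its digest.
# This follows the sha256:<hash> -> sha256-<hash>.sig tag convention;
# `cosign triangulate <image>` prints the same signature ref directly.
IMAGE="registry.k8s.io/kube-scheduler:v1.26.3"   # example image
DIGEST="$(crane digest "${IMAGE}")"              # sha256:<hash> of the signed image
SIG_REF="${IMAGE%%:*}:${DIGEST/:/-}.sig"         # signature "image" reference
crane manifest "${SIG_REF}"                      # fetch the signature manifest
```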

BenTheElder commented 1 year ago

This could cause a problem if a single image pull (many API calls) somehow gets routed to multiple instances of the registry.k8s.io backend in different regions, because the signature blobs available would not match.

We think this is very unlikely. Still something to fix.

BenTheElder commented 1 year ago

So ... I've computed an index of all images like host : partial_ref : digest.

A partial_ref in this case is something like kube-proxy-s390x@sha256:8acf368ce46f46b7d02e93cb1bcfb3b43931c0fbb4ee13b3d4d94d058fa727f7, i.e. it's either a digest ref or a tag ref with the $host prefix trimmed to save space.

This is ~600M of JSON. It took on the order of hours to obtain, given the rate limits on scanning our registries and the volume of images.

I've then filtered this back down to only tag refs and digest refs that have no tags pointing at them. Both types map to the digest they point to. Filtering this way reduces the data set but not the information; it just means we skip the image@digest => digest reference type when we also have a tag pointing to that digest anyhow.

The tradeoff is that to diff you need to check both the ref and the digest between two hosts, but we want to know whether tags differ anyhow.

I would share it, but even the filtered and processed version is 353M ...

EDIT: The filtered version is available in gzip compressed JSON here: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1679213042607229?thread_ts=1679166550.351119&cid=CJH2GBF7Y


Anyhow, by running some queries over this data I can see that none of the backing registries have the same number of refs.

If I pick two regions and compute the ref diffs, what I see every time so far is a mix of dangling digest refs with no tag and :sha256-<hash>.sig sigstore tags.
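
A sketch of the kind of query involved (the real index format lives in the check-images branch; here I'm assuming a simplified JSON object of host → { partial_ref: digest }, and index.json / HOST_A / HOST_B are placeholders):

```bash
# Assumed shape: index.json maps host -> { partial_ref: digest }.
# Count refs per host:
jq -r 'to_entries[] | "\(.key): \(.value | length)"' index.json

# Refs present in $HOST_A but missing (or pointing at a different digest) in $HOST_B:
jq -r --arg a "$HOST_A" --arg b "$HOST_B" '
  .[$a] as $A | .[$b] as $B
  | $A | to_entries[]
  | select(($B[.key] // "") != .value)
  | "\(.key) \(.value)"' index.json
```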

Unfortunately there are a large number of dangling digest refs in the region diffs, so we can't just say "well, it's all sigstore tags" and call it a day. There are also too many of these to quickly fetch and check all of the manifests.

But inspecting a random sample of dangling digest refs from the region pairs I compared, so far 100% of the time crane manifest ${host}/${partial_ref} reveals a sigstore manifest.
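
A sketch of that spot check, using the marker string noted later in this thread (sigstore manifests contain "dev.cosignproject.cosign/signature"); ${host} and ${partial_ref} are placeholders:

```bash
# Spot-check whether a dangling digest ref is a cosign signature manifest
# by grepping its manifest for the cosign signature annotation.
if crane manifest "${host}/${partial_ref}" | grep -q 'dev.cosignproject.cosign/signature'; then
  echo "sigstore signature manifest: ${host}/${partial_ref}"
else
  echo "not a signature manifest: ${host}/${partial_ref}"
fi
```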

I would guess that we pushed signature tags to images multiple times and these dangling refs are from previous signature pushes.

ALL of the tag type references so far are :sha256-.*.sig sigstore tags.

BenTheElder commented 1 year ago

The tag-type references in the diff also suggest that, AFAICT, we have signed images whose signature is only published at all in some regions, which is a bit worse than the exact signature varying by region ...

E.G. for us-west1 vs us-west2 AR instances:

Missing: us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig (signature tag)

Available: us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le@sha256:4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b (the image that should be signed)

us-west1-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig (signature in the other region)

us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le@sha256:4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b (signed image in the other region)

You can verify that these are really missing / available by running crane manifest $image for each of these.
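
For example, a minimal sketch that checks the signature tag from the refs above in both regions (the same approach works for the image digest refs); a failing crane manifest call means the ref is missing in that backend:

```bash
# Check the same signature tag in both regions; per the refs above,
# the us-west2 copy is expected to be missing.
for region in us-west1 us-west2; do
  ref="${region}-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig"
  echo -n "${ref}: "
  crane manifest "${ref}" > /dev/null 2>&1 && echo "available" || echo "missing"
done
```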

This also applies to k8s.gcr.io with its eu/us/asia backing registries. However, it's far less visible there, as users are far less likely to ever encounter different backing registries given the very broad geographic scopes.

BenTheElder commented 1 year ago

Quantifying the scale of the missing sigstore tags:

376 missing sigstore tags in australia-southeast1-docker.pkg.dev/k8s-artifacts-prod/images
388 missing sigstore tags in europe-north1-docker.pkg.dev/k8s-artifacts-prod/images
362 missing sigstore tags in europe-southwest1-docker.pkg.dev/k8s-artifacts-prod/images
376 missing sigstore tags in europe-west2-docker.pkg.dev/k8s-artifacts-prod/images
365 missing sigstore tags in europe-west8-docker.pkg.dev/k8s-artifacts-prod/images
421 missing sigstore tags in asia.gcr.io/k8s-artifacts-prod
439 missing sigstore tags in eu.gcr.io/k8s-artifacts-prod
365 missing sigstore tags in europe-west4-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in southamerica-west1-docker.pkg.dev/k8s-artifacts-prod/images
374 missing sigstore tags in us-central1-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in us-west1-docker.pkg.dev/k8s-artifacts-prod/images
363 missing sigstore tags in asia-northeast1-docker.pkg.dev/k8s-artifacts-prod/images
383 missing sigstore tags in asia-south1-docker.pkg.dev/k8s-artifacts-prod/images
356 missing sigstore tags in europe-west9-docker.pkg.dev/k8s-artifacts-prod/images
377 missing sigstore tags in us-east1-docker.pkg.dev/k8s-artifacts-prod/images
387 missing sigstore tags in us-east4-docker.pkg.dev/k8s-artifacts-prod/images
370 missing sigstore tags in us-south1-docker.pkg.dev/k8s-artifacts-prod/images
377 missing sigstore tags in asia-east1-docker.pkg.dev/k8s-artifacts-prod/images
366 missing sigstore tags in asia-northeast2-docker.pkg.dev/k8s-artifacts-prod/images
371 missing sigstore tags in europe-west1-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in us-east5-docker.pkg.dev/k8s-artifacts-prod/images
374 missing sigstore tags in us-west2-docker.pkg.dev/k8s-artifacts-prod/images
430 missing sigstore tags in us.gcr.io/k8s-artifacts-prod

Note that's going to include each manifest, so there's potentially one of these for each architecture within the same image.

The more interesting detail is that we have some other image tags that exist only in some backends:

asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3

100% of these are only missing from the k8s.gcr.io registries, so if I had to guess, I think they were somehow manually cleaned up from k8s.gcr.io but not registry.k8s.io. They all appear to be related to cluster-api-azure.

See below for how this happened https://github.com/kubernetes/registry.k8s.io/issues/187#issuecomment-1475388979

You can verify that these are in other backends like this sample:

```
crane manifest southamerica-west1-docker.pkg.dev/k8s-artifacts-prod/images/cluster-api-aure/cluster-api-azure-controller:v0.3.0
crane manifest us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
```

Code in https://github.com/BenTheElder/registry.k8s.io/commit/2e32a2cacd10487baad00d935d1e6769f8919639 / https://github.com/BenTheElder/registry.k8s.io/tree/check-images, data file in slack linked above.

BenTheElder commented 1 year ago

The "cluster-api-aure" tags were partially synced before and led to https://github.com/kubernetes/k8s.io/pull/4368 which should be catching future mis-configuration leading to partial sync on the promoter config side of things.

ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1666053166103809?thread_ts=1666040894.622279&cid=CCK68P2Q2

We should make sure that test is actually running on the migrated registry.k8s.io/ folder; I see it was copied over, but I'm not sure the scripts run it.

BenTheElder commented 1 year ago

https://github.com/kubernetes/k8s.io/pull/4988 will ensure we keep applying the regression test that should prevent subprojects from being mis-configured to not promote to all regions (i.e. the cluster-api-aure situation).

BenTheElder commented 1 year ago

Confirmed: the dangling digests that are not in all regions are 100% either sigstore manifests (containing "dev.cosignproject.cosign/signature" in the manifest) or cluster-api-aure images.

Scanned with https://github.com/BenTheElder/registry.k8s.io/commit/a10201c9ba9c9bc1c8539c124ded06172daa2a4d

BenTheElder commented 1 year ago

So recapping:

TLDR of backing registry skew after fully checking through all mismatching tags and digests in a snapshot from this weekend.

The following cases appear to exist:

  1. sigstore signature tags not available in all backends
  2. sigstore signature tags available may have different digests in backends
  3. mis-configured promotion for #cluster-api-azure under cluster-api-aure/ to only some backends

These are all known issues. 3) should not get worse, thanks to regression tests (https://github.com/kubernetes/k8s.io/pull/4988).

1 & 2 are being worked on and https://github.com/kubernetes-sigs/promo-tools/issues/784 is probably the best place to track that.

See also for 1&2: https://groups.google.com/g/kubernetes-announce/c/0_jVjhLvNuI

puerco commented 1 year ago

OK, regarding the diverging .sig images in the registries: I think it is a by-product of the promoter getting rate limited. I did a recap of the findings in Slack, but I'm leaving it here too for the record:


Looking at images before the promoter started breaking due to the rate limits, the .sig layers match. I found a mismatching tag in the images promoted as part of the (failed) v1.26.3 release.

For example, registry.k8s.io/kube-scheduler:v1.26.3 is fully signed and replicated; all copies match:

```
TAG=$(crane digest registry.k8s.io/kube-scheduler:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-scheduler:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-south1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-northeast1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-northeast2-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
```
(output for the remaining mirrors, all matching, trimmed)

There are some images which have missing signatures, but the ones that are there all match; for example, kube-controller-manager:

```
TAG=$(crane digest registry.k8s.io/kube-controller-manager:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-controller-manager:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev:  ERROR
asia-south1-docker.pkg.dev: sha256:ec54ca831d0135d7691fa3cc36cfb5deb5d73eadbb6736edcbb8eb63270f02c3
asia-northeast1-docker.pkg.dev:  ERROR
asia-northeast2-docker.pkg.dev:  ERROR
australia-southeast1-docker.pkg.dev:  ERROR
europe-north1-docker.pkg.dev:  ERROR
europe-southwest1-docker.pkg.dev: sha256:ec54ca831d0135d7691fa3cc36cfb5deb5d73eadbb6736edcbb8eb63270f02c3
europe-west1-docker.pkg.dev:  ERROR
```
(full output trimmed)

Of all the images we promoted that day, the one that has a different digest is the kube-proxy copy in asia-south1-docker.pkg.dev:

```
TAG=$(crane digest registry.k8s.io/kube-proxy:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-proxy:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev: sha256:b55c42ada82c11e3d8d176deb6572b53f371f061e19d69baf0f14d6dbc7362ab
asia-south1-docker.pkg.dev: sha256:bb1e7fda66a3bfd41d2dd0b71c275ef24e2386af82102b6c52b2f20233d8b940
asia-northeast1-docker.pkg.dev: sha256:b55c42ada82c11e3d8d176deb6572b53f371f061e19d69baf0f14d6dbc7362ab
```

Here's what's going on:

When the promoter signs, it stamps the images with its own signer identity:

```
COSIGN_EXPERIMENTAL=1 cosign-1.13 verify us-east1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 | jq  '.[].optional.Subject'

Verification for us-east1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - Any certificates were verified against the Fulcio roots.
"krel-trust@k8s-releng-prod.iam.gserviceaccount.com"
```

(note the SA ID in the last line: krel-trust@ )

The diverging digest has the identity from the signature we add when the build process runs:

```
COSIGN_EXPERIMENTAL=1 cosign-1.13 verify asia-south1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 | jq  '.[].optional.Subject'

Verification for asia-south1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - Any certificates were verified against the Fulcio roots.
"krel-staging@k8s-releng-prod.iam.gserviceaccount.com"
```

(note the identity here is krel-staging@ )

This signature is the only one that is different in the release, so we are not re-signing. It is simply that when processing the signatures for this particular image, the promoter got rate limited and died in the middle.

BenTheElder commented 1 year ago

> This signature is the only one that is different in the release, so we are not re-signing. It is simply that when processing the signatures for this particular image, the promoter got rate limited and died in the middle.

Wait, we're pushing images to prod and then mutating them? Why?

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ameukam commented 1 year ago

/remove-lifecycle stale

aliok commented 2 months ago

Please note that this issue is linked in LFX Mentorship 2024 term 2. A related issue is https://github.com/kubernetes/release/issues/2962

BenTheElder commented 2 months ago

Thanks! This issue is just for tracking / visibility to users of the registry; the necessary changes will be in repos like kubernetes/release, where image publication is managed. If/when it is fixed, we will replicate updates back here for visibility.

anshikavashistha commented 1 month ago

@aliok This project seems interesting to me, and I really want to work on it. Is there any prerequisite task that needs to be done? Please share a link to the community channel or any Slack channel.

BenTheElder commented 1 month ago

Hi folks, please discuss possibly working on this in https://github.com/kubernetes/release/issues/2962 and let's reserve this issue for indicating to users of the registry when we have progress or more details on the situation.