BobyMCbobs commented 3 years ago

Umbrella issue: k8s.gcr.io => registry.k8s.io solution #1834

This markdown is synced from https://hackmd.io/gN-1GeSpSgyNSvmjKSULbg?edit to https://github.com/kubernetes/k8s.io/issues/1834#issue-841372237 manually by @BobyMCBobs

Scope: https://github.com/kubernetes/k8s.io/wiki/New-Registry-url-for-Kubernetes-(registry.k8s.io)

Design Doc: https://docs.google.com/document/d/1yNQ7DaDE5LbDJf9ku82YtlKZK0tcg5Wpk9L72-x2S2k/edit (shared w/ dev@kubernetes.io and SIG mailing list)

Board: https://github.com/orgs/kubernetes/projects/77

DRAFT AIs that need filled turned into tickets: https://github.com/orgs/kubernetes/projects/77/views/2?filterQuery=is%3Adraft

What exactly are you doing? (and how?)

[x] We are setting up an AWS account with an IAM role and s3 buckets in AWS regions where we see a large percentage of source image pull traffic
- [x] IAM Role + User for write - k/k8s.io#3568
- [x] List of Regions with 80% traffic - oci-proxy#39
- [x] Determine which regions should serve the image layers - oci-proxy/issues#38
  - kubernetes/k8s.io#3568
  - kubernetes/k8s.io#3620
  - kubernetes-sigs/promo-tools#533
[x] Create Buckets per Region Names
- [x] Add Terraform for registry-k8s-io s3 buckets k/k8s.io#PR3605
- [x] Create s3 Buckets in AWS Regions k/k8s.io#3595
[x] We will iterate on a sandbox url (registry.sandbox.k8s.io) for our experiments and ONLY promote things to (registry.k8s.io) when we have complete confidence
- [x] standup sandbox deployment oci-proxy#13
- [x] Bulid sandbox infrastructure for oci-proxy k8s.io#3317
  - [x] Auto deploy for archeo in staging to registry-sandbox.k8s.io k8s.io#3577
- [x] Full e2e testing oci-proxy#24
[x] both registry and registry.sandbox are serving traffic using oci-proxy on google cloud run
- Is this true now?
[x] oci-proxy will be updated to identify incoming traffic from AWS regions based on IP ranges so we can route traffic to s3 buckets in that region. If a specific AWS region do not currently host s3 buckets, we will redirect to the nearest region which does have s3 buckets (tradeoff between storage and network costs)
- [ ] [oci-proxy should detect per region (80% of total traffic)
  39](https://github.com/kubernetes-sigs/oci-proxy/issues/39#issuecomment-1085230292)
We will bulk sync existing image layers to these s3 layers as a starting point (from GCS/GCR)
- [x] kubernetes/k8s.io#3666
We will update image-promoter to push to these s3 buckets as well in addition to the current setup
- [ ] Update image-promoter - DRAFT
We will set up monitoring/reporting to check on new costs we incur on the AWS infrastructure and update what we do in GCP infrastructure as well to include the new components
- [ ] AWS Cost Monitoring - DRAFT
- [ ] Collection and loading of AWS cost explorer data kubernetes/k8s.io/infra/aws/aws-costexplorer-export
We will have a plan in place on how we could add additional AWS regions in the future
- [x] Document adding additional AWS Regions - kubernetes/k8s.io/infra/aws/terraform/registry.k8s.io/README.md#extending-regions
We will have CI jobs that will run against registry.sandbox as well to monitor stability before we promote code to registry
- [x] Full e2e testing using one parallel Conformance CI job - k-sigs/oci-proxy#24
- [x] Add e2e tests - oci-proxy/pr#32
- [x] Choose an e2e job to use registry-sandbox
- [x] Configure chosen e2e sandbox with a testgrid dashboard
We will automate the deployment/monitoring and testing of code landing in the oci-proxy repository
- [x] Auto deploy for archeo in staging to registry-sandbox.k8s.io · Issue #3577

stp-ip commented 3 years ago

Correct link as Github parsed wrong I guess: https://hackmd.io/@TKToYPauRJ-umNRBOh4HQ/HJBH3QF4

thockin commented 3 years ago

This finally forced me to disassemble the registry protocol a bit. Interesting. I picked a simple image I know:

$ curl -i https://k8s.gcr.io/v2/git-sync/git-sync/manifests/v3.2.2
HTTP/2 200 
docker-distribution-api-version: registry/2.0
content-type: application/vnd.docker.distribution.manifest.list.v2+json
content-length: 1670
docker-content-digest: sha256:6a543fb2d1e92008aad697da2672478dcfac715e3dddd33801d772da6e70cf24
date: Fri, 26 Mar 2021 22:20:30 GMT
server: Docker Registry
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:85d203d29623d5e7489751812d628e29d0e22075c94a2e99681ecf70be3977ad",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:31ba6a8e4f1aad8a9c42d97cac8752aaa0e4a92a5b2a3457e597020645fc6a0c",
         "platform": {
            "architecture": "arm",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:690188a4785caa356d2d98a806524f6f9aa4663a8c43be11fbd9dd5379a01fc9",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:21487b58352611e67ca033a96f59f1ba47f3e377f5f2e365961c35829bc68ff7",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

I picked the last blob:

$ curl -i https://k8s.gcr.io/v2/git-sync/git-sync/blobs/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3
HTTP/2 302 
docker-distribution-api-version: registry/2.0
location: https://storage.googleapis.com/us.artifacts.k8s-artifacts-prod.appspot.com/containers/images/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3
content-type: application/json
date: Fri, 26 Mar 2021 22:21:42 GMT
server: Docker Registry
cache-control: private
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
accept-ranges: none
vary: Accept-Encoding

{"errors":[]}

So maybe this is not so hard as I feared?

If we catch URLs of the form https://reg.k8s.io/v2/<name>/manifests/<tag> we can redirect those to k8s.gcr.io (which has global replicas) or some other "canonical" source for metadata. I don't know if docker clients would trip over a literal redirect or what, so worst case we'd have to proxy that data (yuck).

Then we catch URLs of the form https://reg.k8s.io/v2/<name>/blobs/<digest> and redirect to one of the backends. As you point out, we have to do the split-horizon (geo IP) ourselves (yuck).

What I don't know is what tools or public IP databases or other resources are available for the 2nd part. The more we can outsource, the better. But a proof-of-concept would be cool!

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2/<name>/manifests/<ref> from /v2/<name>/blobs/<digest> so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

I suspect that a model which requires providers to answer our DNS will be more difficult overall.

BobyMCbobs commented 3 years ago

Correct link as Github parsed wrong I guess: https://hackmd.io/@TKToYPauRJ-u_mNRBOh4HQ/HJBH3QF4_

@stp-ip, thank you. I've updated the description

justaugustus commented 3 years ago

Great to see this discussion happening!

A few things I'd like to see:

this reconciled with the existing artifact management KEP for @brendandburns + @justinsb: https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1732-artifact-management
this reconciled with the longer-term artifact management epic: https://github.com/kubernetes/sig-release/issues/1372 (currently assigned to me)
discussion about existing CNCF projects (like Harbor) as a potential solution: https://github.com/goharbor/harbor/issues/14411
an idea of intended assignees from the WG K8s Infra side (I'm on point for SIG Release)
feedback from @kubernetes/sig-release-leads @kubernetes/release-engineering

These discussions/decisions impact release delivery, so I'd really love to see them happening in venues where @kubernetes/release-managers are hanging out.

BobyMCbobs commented 3 years ago

@thockin, thank you for your input!

If we catch URLs of the form https://reg.k8s.io/v2/<name>/manifests/<tag> we can redirect those to k8s.gcr.io (which has global replicas) or some other "canonical" source for metadata. I don't know if docker clients would trip over a literal redirect or what, so worst case we'd have to proxy that data (yuck).

I suspect that clients may be fine with redirects

Then we catch URLs of the form https://reg.k8s.io/v2/<name>/blobs/<digest> and redirect to one of the backends. As you point out, we have to do the split-horizon (geo IP) ourselves (yuck).

I'm unsure the capability of Google CloudDNS or the use of LoadBalancers to achieve this, community-hosting it might be the option (I'm still investigating otherwise).

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2/<name>/manifests/<ref> from /v2/<name>/blobs/<digest> so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

This would mean declaring a rule to rewrite the URL that redirects to a DNS host that uses split-horizon DNS which will then go to a blobs server at the nearest cloud provider?

BobyMCbobs commented 3 years ago

@justaugustus, appreciate your comments!

Great to see this discussion happening!

A few things I'd like to see:

this reconciled with the existing artifact management KEP for @brendandburns + @justinsb: https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1732-artifact-management

Thank you, I'll take a read of the KEP.

this reconciled with the longer-term artifact management epic: kubernetes/sig-release#1372 (currently assigned to me)

Totally [epic], I'll check it out as well

discussion about existing CNCF projects (like Harbor) as a potential solution: goharbor/harbor#14411

Absolutely! I've got the two proposals for either Distribution or Harbor. Both are wonderful pieces of software.

an idea of intended assignees from the WG K8s Infra side (I'm on point for SIG Release)

feedback from @kubernetes/sig-release-leads @kubernetes/release-engineering

That would be lovely!

These discussions/decisions impact release delivery, so I'd really love to see them happening in venues where @kubernetes/release-managers are hanging out.

I'll get in contact regarding this issue with folks. I look forward to coordinating a solution with ya'll :smiley:

thockin commented 3 years ago

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2//manifests/ from /v2//blobs/ so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

This would mean declaring a rule to rewrite the URL that redirects to a DNS host that uses split-horizon DNS which will then go to a blobs server at the nearest cloud provider?

Either 302 redirect to blob.k8s.io which uses DNS split horizon (which requires the backends to host certs for that SAN) or 302 to blob.k8s.io which is code we host that does the GeoIP lookup, picks a best backend, and then 302s again to that backend. The advantage of the latter is that the backends don't need special certs.

If we can't coax the cloud LB to do this for us, it starts to look more like:

1) User pulls foo:bar 2) Client hits reg.k8s.io/v2/foo/manifests/bar 3) Receive that at a program we run (nginx or bespoke or ...) 4) Redirect to k8s.gcr.io/v2/foo/manifests/bar 5) Metadata fetched 6) Client hits /v2/foo/blobs/<digest> 7) Received at same program as step 3 8) GeoIP lookup, backend select 9) Redirect to <backend>/v2/foo/blobs/<digest> 10) Repeat steps 6-10 for each blob 11) Image is pulled

On Mon, Mar 29, 2021 at 12:39 PM Caleb Woodbine @.***> wrote:

@thockin https://github.com/thockin, thank you for your input!

If we catch URLs of the form https://reg.k8s.io/v2//manifests/ we can redirect those to k8s.gcr.io (which has global replicas) or some other "canonical" source for metadata. I don't know if docker clients would trip over a literal redirect or what, so worst case we'd have to proxy that data (yuck).

I suspect that clients may be fine with redirects

Then we catch URLs of the form https://reg.k8s.io/v2//blobs/ and redirect to one of the backends. As you point out, we have to do the split-horizon (geo IP) ourselves (yuck).

I'm unsure the capability of Google CloudDNS or the use of LoadBalancers to achieve this, community-hosting it might be the option (I'm still investigating otherwise).

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2//manifests/ from /v2//blobs/ so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

This would mean declaring a rule to rewrite the URL that redirects to a DNS host that uses split-horizon DNS which will then go to a blobs server at the nearest cloud provider?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/k8s.io/issues/1834#issuecomment-809659020, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVBO573V2ZMDD27MCCLTGDJQTANCNFSM4Z2H3FUQ .

BobyMCbobs commented 3 years ago

Either 302 redirect to blob.k8s.io which uses DNS split horizon (which requires the backends to host certs for that SAN) or 302 to blob.k8s.io which is code we host that does the GeoIP lookup, picks a best backend, and then 302s again to that backend. The advantage of the latter is that the backends don't need special certs.

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

If we can't coax the cloud LB to do this for us, it starts to look more like: 1) User pulls foo:bar 2) Client hits reg.k8s.io/v2/foo/manifests/bar 3) Receive that at a program we run (nginx or bespoke or ...) 4) Redirect to k8s.gcr.io/v2/foo/manifests/bar 5) Metadata fetched 6) Client hits /v2/foo/blobs/<digest> 7) Received at same program as step 3 8) GeoIP lookup, backend select 9) Redirect to <backend>/v2/foo/blobs/<digest> 10) Repeat steps 6-10 for each blob 11) Image is pulled

This is a really clear flow!

@thockin, thank you!

rikatz commented 3 years ago

Can I do an attempt into Cloud Run and check if instead we run a machine, running a function that does the redirect wouldn't be cheaper (probably not!) and better? :D

Edit: @justinsb was fair enough saying that we probably can run this inside the AAA and have not so much problems as well, so yeah, let's see how we can use a redirector inside Kubernetes

hh commented 3 years ago

I've been searching out a few ASNs for larger cloud providers that likely hit our existing infra. Once we use these to understand which providers are costing the CNCF the most, we can approach to redirect to a local solution. If anyone from these providers wants to help narrow down which ASNs are part of their cloud offerings, that would help.

Microsoft
Amazon
Google
- https://www.peeringdb.com/org/574
- https://bgp.he.net/AS139190#_prefixes Google Cloud Indonesia
- https://bgp.he.net/AS139070#_prefixes Google Cloud Korea
- https://bgp.he.net/AS45566#_prefixes Google Corporate Network in APAC
- https://bgp.he.net/AS15169#_prefixes Google LLC
- https://bgp.he.net/AS19527#_prefixes Google LLC AS19527
- https://bgp.he.net/AS36040#_prefixes Google LLC AS36040
- https://bgp.he.net/AS43515#_prefixes Google LLC AS43515
- https://bgp.he.net/AS16550#_prefixes Google Private Cloud
Alibaba Group
Tencent Cloud
- https://bgp.he.net/AS132591#_prefixes Tencent Cloud
- https://bgp.he.net/AS132203#_prefixes Tencent Global
- https://bgp.he.net/AS45090#_prefixes Tencent-CN
Equinox
- https://www.peeringdb.com/org/2
- Very long list of ASNs...

stp-ip commented 3 years ago

Digital Ocean
- https://www.peeringdb.com/net/6494
- https://bgp.he.net/AS14061#_prefixes
Huawei
- https://www.peeringdb.com/net/18382
- https://bgp.he.net/AS136907#_prefixes
Baidu
- https://www.peeringdb.com/net/6517
- https://bgp.he.net/AS55967#_prefixes

There are a few other providers that could result in traffic, but above felt like a good additional selection of the bigger ones. Full reference of a list of providers: https://docs.google.com/spreadsheets/d/1LxSqBzjOxfGx3cmtZ4EbB_BGCxT_wlxW_xgHVVa23es/edit#gid=0

So depending on how much traffic is done by those above we could always dig deeper. Let's see what the stats say for the listed providers and then happy to dig into the smaller providers.

BobyMCbobs commented 3 years ago

I believe this is the list of ASNs for Equinix Metal:

- 8545 - 9989 - 12085 - 12188 - 14609 - 15734 - 15830 - 15830 - 15830 - 15830 - 15830 - 15830 - 15830 - 16243 - 16397 - 16553 - 17819 - 17941 - 19930 - 21371 - 23637 - 23686 - 24115 - 24121 - 24989 - 24990 - 26592 - 27224 - 27272 - 27330 - 27566 - 29154 - 29884 - 32323 - 32550 - 34209 - 35054 - 43147 - 47886 - 47886 - 54588 - 54825 - 62421 - 64275 - 137840 - 139281 - 264220 - 265376 - 266849 - 270119 - 394749

BobyMCbobs commented 3 years ago

ASNs in k8s.io repo: https://github.com/kubernetes/k8s.io/issues/1914

thockin commented 3 years ago

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

Yes. My thinking is mostly around TLS - if we do split horizon, the real backends have to offer certs for our names. If we 302, they do not. There are a number of GeoIP libs for Go that could be viable. Other than that, the logic seems simple enough to prototype. We could throw it into the aaa cluster as a quick test.

BobyMCbobs commented 3 years ago

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

Yes. My thinking is mostly around TLS - if we do split horizon, the real backends have to offer certs for our names. If we 302, they do not. There are a number of GeoIP libs for Go that could be viable. Other than that, the logic seems simple enough to prototype. We could throw it into the aaa cluster as a quick test.

Thank you @thockin for your comments.

Regarding using a service to perform a redirect, the behavour of something like docker pull registry.k8s.io/{{.Image}}:

it will pull tagging the image as registry.k8s.io/{{.Image}}, not what it is at the container registry
due to Envoy being exposed with an Ingress host that has TLS, it appears to not matter about the TLS at the actual container registry

ref: https://ii.coop/blog/rerouting-container-registries-with-envoy/#the-implementation

justinsb commented 3 years ago

We could throw it into the aaa cluster as a quick test.

Do you mean deploying https://github.com/kubernetes/k8s.io/tree/main/artifactserver as a test?

BobyMCbobs commented 3 years ago

I deployed Envoy as well as Distribution on a cluster in the k8s-infra-ii-sandbox project from this Org file https://github.com/cncf-infra/prow-config/blob/master/infra/gcp/README.org#envoy

justinsb commented 3 years ago

@BobyMCbobs can we try deploying artifactserver as well?

BobyMCbobs commented 3 years ago

@BobyMCbobs can we try deploying artifactserver as well?

Yes! I've deployed it to https://artifacts.ii-sandbox.bobymcbobs-oitq.pair.sharing.io at the moment https://github.com/cncf-infra/prow-config/blob/dc681e5d79d85af47df5f01ebcf281bf193de666/infra/gcp/README.org#artifactserver

I am currently trying to adapt the source to provide the same 302 functionality as what Envoy is providing.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

puerco commented 3 years ago

/remove-lifecycle stale

spiffxp commented 3 years ago

/lifecycle frozen

spiffxp commented 3 years ago

I'm closing a number of WIP or held PRs that don't seem intended for merge, but were merely open for illustrative purposes. Linking them here in case folks are still using them for reference or want to reopen them:

hh commented 2 years ago

I think we've closed a few approaches here, but we should probably make a call between pushing one of these two forward OR looking at a fully hosted solution from Google, Amazon, or someone else. I think the main thing here is I'm reluctant for us to use a solution for distributing Kubernetes that doesn't have a full-time on-call team behind it... somewhere.

WIP LoadTesting for the two prototypes

Envoy

Early Prototype, for inline LUA+Config

Later Prototype, for eventual WASM filter

The lua envoy_on_request() config/function reaches out (via an CDS service configured here) this go-backend with a very simple business logic for now.

This would eventually be replaced by a wasm filter.

ArtifactServer

Software written by JustinSB, which was slightly updated be more generic and base the redirects on a configuration file.

Deployment Testing Process

Link to CLOSED k8s.io/artifactserver PR#2068

hh commented 2 years ago

/assign @thockin

spiffxp commented 2 years ago

Pulling out of when this was asked in slack last week.

I want to see a proposal sort of doc or presentation that lets us evaluate our alternatives against a consistent set of criteria/dimensions.

how much is this going to save us?
- for consumers outside of google cloud (e.g. save on "general network usage" costs listed in https://cloud.google.com/storage/pricing)
- for consumers within google cloud (e.g. save on "network egress within google cloud" listed in https://cloud.google.com/storage/pricing)
do we have to run it?
- if so how will we do that?
what request volume does it need to handle?
- what request volume tips it over?
how will we know what it's doing?
- logging story
- monitoring story
- handling PII
if we were to choose this alternative, what is the step by step process to execute on this decision? including
- whether and how we orchestrate Yet Another Rename of images
- what rollback looks like at each step of the process

Reducing cost is my primary concern here. Whether that is accomplished by farming out requests from large consumers to mirrors within their networks, or serving traffic from our single solution more cheaply... I don't have a preference.

I am not sure whether any time has been put into investigating whether hosting on a CDN could improve our costs. My rough back-of-the-napkin math, looking at the difference between https://cloud.google.com/storage/pricing and https://cloud.google.com/cdn/pricing, says that if we could magically serve all our traffic through Cloud CDN, we'd be saving 40-50% of our artifact hosting costs if were to continue serving >1TB of data per month

I don't know whether it's possible to serve GCR (or artifact registry) through Cloud CDN, but I think that's enough of a difference to merit a look. Has your team looked into this or other CDN alternatives at all?

thockin commented 2 years ago

I have not been able to make a lot of time for this, but I have a bit now.

do we have to run it?

I don't think I see a way not to.

if so how will we do that?

We already run k8s.io and friends, though that is a much lower traffic thing. We'll need to set up a volunteer army.

what request volume does it need to handle?

We can look at average QPS for current GCR and extrapolate - @hh do you have that data nearby?

what request volume tips it over?

We'll need load-testing to pull this off.

logging story monitoring story handling PII

yes

In addition to your questions:

How do we push / pull updates to mirrors?
How do we govern the set of mirrors?
- which ones do we list?
- how do we check health?
- how do we decide if/when to allow another one?
- how transparent must the process be (e.g. AWS owns it or k8s own it)?
- how do we ensure integrity?
Where do we get the Geo-IP mapping (or ASN mapping) and how do we keep it up to date?

thockin commented 2 years ago

WRT tech stack:

I took this program:

package main

import (
    "log"
    "net/http"
    "os"
    "regexp"
    "strings"
)

func main() {
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }
    log.Printf("listening on port %s", port)
    http.ListenAndServe(":"+port, http.HandlerFunc(handler))
}

func handler(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path
    switch {
    case strings.HasPrefix(path, "/v2/"):
        doV2(w, r)
    case strings.HasPrefix(path, "/v1/"):
        doV1(w, r)
    default:
        log.Printf("unknown request: %q", path)
        http.NotFound(w, r)
    }
}

var reBlob = regexp.MustCompile("^/v2/.*/blobs/sha256:[0-9a-f]{64}$")

func doV2(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path

    if reBlob.MatchString(path) {
        // Blob requests are the fun ones.
        log.Printf("v2 blob request: %q", path)
        //FIXME: look up the best backend
        http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusTemporaryRedirect)
        return
    }

    // Anything else (manifests in particular) go to the canonical registry.
    log.Printf("v2 request: %q", path)
    http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusPermanentRedirect)
}

func doV1(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path
    log.Printf("v1 request: %q", path)
    //FIXME: look up backend?
    http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusPermanentRedirect)
}

...and it acts as a proxy to k8s.gcr.io for docker pull. We can run it in a GKE cluster (or in several around the world). But seeing how trivial this is, there has to be a better way.

So I put it into Cloud Run. Easy. My test project is locked down (org policy, yay), so I can't point you at it, but easy to replicate.

It seems possible to add multiple global backends: https://cloud.google.com/run/docs/multiple-regions

So what are we missing:

logic to figure out "best" backends
the stuff listed above about productionizing it.

How do we make progress on that?

aojea commented 2 years ago

/cc

BenTheElder commented 2 years ago

see: https://docs.google.com/document/d/1yNQ7DaDE5LbDJf9ku82YtlKZK0tcg5Wpk9L72-x2S2k/ (shared with dev@kubernetes.io mailinglist and the SIG mailing list) for some recent discussion on this topic.

BobyMCbobs commented 2 years ago

Update 📰 🎉

The redirecting from registry.k8s.io to k8s.gcr.io and prod-registry-k8s-io-$REGION.s3.dualstack.us-east-2.amazonaws.com is instantiated and there is automated replication between the buckets. There is a registry-sandbox.k8s.io, for staging, including an auto-deploy from main. The staging environment is also used in CI jobs. The repo for the redirector is available at https://github.com/kubernetes/registry.k8s.io. It has been a huge effort with collaborations between many folks in sig-k8s-infra and sig-release.

cc @kubernetes/sig-k8s-infra

BenTheElder commented 1 year ago

I think we can close this.

This is at https://registry.k8s.io now and is generally implemented.

What remains is phasing over users, which we're tracking elsewhere.

BenTheElder commented 1 year ago

/close

k8s-ci-robot commented 1 year ago

@BenTheElder: Closing this issue.

In response to [this](https://github.com/kubernetes/k8s.io/issues/1834#issuecomment-1472752031): >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kubernetes / k8s.io