kubernetes / k8s.io

Code and configuration to manage Kubernetes project infrastructure, including various *.k8s.io sites
https://git.k8s.io/community/sig-k8s-infra
Apache License 2.0

Umbrella issue: k8s.gcr.io => registry.k8s.io solution #1834

Closed BobyMCbobs closed 1 year ago

BobyMCbobs commented 3 years ago


This markdown is synced from https://hackmd.io/gN-1GeSpSgyNSvmjKSULbg?edit to https://github.com/kubernetes/k8s.io/issues/1834#issue-841372237 manually by @BobyMCBobs

Scope: https://github.com/kubernetes/k8s.io/wiki/New-Registry-url-for-Kubernetes-(registry.k8s.io)

Design Doc: https://docs.google.com/document/d/1yNQ7DaDE5LbDJf9ku82YtlKZK0tcg5Wpk9L72-x2S2k/edit (shared w/ dev@kubernetes.io and SIG mailing list)

Board: https://github.com/orgs/kubernetes/projects/77

DRAFT AIs (action items) that need to be turned into tickets: https://github.com/orgs/kubernetes/projects/77/views/2?filterQuery=is%3Adraft

What exactly are you doing? (and how?)

stp-ip commented 3 years ago

Correct link as Github parsed wrong I guess: https://hackmd.io/@TKToYPauRJ-umNRBOh4HQ/HJBH3QF4

thockin commented 3 years ago

This finally forced me to disassemble the registry protocol a bit. Interesting. I picked a simple image I know:

$ curl -i https://k8s.gcr.io/v2/git-sync/git-sync/manifests/v3.2.2
HTTP/2 200 
docker-distribution-api-version: registry/2.0
content-type: application/vnd.docker.distribution.manifest.list.v2+json
content-length: 1670
docker-content-digest: sha256:6a543fb2d1e92008aad697da2672478dcfac715e3dddd33801d772da6e70cf24
date: Fri, 26 Mar 2021 22:20:30 GMT
server: Docker Registry
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:85d203d29623d5e7489751812d628e29d0e22075c94a2e99681ecf70be3977ad",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:31ba6a8e4f1aad8a9c42d97cac8752aaa0e4a92a5b2a3457e597020645fc6a0c",
         "platform": {
            "architecture": "arm",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:690188a4785caa356d2d98a806524f6f9aa4663a8c43be11fbd9dd5379a01fc9",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:21487b58352611e67ca033a96f59f1ba47f3e377f5f2e365961c35829bc68ff7",
         "platform": {
            "architecture": "ppc64le",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1572,
         "digest": "sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3",
         "platform": {
            "architecture": "s390x",
            "os": "linux"
         }
      }
   ]
}

I picked the last blob:

$ curl -i https://k8s.gcr.io/v2/git-sync/git-sync/blobs/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3
HTTP/2 302 
docker-distribution-api-version: registry/2.0
location: https://storage.googleapis.com/us.artifacts.k8s-artifacts-prod.appspot.com/containers/images/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3
content-type: application/json
date: Fri, 26 Mar 2021 22:21:42 GMT
server: Docker Registry
cache-control: private
x-xss-protection: 0
x-frame-options: SAMEORIGIN
alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
accept-ranges: none
vary: Accept-Encoding

{"errors":[]}

So maybe this is not so hard as I feared?
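For anyone following along in Go rather than curl, a minimal probe (my own sketch, not part of any prototype here) that disables redirect-following and prints the Location header for the same blob digest shows the 302 directly:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    // Don't follow redirects, so we can inspect the Location header ourselves.
    client := &http.Client{
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            return http.ErrUseLastResponse
        },
    }

    // Same blob digest as the curl example above.
    url := "https://k8s.gcr.io/v2/git-sync/git-sync/blobs/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3"
    resp, err := client.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    fmt.Println(resp.Status)                 // expect 302 Found
    fmt.Println(resp.Header.Get("Location")) // the storage.googleapis.com backend URL
}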

If we catch URLs of the form https://reg.k8s.io/v2/<name>/manifests/<tag> we can redirect those to k8s.gcr.io (which has global replicas) or some other "canonical" source for metadata. I don't know if docker clients would trip over a literal redirect or what, so worst case we'd have to proxy that data (yuck).

Then we catch URLs of the form https://reg.k8s.io/v2/<name>/blobs/<digest> and redirect to one of the backends. As you point out, we have to do the split-horizon (geo IP) ourselves (yuck).

What I don't know is what tools or public IP databases or other resources are available for the 2nd part. The more we can outsource, the better. But a proof-of-concept would be cool!

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2/<name>/manifests/<ref> from /v2/<name>/blobs/<digest> so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.
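If the Accept-header idea pans out, the check itself is trivial. A hypothetical sketch (the media-type list and function name are mine, and whether all clients actually send these headers would need verification):

package main

import (
    "fmt"
    "net/http"
    "strings"
)

// manifestMediaTypes are the media types clients typically list in Accept
// when fetching manifests (blob requests generally don't ask for these).
var manifestMediaTypes = []string{
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
}

// looksLikeManifestRequest guesses the request kind from the Accept header.
func looksLikeManifestRequest(r *http.Request) bool {
    accept := r.Header.Get("Accept")
    for _, mt := range manifestMediaTypes {
        if strings.Contains(accept, mt) {
            return true
        }
    }
    return false
}

func main() {
    r, _ := http.NewRequest("GET", "https://reg.k8s.io/v2/git-sync/git-sync/manifests/v3.2.2", nil)
    r.Header.Set("Accept", "application/vnd.docker.distribution.manifest.list.v2+json")
    fmt.Println(looksLikeManifestRequest(r)) // true
}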

I suspect that a model which requires providers to answer our DNS will be more difficult overall.

BobyMCbobs commented 3 years ago

Correct link as Github parsed wrong I guess: https://hackmd.io/@TKToYPauRJ-u_mNRBOh4HQ/HJBH3QF4_

@stp-ip, thank you. I've updated the description

justaugustus commented 3 years ago

Great to see this discussion happening!

A few things I'd like to see:

These discussions/decisions impact release delivery, so I'd really love to see them happening in venues where @kubernetes/release-managers are hanging out.

BobyMCbobs commented 3 years ago

@thockin, thank you for your input!

If we catch URLs of the form https://reg.k8s.io/v2/<name>/manifests/<tag> we can redirect those to k8s.gcr.io (which has global replicas) or some other "canonical" source for metadata. I don't know if docker clients would trip over a literal redirect or what, so worst case we'd have to proxy that data (yuck).

I suspect that clients may be fine with redirects

Then we catch URLs of the form https://reg.k8s.io/v2/<name>/blobs/<digest> and redirect to one of the backends. As you point out, we have to do the split-horizon (geo IP) ourselves (yuck).

I'm unsure whether Google Cloud DNS or load balancers can achieve this; community hosting might be the option (I'm still investigating alternatives).

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2/<name>/manifests/<ref> from /v2/<name>/blobs/<digest> so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

This would mean declaring a rule that rewrites the URL and redirects to a DNS host using split-horizon DNS, which would then resolve to a blob server at the nearest cloud provider?

BobyMCbobs commented 3 years ago

@justaugustus, appreciate your comments!

Great to see this discussion happening!

A few things I'd like to see:

Thank you, I'll take a read of the KEP.

Totally [epic], I'll check it out as well

Absolutely! I've got the two proposals for either Distribution or Harbor. Both are wonderful pieces of software.

  • an idea of intended assignees from the WG K8s Infra side (I'm on point for SIG Release)
  • feedback from @kubernetes/sig-release-leads @kubernetes/release-engineering

That would be lovely!

These discussions/decisions impact release delivery, so I'd really love to see them happening in venues where @kubernetes/release-managers are hanging out.

I'll get in contact with folks regarding this issue. I look forward to coordinating a solution with y'all :smiley:

thockin commented 3 years ago

I spent a bit of time trying to coax the Google cloud LB to distinguish /v2/<name>/manifests/<ref> from /v2/<name>/blobs/<digest> so the 1st part could simply be a cloud LB rule. Alas it only matches on prefixes. It might be possible to use Content-Type or Accept headers to tell the difference (suggested: match Accept header with blob mime type). If we could do that, then the only thing we'd have to own would be the 2nd part.

This would mean declaring a rule that rewrites the URL and redirects to a DNS host using split-horizon DNS, which would then resolve to a blob server at the nearest cloud provider?

Either 302 redirect to blob.k8s.io which uses DNS split horizon (which requires the backends to host certs for that SAN) or 302 to blob.k8s.io which is code we host that does the GeoIP lookup, picks a best backend, and then 302s again to that backend. The advantage of the latter is that the backends don't need special certs.
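As a rough illustration of the second option (a redirector we host that picks a backend and 302s again) — the handler name, backend hosts, and the region lookup below are all placeholders, not decisions:

package main

import (
    "log"
    "net/http"
)

// blobBackends maps a coarse client region to a backend host.
// These hosts are placeholders, not real endpoints.
var blobBackends = map[string]string{
    "us": "https://us.blobs.example.com",
    "eu": "https://eu.blobs.example.com",
    "ap": "https://ap.blobs.example.com",
}

// regionFor would do the GeoIP lookup on the client address; stubbed here.
func regionFor(remoteAddr string) string {
    return "us"
}

// blobHandler receives /v2/<name>/blobs/<digest> requests and 302s to the
// backend chosen for the client's region, so backends never need to serve
// certificates for our names.
func blobHandler(w http.ResponseWriter, r *http.Request) {
    backend, ok := blobBackends[regionFor(r.RemoteAddr)]
    if !ok {
        backend = blobBackends["us"] // default region
    }
    http.Redirect(w, r, backend+r.URL.Path, http.StatusFound)
}

func main() {
    http.HandleFunc("/v2/", blobHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}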

If we can't coax the cloud LB to do this for us, it starts to look more like:

1) User pulls foo:bar
2) Client hits reg.k8s.io/v2/foo/manifests/bar
3) Receive that at a program we run (nginx or bespoke or ...)
4) Redirect to k8s.gcr.io/v2/foo/manifests/bar
5) Metadata fetched
6) Client hits /v2/foo/blobs/<digest>
7) Received at same program as step 3
8) GeoIP lookup, backend select
9) Redirect to <backend>/v2/foo/blobs/<digest>
10) Repeat steps 6-10 for each blob
11) Image is pulled


BobyMCbobs commented 3 years ago

Either 302 redirect to blob.k8s.io which uses DNS split horizon (which requires the backends to host certs for that SAN) or 302 to blob.k8s.io which is code we host that does the GeoIP lookup, picks a best backend, and then 302s again to that backend. The advantage of the latter is that the backends don't need special certs.

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

If we can't coax the cloud LB to do this for us, it starts to look more like:

1) User pulls foo:bar
2) Client hits reg.k8s.io/v2/foo/manifests/bar
3) Receive that at a program we run (nginx or bespoke or ...)
4) Redirect to k8s.gcr.io/v2/foo/manifests/bar
5) Metadata fetched
6) Client hits /v2/foo/blobs/<digest>
7) Received at same program as step 3
8) GeoIP lookup, backend select
9) Redirect to <backend>/v2/foo/blobs/<digest>
10) Repeat steps 6-10 for each blob
11) Image is pulled

This is a really clear flow!

@thockin, thank you!

rikatz commented 3 years ago

Can I take a stab at Cloud Run and check whether, instead of running a machine, running a function that does the redirect would be cheaper (probably not!) and better? :D

Edit: @justinsb fairly pointed out that we could probably run this inside the aaa cluster without too many problems, so yeah, let's see how we can use a redirector inside Kubernetes

hh commented 3 years ago

I've been searching out a few ASNs for larger cloud providers that likely hit our existing infra. Once we use these to understand which providers are costing the CNCF the most, we can approach them about redirecting to a local solution. If anyone from these providers wants to help narrow down which ASNs are part of their cloud offerings, that would help.
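For the "which provider is this request coming from" part, once provider prefixes are known (whether derived from these ASNs or from published ranges such as AWS's ip-ranges.json), the lookup itself is straightforward. A small sketch with made-up example ranges:

package main

import (
    "fmt"
    "net/netip"
)

// providerPrefixes is a tiny illustrative table; real data would come from
// each provider's published IP ranges or from ASN-to-prefix lookups for the
// ASNs collected in this thread. These example ranges are documentation-only.
var providerPrefixes = map[string][]netip.Prefix{
    "example-provider-a": {netip.MustParsePrefix("203.0.113.0/24")},
    "example-provider-b": {netip.MustParsePrefix("198.51.100.0/24")},
}

// providerFor returns the provider whose ranges contain ip, if any.
func providerFor(ip netip.Addr) (string, bool) {
    for name, prefixes := range providerPrefixes {
        for _, p := range prefixes {
            if p.Contains(ip) {
                return name, true
            }
        }
    }
    return "", false
}

func main() {
    ip := netip.MustParseAddr("203.0.113.7")
    if name, ok := providerFor(ip); ok {
        fmt.Printf("%s appears to come from %s\n", ip, name)
    } else {
        fmt.Printf("%s is not in any known provider range\n", ip)
    }
}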

stp-ip commented 3 years ago

There are a few other providers that could generate traffic, but the above felt like a good additional selection of the bigger ones. Full reference list of providers: https://docs.google.com/spreadsheets/d/1LxSqBzjOxfGx3cmtZ4EbB_BGCxT_wlxW_xgHVVa23es/edit#gid=0

Depending on how much traffic comes from those listed above, we can always dig deeper. Let's see what the stats say for the listed providers; after that I'm happy to dig into the smaller ones.

BobyMCbobs commented 3 years ago

I believe this is the list of ASNs for Equinix Metal:

8545, 9989, 12085, 12188, 14609, 15734, 15830, 15830, 15830, 15830, 15830, 15830, 15830, 16243, 16397, 16553, 17819, 17941, 19930, 21371, 23637, 23686, 24115, 24121, 24989, 24990, 26592, 27224, 27272, 27330, 27566, 29154, 29884, 32323, 32550, 34209, 35054, 43147, 47886, 47886, 54588, 54825, 62421, 64275, 137840, 139281, 264220, 265376, 266849, 270119, 394749

BobyMCbobs commented 3 years ago

ASNs in k8s.io repo: https://github.com/kubernetes/k8s.io/issues/1914

thockin commented 3 years ago

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

Yes. My thinking is mostly around TLS - if we do split horizon, the real backends have to offer certs for our names. If we 302, they do not. There are a number of GeoIP libs for Go that could be viable. Other than that, the logic seems simple enough to prototype. We could throw it into the aaa cluster as a quick test.
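For example (the library choice here is purely illustrative, not something decided in this thread), the oschwald/geoip2-golang reader against a locally downloaded GeoLite2 country database looks roughly like:

package main

import (
    "fmt"
    "log"
    "net"

    geoip2 "github.com/oschwald/geoip2-golang"
)

func main() {
    // Assumes a GeoLite2 country database has been downloaded locally.
    db, err := geoip2.Open("GeoLite2-Country.mmdb")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    record, err := db.Country(net.ParseIP("81.2.69.142"))
    if err != nil {
        log.Fatal(err)
    }
    // The ISO country code could then key a country -> backend table.
    fmt.Println(record.Country.IsoCode)
}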

BobyMCbobs commented 3 years ago

Would you say that a small webserver to do 302 redirects may be easier or more maintainable than split-horizon?

Yes. My thinking is mostly around TLS - if we do split horizon, the real backends have to offer certs for our names. If we 302, they do not. There are a number of GeoIP libs for Go that could be viable. Other than that, the logic seems simple enough to prototype. We could throw it into the aaa cluster as a quick test.

Thank you @thockin for your comments.

Regarding using a service to perform the redirect, the behaviour of something like docker pull registry.k8s.io/{{.Image}} is described here:

ref: https://ii.coop/blog/rerouting-container-registries-with-envoy/#the-implementation

justinsb commented 3 years ago

We could throw it into the aaa cluster as a quick test.

Do you mean deploying https://github.com/kubernetes/k8s.io/tree/main/artifactserver as a test?

BobyMCbobs commented 3 years ago

related: https://github.com/kubernetes/k8s.io/issues/1758

BobyMCbobs commented 3 years ago

I deployed Envoy as well as Distribution on a cluster in the k8s-infra-ii-sandbox project from this Org file https://github.com/cncf-infra/prow-config/blob/master/infra/gcp/README.org#envoy

justinsb commented 3 years ago

@BobyMCbobs can we try deploying artifactserver as well?

BobyMCbobs commented 3 years ago

@BobyMCbobs can we try deploying artifactserver as well?

Yes! I've deployed it to https://artifacts.ii-sandbox.bobymcbobs-oitq.pair.sharing.io at the moment https://github.com/cncf-infra/prow-config/blob/dc681e5d79d85af47df5f01ebcf281bf193de666/infra/gcp/README.org#artifactserver

I am currently trying to adapt the source to provide the same 302 functionality as what Envoy is providing.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

puerco commented 3 years ago

/remove-lifecycle stale

spiffxp commented 3 years ago

/lifecycle frozen

spiffxp commented 3 years ago

I'm closing a number of WIP or held PRs that don't seem intended for merge, but were merely open for illustrative purposes. Linking them here in case folks are still using them for reference or want to reopen them:

hh commented 2 years ago

I think we've closed a few approaches here, but we should probably make a call between pushing one of these two forward OR looking at a fully hosted solution from Google, Amazon, or someone else. I think the main thing here is I'm reluctant for us to use a solution for distributing Kubernetes that doesn't have a full-time on-call team behind it... somewhere.

WIP LoadTesting for the two prototypes

Envoy

Early Prototype, for inline LUA+Config

Later Prototype, for eventual WASM filter

The Lua envoy_on_request() config/function reaches out (via a CDS service configured here) to this Go backend, which has very simple business logic for now.

This would eventually be replaced by a wasm filter.

ArtifactServer

Software written by JustinSB, slightly updated to be more generic and to base the redirects on a configuration file.

Deployment Testing Process

Link to CLOSED k8s.io/artifactserver PR#2068

hh commented 2 years ago

/assign @thockin

spiffxp commented 2 years ago

Pulling this out of Slack, where it was asked last week.

I want to see a proposal sort of doc or presentation that lets us evaluate our alternatives against a consistent set of criteria/dimensions.

Reducing cost is my primary concern here. Whether that is accomplished by farming out requests from large consumers to mirrors within their networks, or serving traffic from our single solution more cheaply... I don't have a preference.

I am not sure whether any time has been put into investigating whether hosting on a CDN could improve our costs. My rough back-of-the-napkin math, looking at the difference between https://cloud.google.com/storage/pricing and https://cloud.google.com/cdn/pricing, says that if we could magically serve all our traffic through Cloud CDN, we'd be saving 40-50% of our artifact hosting costs if we were to continue serving >1TB of data per month.

I don't know whether it's possible to serve GCR (or artifact registry) through Cloud CDN, but I think that's enough of a difference to merit a look. Has your team looked into this or other CDN alternatives at all?

thockin commented 2 years ago

I have not been able to make a lot of time for this, but I have a bit now.

do we have to run it?

I don't think I see a way not to.

if so how will we do that?

We already run k8s.io and friends, though that is a much lower traffic thing. We'll need to set up a volunteer army.

what request volume does it need to handle?

We can look at average QPS for current GCR and extrapolate - @hh do you have that data nearby?

what request volume tips it over?

We'll need load-testing to pull this off.

  • logging story
  • monitoring story
  • handling PII

yes

In addition to your questions:

thockin commented 2 years ago

WRT tech stack:

I took this program:

package main

import (
    "log"
    "net/http"
    "os"
    "regexp"
    "strings"
)

func main() {
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }
    log.Printf("listening on port %s", port)
    http.ListenAndServe(":"+port, http.HandlerFunc(handler))
}

func handler(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path
    switch {
    case strings.HasPrefix(path, "/v2/"):
        doV2(w, r)
    case strings.HasPrefix(path, "/v1/"):
        doV1(w, r)
    default:
        log.Printf("unknown request: %q", path)
        http.NotFound(w, r)
    }
}

var reBlob = regexp.MustCompile("^/v2/.*/blobs/sha256:[0-9a-f]{64}$")

func doV2(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path

    if reBlob.MatchString(path) {
        // Blob requests are the fun ones.
        log.Printf("v2 blob request: %q", path)
        //FIXME: look up the best backend
        http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusTemporaryRedirect)
        return
    }

    // Anything else (manifests in particular) go to the canonical registry.
    log.Printf("v2 request: %q", path)
    http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusPermanentRedirect)
}

func doV1(w http.ResponseWriter, r *http.Request) {
    path := r.URL.Path
    log.Printf("v1 request: %q", path)
    //FIXME: look up backend?
    http.Redirect(w, r, "https://k8s.gcr.io"+path, http.StatusPermanentRedirect)
}

...and it acts as a proxy to k8s.gcr.io for docker pull. We can run it in a GKE cluster (or in several around the world). But seeing how trivial this is, there has to be a better way.
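Before moving on: as a quick sanity check of the handler above (my own test sketch, assuming it sits in the same package as the program), a standard Go test can confirm the redirect targets and status codes:

package main

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestRedirects(t *testing.T) {
    cases := []struct {
        path string
        want int
    }{
        {"/v2/git-sync/git-sync/manifests/v3.2.2", http.StatusPermanentRedirect},
        {"/v2/git-sync/git-sync/blobs/sha256:41f3ac440284018ce19b78a8e39a3e99c701a6d7c90fdf7204e180a9715ca7e3", http.StatusTemporaryRedirect},
        {"/healthz", http.StatusNotFound},
    }
    for _, c := range cases {
        req := httptest.NewRequest(http.MethodGet, c.path, nil)
        rec := httptest.NewRecorder()
        handler(rec, req)
        if rec.Code != c.want {
            t.Errorf("%s: got status %d, want %d", c.path, rec.Code, c.want)
        }
        if c.want != http.StatusNotFound {
            if loc := rec.Header().Get("Location"); loc != "https://k8s.gcr.io"+c.path {
                t.Errorf("%s: unexpected Location %q", c.path, loc)
            }
        }
    }
}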

So I put it into Cloud Run. Easy. My test project is locked down (org policy, yay), so I can't point you at it, but easy to replicate.

It seems possible to add multiple global backends: https://cloud.google.com/run/docs/multiple-regions

So what are we missing:

How do we make progress on that?

aojea commented 2 years ago

/cc

BenTheElder commented 2 years ago

see: https://docs.google.com/document/d/1yNQ7DaDE5LbDJf9ku82YtlKZK0tcg5Wpk9L72-x2S2k/ (shared with the dev@kubernetes.io mailing list and the SIG mailing list) for some recent discussion on this topic.

BobyMCbobs commented 2 years ago

Update 📰 🎉

The redirect from registry.k8s.io to k8s.gcr.io and prod-registry-k8s-io-$REGION.s3.dualstack.us-east-2.amazonaws.com is in place, and there is automated replication between the buckets. There is a registry-sandbox.k8s.io for staging, with an auto-deploy from main; the staging environment is also used in CI jobs. The repo for the redirector is available at https://github.com/kubernetes/registry.k8s.io. It has been a huge effort, with collaboration between many folks in sig-k8s-infra and sig-release.

cc @kubernetes/sig-k8s-infra

BenTheElder commented 1 year ago

I think we can close this.

This is at https://registry.k8s.io now and is generally implemented.

What remains is phasing over users, which we're tracking elsewhere.

BenTheElder commented 1 year ago

/close

k8s-ci-robot commented 1 year ago

@BenTheElder: Closing this issue.

In response to [this](https://github.com/kubernetes/k8s.io/issues/1834#issuecomment-1472752031):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.