kubernetes / registry.k8s.io

This project is the repo for registry.k8s.io, the production OCI registry service for Kubernetes' container image artifacts
https://registry.k8s.io
Apache License 2.0

Random download failures - 403 errors [hetzner] #138

Closed: marblerun closed this issue 1 year ago

marblerun commented 1 year ago

Hi,

Attempting to build a 3-node Kubernetes cluster using kubespray (latest) on Hetzner Cloud instances running Debian 11.

The first attempt failed due to a download failure for kubeadm on 1 of the 3 instances. Confirmed with a local download: 1 failure, 2 successes. Swapped in a replacement instance and moved past this point; assumed possible IP blacklisting, though not confirmed.

All 3 instances then downloaded the 4 Calico networking containers and came to the pause:3.7 download, using a command like this:

root@kube-3:~# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
root@kube-3:~# nerdctl images
REPOSITORY               TAG    IMAGE ID        CREATED          PLATFORM       SIZE        BLOB SIZE
registry.k8s.io/pause    3.7    bb6ed397957e    4 seconds ago    linux/amd64    700.0 KiB   304.0 KiB

On the failing instance, we see the following error when the pull is run by hand; using kubespray, it retries 4 times and then fails the whole install at that point.

root@kube-2:~# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
FATA[0000] failed to resolve reference "registry.k8s.io/pause:3.7": unexpected status from HEAD request to https://registry.k8s.io/v2/pause/manifests/3.7: 403 Forbidden

Do you have any idea why the download from this registry might be failing, and is there any alternative source I could try? The IP address starts and ends as shown below; the command was run a couple of minutes ago.

Thu 12 Jan 2023 02:52:21 PM UTC
65.x.x.244

Many thanks

Mike
BenTheElder commented 1 year ago

That endpoint works fine from here

$ curl -IL https://registry.k8s.io/v2/pause/manifests/3.7
HTTP/2 307 
content-type: text/html; charset=utf-8
location: https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.7
x-cloud-trace-context: 9e8f3405a102bf4332d81593461d200a
date: Thu, 12 Jan 2023 20:13:55 GMT
server: Google Frontend
via: 1.1 google
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000

HTTP/2 200 
content-length: 2761
content-type: application/vnd.docker.distribution.manifest.list.v2+json
docker-content-digest: sha256:bb6ed397957e9ca7c65ada0db5c5d1c707c9c8afc80a94acbe69f3ae76988f0c
docker-distribution-api-version: registry/2.0
date: Thu, 12 Jan 2023 20:13:55 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

Is there a proxy involved?

Can nerdctl produce more verbose results? That path should have served a redirect to some other backend.

BenTheElder commented 1 year ago

We don't even have code to serve 403 in the registry.k8s.io application, so that would be coming from the backing store we redirect to, but from the logs above we can't see that part.
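For anyone debugging this, each hop can be checked separately with plain curl: first hit registry.k8s.io without following redirects to capture the Location header, then request that location directly. This isolates whether the 403 comes from the registry application or from the layer in front of it / the backing store. A minimal sketch (the exact redirect target varies by region and client):

# 1. Ask registry.k8s.io itself, without following the redirect.
curl -sSI https://registry.k8s.io/v2/pause/manifests/3.7
# Expected: HTTP/2 307 with a "location:" header pointing at a backing store.

# 2. Request that location by hand and see which hop answers with the 403.
LOCATION=$(curl -sSI https://registry.k8s.io/v2/pause/manifests/3.7 | awk 'tolower($1)=="location:" {print $2}' | tr -d '\r')
curl -sSI "$LOCATION"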

marblerun commented 1 year ago

Thanks Ben,

As a temp fix, I've looked at the kubespray logs, downloaded the missing elements on a working instance, exported them to a local file, copied that over, and imported them back into the instance that is being blocked. I now have a working cluster, but it is concerning that access seems to be blocked in an arbitrary fashion. Have a good weekend

Mike
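
For reference, that export/import workaround looks roughly like the sketch below (assuming nerdctl is the runtime CLI on both hosts, with pause:3.7 as the example image and kube-2 as the blocked host; names and paths are illustrative):

# On a working instance: pull the image and export it to a tarball.
/usr/local/bin/nerdctl -n k8s.io pull registry.k8s.io/pause:3.7
/usr/local/bin/nerdctl -n k8s.io save -o pause-3.7.tar registry.k8s.io/pause:3.7

# Copy the tarball to the blocked instance.
scp pause-3.7.tar root@kube-2:/tmp/

# On the blocked instance: import it into containerd's k8s.io namespace.
/usr/local/bin/nerdctl -n k8s.io load -i /tmp/pause-3.7.tar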

tcahill commented 1 year ago

I'm seeing the same behavior in a similar context. I'm trying to install the kube-prometheus-stack helm chart on a k3s cluster in Hetzner Cloud (hosted in their Oregon location) and getting a 403 when pulling registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0. Interestingly I'm only seeing this behavior on one of the three hosts comprising my cluster, which are all running Ubuntu 22. It's also not consistent on the problematic host - I occasionally get a successful response but primarily see 403s.

> We don't even have code to serve 403 in the registry.k8s.io application

For me the 403 is appearing without following the redirect:

curl -v https://registry.k8s.io/v2/pause/manifests/3.7
*   Trying 34.107.244.51:443...
* Connected to registry.k8s.io (34.107.244.51) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* [CONN-0-0][CF-SSL] TLSv1.0 (OUT), TLS header, Certificate Status (22):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Certificate Status (22):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=registry.k8s.io
*  start date: Dec 31 01:52:06 2022 GMT
*  expire date: Mar 31 02:44:39 2023 GMT
*  subjectAltName: host "registry.k8s.io" matched cert's "registry.k8s.io"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* h2h3 [:method: GET]
* h2h3 [:path: /v2/pause/manifests/3.7]
* h2h3 [:scheme: https]
* h2h3 [:authority: registry.k8s.io]
* h2h3 [user-agent: curl/7.87.0]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x7f356982fa90)
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /v2/pause/manifests/3.7 HTTP/2
> Host: registry.k8s.io
> user-agent: curl/7.87.0
> accept: */*
> 
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 403 
< content-type: text/html; charset=UTF-8
< referrer-policy: no-referrer
< content-length: 317
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
< 
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/pause/manifests/3.7</code> from this server.</h2>
<h2></h2>
</body></html>
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Connection #0 to host registry.k8s.io left intact
BenTheElder commented 1 year ago

Thanks for the additional logs.

cc @ameukam maybe cloud armor? I forgot about that dimension in the actual deployment.

This definitely looks like it's coming from the infra in front of the app, we also don't serve HTML, only redirects (or simple API errors).

BenTheElder commented 1 year ago

@ameukam and I discussed this yesterday.

This appears to be coming from the cloud loadbalancer security policy (we're using Cloud Armor, configured here: https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112).

I don't think we're doing anything special here, best guess is hetzner IPs have been flagged for abuse?

I actually can't seem to find these particular requests in the loadbalancer logs, otherwise we could see what preconfigured rule this is hitting.

BenTheElder commented 1 year ago

I can see other 403s served by the security policy for more obviously problematic incoming requests like https://registry.k8s.io/?../../../../../../../../../../../etc/profile

mysticaltech commented 1 year ago

Folks, I can confirm this issue shows up randomly when pulling CSI images. It seems that some IPs are blacklisted or something!

This has been a huge issue this last month for us! It started in late December.

mysticaltech commented 1 year ago

> @ameukam and I discussed this yesterday.
>
> This appears to be coming from the cloud loadbalancer security policy (we're using Cloud Armor, configured here: https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112).
>
> I don't think we're doing anything special here, best guess is hetzner IPs have been flagged for abuse?
>
> I actually can't seem to find these particular requests in the loadbalancer logs, otherwise we could see what preconfigured rule this is hitting.

That would make absolute sense! Somehow, some Hetzner IPs seem to be blacklisted. For our Kube-Hetzner project, it's been a real pain. Please fix 🙏

https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/524 https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/451 https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/442

dims commented 1 year ago

@mysticaltech can you please drop a few ip address(es) of boxes that seem to have trouble?

mysticaltech commented 1 year ago

@dims Definitely, I can try to get some.

@aleksasiriski Could you fetch some of the 10 IPs that you had reserved as static IPs because they were blocked by registry.k8s.io when used for nodes?

mysticaltech commented 1 year ago

@dims I just deployed a test cluster of 10 nodes, and got "lucky" on one of them. The one affected IP is 5.75.240.113.

(screenshot)

aleksasiriski commented 1 year ago

> @dims Definitely, I can try to get some.
>
> @aleksasiriski Could you fetch some of the 10 IPs that you had reserved as static IPs because they were blocked by registry.k8s.io when used for nodes?

I had like 3 IPs that were blacklisted, I'll try to fetch them later today (UTC+1) when I'm home.

dims commented 1 year ago

> I just deployed a test cluster of 10 nodes, and got "lucky" on one of them. The one affected IP is 5.75.240.113.

(attachment: downloaded-logs-20230126-065347.json.txt)

I see 4 hits, all with a valid redirect using HTTP status 307s, no 403s at all.

the code it hits is here: https://cs.k8s.io/?q=StatusTemporaryRedirect&i=nope&files=handlers.go&excludeFiles=&repos=kubernetes/registry.k8s.io

mysticaltech commented 1 year ago

@dims Thanks for looking into this. The 403s are most probably appearing further down the request chain. As stated by @BenTheElder, it could be your LB security policy (Cloud Armor), configured here: https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112

mysticaltech commented 1 year ago

Also @dims, something interesting, discovered by one of our users: if they tried to pull the image manually with crictl pull a bunch of times, it would actually work at some point. As if the node were magically whitelisted again, and pulling other images works afterward.

Sometimes it works after 100 tries, sometimes it just does not work. So kind of a hit-or-miss situation! All this to say, there's something up with your LB IMHO.
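
For anyone needing a stopgap, that retry approach can be scripted roughly like this (a sketch only; crictl and the image name are examples, and this does nothing about the underlying blocking):

# Retry the pull until it succeeds or we give up (hit-or-miss, as noted above).
for i in $(seq 1 100); do
  if crictl pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0; then
    echo "pull succeeded on attempt $i"
    break
  fi
  sleep 5
done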

mysticaltech commented 1 year ago

@dims I have created another small test cluster; the IP above (5.75.240.113) has been reused, and it does it again. I will leave it on for 24h so that you can gather more logs.

pulling from host registry.k8s.io failed with status code [manifests v2.7.0]: 403 Forbidden

(screenshot)

mysticaltech commented 1 year ago

Now if I ssh into the node and run the crictl pull command, I get the same:

(screenshot)

mysticaltech commented 1 year ago

@dims Also an interesting finding: if I simply issue curl -v https://registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 a few times in a row, it randomly returns either a 404 or a 403.

(screenshot: 404 response)

(screenshot: 403 response)

dims commented 1 year ago

@mysticaltech yeah, looks like there is very little tolerance for your range of IPs from Hetzner.

mysticaltech commented 1 year ago

Exactly! Which is really a pain when working with Kubernetes. If it's possible to fix, that would be awesome.

mysticaltech commented 1 year ago

@dims Did you do something? Because it started to work.

(screenshot)

dims commented 1 year ago

@mysticaltech nope. The theory is still the same: Cloud Armor!

mysticaltech commented 1 year ago

Oh my! Maybe some kind of form to request whitelisting? That would be kind of good, but not great for autoscaling nodes, for instance.

valkenburg-prevue-ch commented 1 year ago

Hi, I'm getting an HTTP 404 code on this request:

{"access_time":"26/Jan/2023:17:04:49 +0000","upstream_cache_status":"","method":"GET","uri":"/artifacts-downloads/namespaces/k8s-artifacts-prod/repositories/images/downloads/ALMFTafKuNkWwH8ArOFD4KogY3p5kp9zcsZSbyhKLzMCEPih3pGxlf8hdweputz3nxUZBrevwToc16OLF7zMqHYUiYRUHvlEfEVSsuu2L5J4uzlOgj_1BY7ZHOHwmRLsHwyaJ8TQE8XlkrCSQSak71-6ZVgvBT9nv57reoR-AE6o4ei_iszTDpPq2xtnFA4tZpIL0tBJor_u8ZoD83KGOGN-aAHsqelMjVqLR5fPp3uluRC1I8coYtFZgafJjEKsqrkeVUdt9hQTHpQ-dGdlbIBOVPWaZCl1IeoDzlHcwrybwcYTB8hyYzJ--mHnaZWfOWs8i2p-dFzdPy68CBTaXgW-gDRymEFDCJe_3b8GhvFMnOOo0ldCZEk4K2fJsnTt_gMC2-4y1zr5k_TrUmcrV_nt8bo4tw4cvYCvb9EJn7GQ3LbkY41avfNbipQmoBkR-rZ9lPhySAVcmiharpD7gJYrqvSxSafP_IBJ3Oxkt0_aUY4A9n4qeqtZRZeSE-BoWdGhiagQVnPWDewkpAMY2M9XfotDZhOUIR_kb8nYWzSi4cjECfltywKzgriY2IT0TS1GoHBLwuJPpGrRFR0afzF-BOQTR8SUnb0b70zprBC8lSc4HkzzW_4MiPBbxPGpa6OXiIZbvjO6ORb-YXGXwCSsee4nkheizN1xTof6z_GHPVJFqhNRqNJSaN8Jfm2Dd0w0C6MBrkTP34K2hKnORXI=","request_type":"unknown","status":"404","bytes_sent":"19","upstream_response_time":"0.000, 0.048","host":"registry.k8s.io","proxy_host":"registry.k8s.io","upstream":"[2600:1901:0:1013::]:443, 34.107.244.51:443"}

from IP 78.47.222.2 (same project and provider as @mysticaltech). Does posting IPs and info like this help? Or is there something else I can provide?

BenTheElder commented 1 year ago

@valkenburg-prevue-ch /artifacts-downloads/.* is not a valid OCI distribution API path. The 404 is because that URL does not exist.


> Also @dims, something interesting: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/451#issuecomment-1348608806

This actually points to the issue also existing for Hetzner IPs with plain GCR.

k8s.gcr.io is a special alias domain provided by GCR, but it has the same allow-listing etc. as any other gcr.io registry.

Kubernetes doesn't run that infra, it just populates the images.

> Maybe some kind of form to request whitelisting?

I'm not sure how well this would scale given the relatively volunteer staffing we have for this sort of free image host ...

It seems registry.k8s.io has no regression here vs k8s.gcr.io, though I can't recall ever having seen a similar issue reported to Kubernetes previously.

BenTheElder commented 1 year ago

At the present time I would recommend mirroring images, which also helps us reduce our massive distribution costs and reallocate resources toward testing, etc.

mysticaltech commented 1 year ago

@BenTheElder Thanks for clarifying. But Hetzner Cloud is still a major European cloud; not fully supporting it is a shame IMHO, and as a young open-source project, we don't yet have the resources to deploy a full-blown mirror.

However, if we were to do that, how would you recommend we proceed? This is something we have obviously thought about, and we have already considered both https://docs.k3s.io/installation/private-registry and https://github.com/goharbor/harbor. Would you recommend anything else that is an easy fix for this particular issue?

BenTheElder commented 1 year ago

> curl -v https://registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 a few times in a row. It randomly returns either a 404 or a 403.

Again, this is not a valid API path, so the 404s are expected; the request is invalid. The 403s are seemingly due to the security mechanism(s).

I recommend crane pull --verbose registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 /dev/null to see what valid request paths look like, or the distribution spec.
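
Per the OCI distribution spec, manifests live under /v2/<name>/manifests/<reference>, so a valid request for that image looks roughly like this (a sketch):

# Valid distribution API path: note the /v2/ prefix and /manifests/<tag>.
curl -IL https://registry.k8s.io/v2/sig-storage/csi-node-driver-registrar/manifests/v2.7.0

# The earlier form (https://registry.k8s.io/<repo>:<tag>) is not part of the API, hence the 404s.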


> Thanks for clarifying. But Hetzner Cloud is still a major European cloud, not supporting fully it is a shame IMHO, and for a young open-source project like ours, we don't yet have the resource to deploy a full-blown mirror.

I hear that, but even as a large open source project we have constrained resources to host things, and we're not actively choosing to block these IPs; some security layer on our donated hosting infrastructure is blocking them. At the moment, keeping things online and bringing our spend back within budget is a bigger priority than resolving an issue that was also present in the previous infrastructure, and even that is a bit of a stretch. Open source staffing is hard :(

Perhaps you could ask your users to mirror for themselves if they encounter issues like this.

Hetzner might also have thoughts about this issue? It seems in their best interest to avoid what seems to be an IP reputation issue.

Searching online, I see similar discussions for Amazon CloudFront and Cloudflare with respect to Hetzner IP ban issues.

> However, if we were to do that, how would you recommend we proceed? This is something we obviously thought about, and have considered already both https://docs.k3s.io/installation/private-registry and https://github.com/goharbor/harbor, would you recommend anything else that is an easy fix for that particular issue?

Mirroring guides are something I hope to get folks to contribute. Options will depend on the tools involved client-side (like container runtime).

For consuming a mirror, I usually recommend containerd's mirroring config (as dockershim is deprecated); CRI-O has something similar, I believe.

For hosting a mirror, I recommend roughly: populate images with crane cp upstream mirror, where mirror is any preferred registry host. However, there are many other options, like Harbor, that I've not personally used.
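
As a rough illustration of that flow (a sketch: registry.example.com stands in for whatever registry you host, and the hosts.toml layout follows containerd's certs.d convention):

# Populate the mirror from any machine that can reach registry.k8s.io.
crane cp registry.k8s.io/pause:3.7 registry.example.com/registry.k8s.io/pause:3.7

# Point containerd at the mirror for registry.k8s.io
# (requires config_path = "/etc/containerd/certs.d" in /etc/containerd/config.toml).
mkdir -p /etc/containerd/certs.d/registry.k8s.io
cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml <<'EOF'
server = "https://registry.k8s.io"

[host."https://registry.example.com"]
  capabilities = ["pull", "resolve"]
EOF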

BenTheElder commented 1 year ago

For 5.75.240.113 I see the 404s in the loadbalancer logs, but not the 403s.

Also, specifically checking for Cloud Armor blocking would show up in the logs under:

jsonPayload.statusDetails="denied_by_security_policy"

Which I can see happening for other requests, but none here ...
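
For reference, a query along those lines can be run with gcloud against the project's logs (a sketch; the filter may need narrowing to the right forwarding rule):

# Look for requests denied by the Cloud Armor security policy in the last hour.
gcloud logging read \
  'resource.type="http_load_balancer" AND jsonPayload.statusDetails="denied_by_security_policy"' \
  --freshness=1h --limit=20 --format=json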

valkenburg-prevue-ch commented 1 year ago

Thank you so much for the detailed responses! I think the info you give about 404s points us back to a workaround we tried out earlier (in desperation).

And yes, we'll have to fix it on our end one way or the other. I much appreciate the help you're giving.

mysticaltech commented 1 year ago

Thanks for the tips @BenTheElder and keep up the good work!

apricote commented 1 year ago

Hi all,

@hetznercloud employee here. We know of multiple cases where our IPs have been sporadically blocked by Google & AWS. It would be great if you could send any affected IPs and the endpoints where this happens (registry.k8s.io in this case) to our support: https://console.hetzner.cloud/support

BenTheElder commented 1 year ago

Thank you @apricote 🙏

ameukam commented 1 year ago

Closing since Hetzner Cloud Support is involved.

/close

k8s-ci-robot commented 1 year ago

@ameukam: Closing this issue.

In response to [this](https://github.com/kubernetes/registry.k8s.io/issues/138#issuecomment-1423701031):

> Closing since Hetzner Cloud Support is involved.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
marblerun commented 1 year ago

Hum,

Perhaps I should send you the initial response from Hetzner Support, as I went there first before trying this most helpful forum ...

Also, to any users of Calico out there, it would appear that at least some of the Hetzner cloud hosts (Finland in this case being the only area I've tried) are susceptible to a networking peculiarity.

This in essence causes DNS to fail; it is fixed by the following patch:

kubectl patch felixconfigurations default --type=merge --patch='{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'
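
To check that the override took effect, something like this should work (assuming the Calico FelixConfiguration CRD is installed):

# Should print: ChecksumOffloadBroken=true
kubectl get felixconfigurations default -o jsonpath='{.spec.featureDetectOverride}'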

Might be of help to someone else

thanks

Mike


guettli commented 1 year ago

I don't think the issue is resolved.

@apricote (who works for Hetzner) said that this should get reported to Hetzner, so that's what I did.

I got this reply:

> Unfortunately, some of our IPs are wrongly located in Iran by some GeoIP databases.
>
> If this should lead to impairments like you mentioned, please create a snapshot and create a new server with a new IP with this snapshot. You can then delete the "faulty" server.
>
> Unfortunately we cannot influence these databases. Thank you for your understanding.

I checked some web-based GeoIP services, and all agree: my IP (116.203.200.129) is located in Germany.

I don't know the details of the contract between the CNCF and the service provider (Google).

But I think the CNCF should take care that the service works as expected.

Related thread: https://kubernetes.slack.com/archives/CCK68P2Q2/p1685607650986089

dims commented 1 year ago

> But I think the CNCF should take care that the service works as expected.

Sorry. No.

guettli commented 1 year ago

> But I think the CNCF should take care that the service works as expected.
>
> Sorry. No.

@dims what's your idea? How could this (Random download failures) get solved?

dims commented 1 year ago

@guettli https://github.com/kubernetes/registry.k8s.io/tree/main/docs/mirroring

mysticaltech commented 1 year ago

> @guettli https://github.com/kubernetes/registry.k8s.io/tree/main/docs/mirroring

@valkenburg-prevue-ch FYI, interesting guide.

guettli commented 5 months ago

Today, on this machine, IPv4 was blocked and IPv6 worked:

❯ curl -6 -L https://registry.k8s.io/v2/pause/manifests/3.7 > /dev/null 
❯ curl -4 -L https://registry.k8s.io/v2/pause/manifests/3.7
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 403 (Forbidden)!!1</title>
❯ nslookup registry.k8s.io
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   registry.k8s.io
Address: 34.96.108.209
Name:   registry.k8s.io
Address: 2600:1901:0:bbc4::
❯ curl -4 ipecho.net/plain
95.217.9.112

❯ curl -6 ipecho.net/plain
2a01:4f9:c011:8866::1
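
A quick way to quantify how flaky a given address family is (a sketch using only curl; it prints one final status code per attempt):

# 20 attempts over IPv4, then 20 over IPv6, printing only the final HTTP status.
for fam in -4 -6; do
  echo "family: $fam"
  for i in $(seq 1 20); do
    curl "$fam" -s -o /dev/null -L -w '%{http_code}\n' \
      https://registry.k8s.io/v2/pause/manifests/3.7
  done
done
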
rbjorklin commented 5 months ago

I'm finding myself in a similar predicament. I'm seeing ImagePullBackOff from nodes with these IPs:

Mirroring is a fine solution assuming the nodes I'm trying to mirror via can reach registry.k8s.io in the first place.

Here's to hoping someone at Google reads this and unblocks the Hetzner IP ranges associated with AS213230.

BenTheElder commented 5 months ago

> I'm finding myself in a similar predicament. I'm seeing ImagePullBackOff from nodes with these IPs

Sorry, unfortunately we cannot do more here. Please see the note at the top of: https://github.com/kubernetes/registry.k8s.io/blob/main/docs/debugging.md#debugging-issues-with-registryk8sio

> Mirroring is a fine solution assuming the nodes I'm trying to mirror via can reach registry.k8s.io in the first place.

The intent is that you populate the mirror from elsewhere (e.g. even your local development machine could push to a mirror) for more reliable consumption from your hosts / users.

Again: https://registry.k8s.io#stability

These images are being hosted for free download at great expense and run largely by volunteers from multiple companies and independent contributors.

> Here's to hoping someone at Google reads this and unblocks the Hetzner IP ranges associated with AS213230.

There are people from Google working on this project 👋 but unfortunately I cannot publicly discuss the specifics of GCP's restrictions.

I will point out, however, that what is happening is not new to registry.k8s.io; it also applied to k8s.gcr.io and prior hosts (which were 100% funded by Google).

If someone wants to come work with SIG K8s Infra on an alternate implementation with sponsorship from other vendors, there are details about how to contact and participate in the README.

Alternatively if someone wanted to investigate hosting a mirror for Hetzner, that would be great, feel free to reach out. https://github.com/kubernetes/registry.k8s.io#community-discussion-contribution-and-support

mysticaltech commented 5 months ago

@apricote FYI the above. Running a registry.k8s.io mirror for Hetzner would be great.

rbjorklin commented 4 months ago

For anyone still facing this problem I have been able to work around it by deploying peerd in my cluster.

vitobotta commented 4 months ago

> For anyone still facing this problem I have been able to work around it by deploying peerd in my cluster.

Hi! I installed peerd in k3s (I had to build the image with a changed containerd socket path, and it's now running), but how do I use it? Thanks

rbjorklin commented 4 months ago

@vitobotta

Ensure /etc/containerd/config.toml contains:

[plugins."io.containerd.grpc.v1.cri".registry]
   config_path = "/etc/containerd/certs.d"

After that images will automatically be pulled from other nodes in your cluster if they are present.
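
To confirm containerd actually picked that setting up after a restart (a sketch for a standalone containerd install; k3s ships an embedded containerd with its own config path, so adapt accordingly):

# Restart containerd and check the merged config for the registry config_path.
systemctl restart containerd
containerd config dump | grep -A 2 '\.registry\]'
# Expect: config_path = "/etc/containerd/certs.d"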

vitobotta commented 4 months ago

> @vitobotta
>
> Ensure /etc/containerd/config.toml contains:
>
>     [plugins."io.containerd.grpc.v1.cri".registry]
>        config_path = "/etc/containerd/certs.d"
>
> After that images will automatically be pulled from other nodes in your cluster if they are present.

Thanks :) In the meantime I ended up using https://github.com/spegel-org/spegel since it doesn't require me to open any ports in the firewall. Peerd does if I am not mistaken, right?