kubernetes / registry.k8s.io

This project is the repo for registry.k8s.io, the production OCI registry service for Kubernetes' container image artifacts
https://registry.k8s.io
Apache License 2.0

403 error when pulling images from Swiss cloud provider #174

Closed alexandrevoilab closed 1 year ago

alexandrevoilab commented 1 year ago

For the past few weeks, we are unable to pull any images from registry.k8s.io or k8s.gcr.io. The problem started to occur randomly in December: images would pull on some days and fail on others. Now, for the past two to three weeks, we can't pull any images from either registry.

Sadly, I have absolutely no idea whom I should contact about this, so I'm trying my luck here.

Pulls are made from AS29222, mainly 195.15.243.0/24.

Here are crane logs trying to pull metrics-server:v0.6.2:

ubuntu@k8s-worker-2:~$ ./crane pull --verbose k8s.gcr.io/metrics-server/metrics-server:v0.6.2 /dev/null
2023/03/13 17:43:44 --> GET https://k8s.gcr.io/v2/
2023/03/13 17:43:44 GET /v2/ HTTP/1.1
Host: k8s.gcr.io
User-Agent: crane/0.13.0 go-containerregistry/0.13.0
Accept-Encoding: gzip

2023/03/13 17:43:44 <-- 403 https://k8s.gcr.io/v2/ (167.668712ms)
2023/03/13 17:43:44 HTTP/2.0 403 Forbidden
Content-Length: 1582
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Content-Type: text/html; charset=UTF-8
Date: Mon, 13 Mar 2023 17:43:44 GMT
Referrer-Policy: no-referrer

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 403 (Forbidden)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>403.</b> <ins>That’s an error.</ins>
  <p>Your client does not have permission to get URL <code>/v2/</code> from this server.  <ins>That’s all we know.</ins>

Error: GET https://k8s.gcr.io/v2/: unexpected status code 403 Forbidden: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 403 (Forbidden)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>403.</b> <ins>That’s an error.</ins>
  <p>Your client does not have permission to get URL <code>/v2/</code> from this server.  <ins>That’s all we know.</ins>

And trying registry.k8s.io:

ubuntu@k8s-worker-2:~$ ./crane pull --verbose registry.k8s.io/metrics-server/metrics-server:v0.6.2 /dev/null
2023/03/13 17:45:52 --> GET https://registry.k8s.io/v2/
2023/03/13 17:45:52 GET /v2/ HTTP/1.1
Host: registry.k8s.io
User-Agent: crane/0.13.0 go-containerregistry/0.13.0
Accept-Encoding: gzip

2023/03/13 17:45:53 <-- 403 https://registry.k8s.io/v2/ (185.958816ms)
2023/03/13 17:45:53 HTTP/2.0 403 Forbidden
Content-Length: 298
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/</code> from this server.</h2>
<h2></h2>
</body></html>

Error: GET https://registry.k8s.io/v2/: unexpected status code 403 Forbidden:
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/</code> from this server.</h2>
<h2></h2>
</body></html>

The same pull works from other Swiss providers.

If this is not the right place, sorry for the noise, and feel free to close this. But please at least point me to somewhere I can get help with it.

Thanks!

ameukam commented 1 year ago

cc @BenTheElder

BenTheElder commented 1 year ago

For the past few weeks, we are unable to pull any images from registry.k8s.io or k8s.gcr.io.

If this is also affecting k8s.gcr.io, it is somewhat out of scope for us and sounds like https://github.com/kubernetes/registry.k8s.io/issues/138: most likely you've hit some blocking in GCP.

k8s.gcr.io has the same policies in place as any other GCR registry.

registry.k8s.io is based in part on GCLB (Google Cloud Load Balancing) with Cloud Armor and also has the same protections.

cc @BenTheElder

I think you'll have to contact support about being blocked. If you suspect it's related to the cloud provider, I'd recommend asking your cloud provider to resolve this (as in #138).

I do work for GCP but fielding these personally is not terribly scalable. Primarily I work on Kubernetes and at the moment my focus is on broadly keeping the project sustainable.

For an immediate solution, I'd recommend mirroring images or running a pull-through cache, which both helps Kubernetes' sustainability and improves your control over access and uptime.
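
One low-effort way to act on the mirroring advice is to point manifests at copies of the images held in a registry you control. The Go sketch below only rewrites image references; `mirror.example.internal` is a hypothetical mirror host, `rewriteToMirror` is an illustrative helper, and actually copying the images (for example with `crane copy SRC DST`) is a separate step not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteToMirror swaps the registry host of an image reference for a mirror
// host, keeping the repository path, tag and digest untouched. Only references
// whose host exactly matches one of the upstreams are rewritten.
func rewriteToMirror(ref, mirror string, upstreams []string) string {
	for _, up := range upstreams {
		prefix := up + "/" // trailing slash avoids matching e.g. "registry.k8s.io.evil.com"
		if strings.HasPrefix(ref, prefix) {
			return mirror + "/" + strings.TrimPrefix(ref, prefix)
		}
	}
	return ref // not from a registry we mirror: leave as-is
}

func main() {
	// Hypothetical mirror host; replace with your own registry.
	const mirror = "mirror.example.internal"
	upstreams := []string{"registry.k8s.io", "k8s.gcr.io"}

	for _, ref := range []string{
		"registry.k8s.io/metrics-server/metrics-server:v0.6.2",
		"k8s.gcr.io/metrics-server/metrics-server:v0.6.2",
		"docker.io/library/nginx:1.23",
	} {
		fmt.Println(rewriteToMirror(ref, mirror, upstreams))
	}
}
```

Rewriting references keeps the change visible in your manifests. The alternative is configuring a mirror in the container runtime (cri-o reads mirror settings from `registries.conf`), which leaves manifests untouched.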

BenTheElder commented 1 year ago

I looked through the logs for registry.k8s.io (nothing GCP-internal, just the logs the Kubernetes project has).

I see a number of 404 responses to requests from 195.15.243.0/24 for https://registry.k8s.io/v1/_ping. That is expected, as only v2 is supported; requesting this path appears to be cri-o probing for API support (though I'm not sure why it doesn't just request v2 first, as essentially all current registries are v2...?).

I don't see any other errors served to that IP range, or any Cloud Armor rule enforcement.
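
The log check described above boils down to filtering requests by client CIDR and tallying status codes per path. The Go sketch below shows that shape; `logEntry` is a hypothetical, simplified record (the real registry.k8s.io logs have a different schema), and the sample entries in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// logEntry is a hypothetical, simplified access-log record.
type logEntry struct {
	ClientIP string
	Path     string
	Status   int
}

// tallyForCIDR counts "status path" pairs for requests whose client IP falls
// inside cidr. Entries with unparseable IPs are skipped.
func tallyForCIDR(entries []logEntry, cidr string) (map[string]int, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, e := range entries {
		addr, err := netip.ParseAddr(e.ClientIP)
		if err != nil || !prefix.Contains(addr) {
			continue // unparseable or outside the range
		}
		counts[fmt.Sprintf("%d %s", e.Status, e.Path)]++
	}
	return counts, nil
}

func main() {
	entries := []logEntry{ // made-up sample data
		{"195.15.243.10", "/v1/_ping", 404},
		{"195.15.243.10", "/v2/", 200},
		{"195.15.243.77", "/v1/_ping", 404},
		{"203.0.113.5", "/v2/", 403}, // outside the range, ignored
	}
	counts, err := tallyForCIDR(entries, "195.15.243.0/24")
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Println(counts[k], k)
	}
}
```

A tally with only 404s on `/v1/_ping` and no 403s is what "no errors from our side" would look like for a range.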

So this definitely looks like #138.

alexandrevoilab commented 1 year ago

Thanks for the fast reply.

I've filed a ticket with our cloud provider.

As for the proxy, I'll do that, which obviously requires me to use another provider.

alexandrevoilab commented 1 year ago

I somehow missed your last comment.

We do use cri-o.

You mention that GETs to https://registry.k8s.io/v1/_ping seem to get through, meaning that it should not be a Cloud Armor rule.

Now, as before, this fails:

ubuntu@k8s-worker-2:~$ ./crane pull --verbose registry.k8s.io/metrics-server/metrics-server:v0.6.2 /dev/null
2023/03/14 08:29:22 --> GET https://registry.k8s.io/v2/
2023/03/14 08:29:22 GET /v2/ HTTP/1.1
Host: registry.k8s.io
User-Agent: crane/0.13.0 go-containerregistry/0.13.0
Accept-Encoding: gzip

2023/03/14 08:29:22 <-- 403 https://registry.k8s.io/v2/ (139.135892ms)
2023/03/14 08:29:22 HTTP/2.0 403 Forbidden
Content-Length: 298
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Content-Type: text/html; charset=UTF-8
Referrer-Policy: no-referrer

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/</code> from this server.</h2>
<h2></h2>
</body></html>

Error: GET https://registry.k8s.io/v2/: unexpected status code 403 Forbidden:
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/</code> from this server.</h2>
<h2></h2>
</body></html>

Oddly, though, using curl I get a 200:

ubuntu@k8s-worker-2:~$ curl -v https://registry.k8s.io/v2/
*   Trying 34.107.244.51:443...
* Connected to registry.k8s.io (34.107.244.51) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=registry.k8s.io
*  start date: Feb 28 23:40:14 2023 GMT
*  expire date: May 30 00:33:07 2023 GMT
*  subjectAltName: host "registry.k8s.io" matched cert's "registry.k8s.io"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x55ea98807560)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /v2/ HTTP/2
> Host: registry.k8s.io
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 200
< docker-distribution-api-version: registry/2.0
< x-cloud-trace-context: a533e79162da90b3cb01f93c4dde1941
< date: Tue, 14 Mar 2023 08:29:57 GMT
< content-type: text/html
< server: Google Frontend
< content-length: 0
< via: 1.1 google
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Connection #0 to host registry.k8s.io left intact
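
One way to narrow down a curl-vs-crane discrepancy like this is to send the same `GET /v2/` from one process and vary only the `User-Agent`. The Go sketch below does that; `compareUAs`, `newDemoServer` and `demoBlockedUA` are hypothetical names, and the local demo server rejects the crane UA purely so the output has something to show. Nothing in this thread indicates the real block is UA-based: the two requests above were also made about 35 seconds apart and sent different `Accept` headers, so matching statuses here would only rule out the UA.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Illustrative only: the demo server below rejects this UA.
const demoBlockedUA = "crane/0.13.0 go-containerregistry/0.13.0"

// statusForUA issues GET <base>/v2/ with the given User-Agent and returns the status.
func statusForUA(client *http.Client, base, ua string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, base+"/v2/", nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("User-Agent", ua)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// compareUAs probes the same URL once per User-Agent and returns status by UA.
func compareUAs(client *http.Client, base string, uas []string) (map[string]int, error) {
	out := make(map[string]int, len(uas))
	for _, ua := range uas {
		code, err := statusForUA(client, base, ua)
		if err != nil {
			return nil, err
		}
		out[ua] = code
	}
	return out, nil
}

// newDemoServer is a local stand-in; against the real registry you would use
// http.DefaultClient and "https://registry.k8s.io" instead.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.UserAgent() == demoBlockedUA {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	uas := []string{demoBlockedUA, "curl/7.81.0"}
	res, err := compareUAs(srv.Client(), srv.URL, uas)
	if err != nil {
		panic(err)
	}
	for _, ua := range uas {
		fmt.Printf("%-45s %d\n", ua, res[ua])
	}
}
```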

So, to check my understanding: you say that it is not Cloud Armor, but #138 says it is. The problem is that the subnet is somehow blacklisted, right?

BenTheElder commented 1 year ago

You mention that GETs to https://registry.k8s.io/v1/_ping seem to get through, meaning that it should not be a Cloud Armor rule.

Er, yes: the only errors I see in the logs on Kubernetes' side for this IP range are the 404 errors from cri-o attempting /v1/_ping alongside /v2/. It is expected that /v1/ would error, and that error is the registry application replying 404, not an Armor rule.
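
Since only v2 is served, a client can settle API support with a single `GET /v2/`: the Docker Registry HTTP API V2 spec treats a 200 there as "this registry implements v2" and a 401 as "v2, but authenticate first", while a 404 means no v2 API. The Go sketch below shows that v2-first order; it is not cri-o's actual probing logic, and the verdict strings and the local test server are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// verdict interprets the status code returned by GET /v2/.
func verdict(status int) string {
	switch status {
	case http.StatusOK:
		return "v2 supported"
	case http.StatusUnauthorized:
		return "v2 supported (auth required)" // retry with credentials per WWW-Authenticate
	case http.StatusForbidden:
		return "forbidden" // refused; the body may hint at who refused it
	case http.StatusNotFound:
		return "no v2 API"
	default:
		return fmt.Sprintf("unexpected status %d", status)
	}
}

// probeV2 requests <base>/v2/ only; it never falls back to /v1/_ping.
func probeV2(client *http.Client, base string) (string, error) {
	resp, err := client.Get(base + "/v2/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return verdict(resp.StatusCode), nil
}

func main() {
	// Local stand-in for a v2-only registry: /v2/ answers 200, anything else
	// (such as /v1/_ping) answers 404 from the application itself.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/v2/" {
			w.Header().Set("Docker-Distribution-Api-Version", "registry/2.0")
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	v, err := probeV2(srv.Client(), srv.URL)
	fmt.Println(v, err)
}
```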

There are no instances of Armor rules being applied.

So that leaves the IP being blocked by GCP, which is what happened in #138 with Hetzner.

As for the proxy, I'll do that, which obviously requires me to use another provider.

Well, or use one of the unaffected IPs (I can see some IPs in this range getting through fine), or mirror images / copy them to your own registry instead.

I've filed a ticket with our cloud provider.

Thanks! Hopefully they can follow up with GCP to sort it out, as Hetzner did.

So, to check my understanding: you say that it is not Cloud Armor, but https://github.com/kubernetes/registry.k8s.io/issues/138 says it is. The problem is that the subnet is somehow blacklisted, right?

#138 was largely not Cloud Armor. While investigating it we did find some overzealous Cloud Armor rules, and we've removed them.

But traffic can be blocked even before it reaches our Cloud Armor rules or application, e.g. because it is flagged as associated with an embargoed region and blocked under US export law. Unfortunately, there's not a lot of public detail about that sort of thing, but your provider should be able to discuss it with our provider(s).

This can happen with our AWS backends as well and there's not much we can directly do about it: https://github.com/wakatime/vscode-wakatime/issues/185, https://gitlab.com/gitlab-com/migration/-/issues/649#note_90812768

As far as I can tell, we're not serving 403s to this IP range ourselves, including via the Cloud Armor rules on our load balancer. Our Armor rules do serve 403s when activated.

ameukam commented 1 year ago

Closing. Issue should be handled by the cloud provider.

/close

k8s-ci-robot commented 1 year ago

@ameukam: Closing this issue.

In response to [this](https://github.com/kubernetes/registry.k8s.io/issues/174#issuecomment-1473369235):

> Closing. Issue should be handled by the cloud provider.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

thomasgoirand commented 1 year ago

Hi @ameukam. How can we (the cloud provider) handle the issue, then? I've searched for how to get de-listed from "Cloud Armor" but couldn't find any way to make this happen.

dims commented 1 year ago

@thomasgoirand long time no see. hope you are well.

This seems to be an active forum - https://issuetracker.google.com/issues?q=status:open%20componentid:1132263&s=created_time:desc

thomasgoirand commented 1 year ago

Hi Dims!

Indeed, it's been a long time. I'd love to discuss with you again.

I started a new thread here: https://issuetracker.google.com/issues/273978804

I'm not sure where this is going to lead us...

Cheers,

Thomas

BenTheElder commented 1 year ago

It's not Cloud Armor. Cloud Armor would be something Kubernetes can change.

This is something else at the cloud level (e.g. IPs blocked by embargo), which also explains why it applies to GCR and not just our application. Unfortunately I'm just a developer on Kubernetes and I can't share more specifics regarding that sort of thing, but you should be able to get in private contact with GCP to resolve this, as Hetzner did. 🤞

BenTheElder commented 1 year ago

Cloud Armor is a WAF we (Kubernetes) employ on the main load balancer; its configs are public Terraform in another repo.

When Cloud Armor rules block something we get a log message accessible to a subset of trusted project maintainers here that includes the rule applied and other info.

When I looked, that was not happening here. We did have overly aggressive rules in the past, and we found and fixed those while looking into the Hetzner issues. In both cases (Cloud Armor, GCP blocking) a 403 is served, but not in the same way.

However, the Hetzner issues were not actually Cloud Armor either, and they also applied to GCR.

thomasgoirand commented 1 year ago

@BenTheElder Who should I contact at GCP then?!?

BenTheElder commented 1 year ago

You should still contact GCP; if it were Cloud Armor, it would be on us here to fix it.

BenTheElder commented 1 year ago

Maybe @apricote can send you a note about their experience? Some of this may be NDA.

Sorry. This is way beyond my scope, and I'm also trying to make sure our plans to run all of this cost-effectively go smoothly over the next few weeks (https://kubernetes.io/blog/2023/03/10/image-registry-redirect/), so I'm a bit low on bandwidth at the moment.

ameukam commented 1 year ago

@BenTheElder Who should I contact at GCP then?!?

One possibility would be to send an email to the Google Security Team. I'm aware of security@google.com, but I don't know if there are other ways to reach them.