Closed xmudrii closed 5 months ago
Update: it turns out that cache miss downloads are slow, and cache hit downloads are fast. This can be determined from the `x-cache: MISS` and `x-cache: HIT` headers. Once the file is cached on the Fastly side, downloads are fast, but prior to that, downloads are insanely slow.
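For reference, the cache status can be checked without downloading the body. The first comment shows a live command against the CDN; the filtering step below is then applied to a captured response so the snippet is runnable offline:

```shell
# Live check (network required):
#   curl -sI https://cdn.dl.k8s.io/release/latest-1.29.txt | grep -i '^x-cache'
# The same filter applied to a captured response:
response='HTTP/2 200
content-type: text/plain
via: 1.1 varnish
x-served-by: cache-fra-etou8220117-FRA
x-cache: MISS
x-cache-hits: 0'
printf '%s\n' "$response" | grep -i '^x-cache'
```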
It might be related, but the CDN is not just slow, it's inconsistent. 1.29-alpha.1 was released yesterday, but depending on where you perform a `curl -L https://dl.k8s.io/release/latest-1.29.txt`, you receive either alpha.0 or alpha.1.
This will even change on the same computer if you just re-run the same curl command a few seconds later. Not sure if individual CDN servers "downgrade" their data or if I'm just hitting tons of random CDN nodes that all have an inconsistent state, but it's weird and sadly unreliable :/
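As a sanity check that the FRA copy really is the older marker (and not just a formatting difference), a semantic version sort orders the two observed values (GNU `sort -V`, assumed available):

```shell
# The two version markers observed from different POPs; sort -V orders
# them semantically, so the older one comes first.
printf 'v1.29.0-alpha.1\nv1.29.0-alpha.0\n' | sort -V
```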
These two requests happened basically at the same time:
< HTTP/2 200
< x-guploader-uploadid: ADPycdutDBgx7kyHbX7GUaTmNyxVRNVE82erWSx3_jmUaV5c01OeI7dkYmcu9pfg9gj5BTsgpYgYhWRUMYxkNtP4PVKi26f6HtKM
< expires: Sun, 24 Sep 2023 12:42:09 GMT
< last-modified: Wed, 26 Jul 2023 09:06:19 GMT
< etag: "9b59bd47d18f2395481cf230a43a56e0"
< content-type: text/plain
< cache-control: private, no-store
< accept-ranges: bytes
< date: Tue, 26 Sep 2023 10:40:55 GMT
< via: 1.1 varnish
< age: 165525
< x-served-by: cache-fra-etou8220117-FRA
< x-cache: HIT
< x-cache-hits: 1
< access-control-allow-origin: *
< content-length: 15
<
* Connection #1 to host cdn.dl.k8s.io left intact
v1.29.0-alpha.0
and
< HTTP/2 200
< x-guploader-uploadid: ADPycds7gWeT690zb-SSaamOrnGHAi6AgaV_K0SWCSe5XMLoJ1zFIE0NiJNe0v8Nr0STrfLXh5GwEv5JBgB6RhU6cqOdVHcHyJIy
< expires: Tue, 26 Sep 2023 07:08:47 GMT
< last-modified: Mon, 25 Sep 2023 20:56:50 GMT
< etag: "7d852bf327f00c76b50173de7dbaebf6"
< content-type: text/plain
< cache-control: private, no-store
< accept-ranges: bytes
< date: Tue, 26 Sep 2023 10:40:50 GMT
< via: 1.1 varnish
< age: 12723
< x-served-by: cache-muc13944-MUC
< x-cache: HIT
< x-cache-hits: 1
< access-control-allow-origin: *
< content-length: 15
<
* Connection #1 to host cdn.dl.k8s.io left intact
v1.29.0-alpha.1
Both claim a cache hit, but return different results.
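The `age` headers in the two responses are consistent with that: the FRA copy had been sitting in cache for roughly two days, i.e. since before alpha.1 was cut yesterday, while the MUC copy was only a few hours old. Converting the two values:

```shell
# Convert the 'age' values from the two responses above into hours.
for age in 165525 12723; do
  awk -v a="$age" 'BEGIN { printf "age %ss ~= %.1f hours\n", a, a / 3600 }'
done
```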
This can lead to serious issues. It looks like you're getting served from FRA and MUC, and these nodes might indeed have different caches. I think we should exclude version markers from the cache; these can change often, especially the latest ones.
Yeah. The cache configuration is not specific about file extensions.
I'll open a PR to fix it this week. Another option could be to serve those version markers directly through the nginx instance instead of the CDN provider.
@xrstf can you open a new issue with what you described, to better track what's happening? Thanks!
Can do, done => #5900.
We increased the TTL for the different objects in https://github.com/kubernetes/k8s.io/pull/5871. Hopefully the situation should be better.
The current CDN is a "pull-through" cache, so a `MISS` is expected on the first request for any object at the POP close to the client. Our real issue is the number of objects that need to be cached at the edge. We have a lot of objects (in this case binaries) that are rarely pulled. I don't think there is an efficient mechanism to warm all the POPs of the CDN provider for all the objects we currently host, but I'm open to any suggestions.
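For what it's worth, warming the nearest POP for a short list of hot objects could be sketched as below. This is only a sketch: the version and architecture are illustrative, it warms only the POP closest to wherever it runs, and `FETCH` defaults to `echo` so the snippet runs without network access (set `FETCH='curl -fsSL -o /dev/null'` for a real run):

```shell
# Sketch: warm the nearest POP for a handful of frequently-pulled binaries.
FETCH=${FETCH:-echo}   # 'echo' = dry run; swap in curl for a real warm-up
base="https://dl.k8s.io/release/v1.28.1/bin/linux/amd64"   # illustrative version/arch
for bin in kubectl kubelet kubeadm; do
  $FETCH "$base/$bin"
done
```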
Note that our cache hit ratio is currently over 99%. I don't think we can do much more than that.
IIRC a mid-level cache was mentioned talking to fastly previously?
> IIRC a mid-level cache was mentioned talking to fastly previously?
Maybe you're talking about Origin Shield? If that's the case, the feature is mostly efficient with regional buckets, which is not the case for `gs://kubernetes-release`. I'll ask about the exact requirements for this feature.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
@xmudrii is the problem still happening?
@ameukam I'll check and get back to you
@ameukam This is still an issue for non-cached artifacts downloaded over dl.k8s.io, see the screenshot:
/remove-lifecycle stale
Non-cached artifacts going through Fastly will always be slow for the first request at the POP close to the requester. Fastly doesn't replicate all objects over its entire network; objects are cached based on requests. If an object is not present at the Fastly edge, fetching it will always be slower than going to the origin.
@ameukam Is there anything we can do to make it at least a little faster? The difference is huge: it takes 5 seconds when downloading directly from the bucket, but about 1 minute and 30 seconds when downloading from the CDN. Subsequent requests might be slow as well, because there's a chance you'll get routed to a different edge location.
One possibility could be Fastly Origin Shield, but we would need to switch the origin to a regional bucket.
Even cached requests are much slower for me. Something that takes 3-5 seconds when downloaded from the bucket directly takes 30-40 seconds when downloaded via CDN. I double-checked with @xrstf and he sees okay speeds on 2nd and 3rd try (the 1st try is also slow for him), but that's not the case for me.
/lifecycle stale
I think this has been mostly fixed, I haven't observed it for a while. Closing the issue for now.

/close
@xmudrii: Closing this issue.
I've observed that downloads using `curl` going over cdn.dl.k8s.io (dl.k8s.io) are much slower than direct downloads from the bucket (storage.googleapis.com/kubernetes-release).

For example, downloading kubelet v1.28.1 directly from the bucket yields the following results:
The download took 4 seconds in total. However, downloading via the CDN yields much different results:
It took one minute and five seconds to download the same file.
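For anyone reproducing the comparison, curl can report the total transfer time directly. The two live commands are shown as comments; the ratio computed below just uses the timings reported above (~4 s direct vs ~65 s via the CDN):

```shell
# Live measurements (network required):
#   curl -o /dev/null -sSL -w '%{time_total}\n' \
#     https://storage.googleapis.com/kubernetes-release/release/v1.28.1/bin/linux/amd64/kubelet
#   curl -o /dev/null -sSL -w '%{time_total}\n' \
#     https://dl.k8s.io/release/v1.28.1/bin/linux/amd64/kubelet
# Slowdown factor from the timings reported in this issue:
awk 'BEGIN { printf "%.2fx slower via the CDN\n", 65 / 4 }'
```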
/sig k8s-infra
/priority important-soon
/kind bug

cc @ameukam @BenTheElder