pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org
Other
81.76k stars 21.93k forks source link

Responses from `https://download.pytorch.org/whl/cpu` have `cache-control: no-cache` set in their headers #130571

Open aabtop opened 1 month ago

aabtop commented 1 month ago

🐛 Describe the bug

Not sure if this is intentional, but it makes it difficult for the responses to play well with caching, e.g. this prompted the https://github.com/astral-sh/uv/issues/4967 where it's unclear how to resolve.

FWIW,

% curl -I https://download.pytorch.org/whl/cpu | grep cache-control
cache-control: no-cache,no-store,must-revalidate

and

% curl -I https://pypi.org/simple/ | grep cache-control
cache-control: max-age=600, public

In this particular instance this is manifesting as a problem for me because requests to https://download.pytorch.org/whl/cpu sometimes sporadically fail (e.g. with 503), so being able to reliably cache it would mitigate this considerably.

Versions

Fairly certain this issue is independent of my runtime and reproducible anywhere.

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman

BurntSushi commented 1 month ago

As the person who implemented uv's HTTP caching semantics, I think ideally no-cache and no-store would be removed here, but must-revalidate is okay. And of course, copying what PyPI does would be fine too.

atalman commented 1 month ago

These settings are currently set per object for our s3 files using Metadata section: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html?icmpid=docs_amazons3_console#object-metadata.

These settings apply only to index.html files, not whl or tgz files. Index.html files are updated constantly by this job: https://github.com/pytorch/test-infra/actions/workflows/update-s3-html.yml. I think setting cache-control: max-age=600, public is a good idea we should probably go for it. @aabtop Do you have any examples of 503 failures that you mentioned in the issue description ?

albanD commented 1 month ago

Lowering priority because it's only for the index. Still open question how that matches to PyPi.

malfet commented 1 month ago

I think it's reasonable to cache index for 10 min or so

top-oai commented 1 month ago

@aabtop Do you have any examples of 503 failures that you mentioned in the issue description ?

Sorry for the delay. I'm going back through the links I recorded and actually am having a hard time finding 503 failures, but it's hard to reproduce especially since some auto-retries we do now make it so that this doesn't get flagged anymore. I remember seeing one of them and got it in my head that that was the most common and so put that in the "(e.g. with 503)" statement, but it looks like 500 errors were more common for me. Seems like this is getting into separate issue (from this caching issue) territory, but here's an example:

error: HTTP status server error (500 Internal Server Error) for url (https://download.pytorch.org/whl/cpu/poetry-core/)

which is actually a bit strange since I don't see poetry-core in the index. In any case, I think it might be fine to leave that 500 alone for now since it's a bit funny, and I'll make a separate issue for it if I can isolate it better.