bewinsnw closed this issue 6 days ago
I see what's going on here now. simple-repository is sending accept headers allowing gzip compression; so the content-length header it gets back is the length of the gzipped body. But then when it streams the response, it's streaming the uncompressed response, which trips up uvicorn.
I dumped the headers inside http_response_iterator.py:
<CIMultiDictProxy('Connection': 'keep-alive', 'Content-Length': '1298', 'Server': 'nginx', 'Content-Type': 'application/octet-stream', 'Last-Modified': 'Mon, 06 May 2024 20:49:10 GMT', 'Etag': '"71912d8b4ad8713b7a44242dd311c57a"', 'x-amz-request-id': '6d382d8e8035375b', 'x-amz-id-2': 'aN/djTDE5NtxmMzHtMHBkZWZjYwMwTzgz', 'x-amz-version-id': '4_z179c51e67f11a0ad8f6c0018_f117117514458507d_d20240506_m204910_c005_v0501020_t0003_u01715028550339', 'Content-Encoding': 'gzip', 'Cache-Control': 'max-age=365000000, immutable, public', 'Accept-Ranges': 'bytes', 'Date': 'Sat, 25 May 2024 10:21:17 GMT', 'Age': '763696', 'X-Served-By': 'cache-iad-kcgs7200170-IAD, cache-lcy-eglc8600066-LCY', 'X-Cache': 'HIT, HIT', 'X-Cache-Hits': '85, 1', 'X-Timer': 'S1716632477.124754,VS0,VE1', 'Vary': 'Accept-Encoding', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'X-Frame-Options': 'deny', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Robots-Header': 'noindex', 'Access-Control-Allow-Methods': 'GET, OPTIONS', 'Access-Control-Allow-Headers': 'Range', 'Access-Control-Allow-Origin': '*', 'x-pypi-file-python-version': 'py3', 'x-pypi-file-version': '24.1b1', 'x-pypi-file-package-type': 'bdist_wheel', 'x-pypi-file-project': 'pip')>
Either we need to stream the raw response back from the server, or the `'Content-Length': '1298'` and `'Content-Encoding': 'gzip'` headers need to be dropped (along with `Accept-Ranges`, since simple-repository-server doesn't support range requests).
`aiohttp.ClientSession` can be created with `auto_decompress=False`, but that would break code elsewhere: `fetch_simple_page` treats the PEP 503 body as text, which it won't be if compression is left on. Proxying the still-compressed response onward is also wrong when the client didn't request compression.
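To see why the mismatch trips uvicorn, here is a minimal illustration (using a made-up stand-in body, not the actual pip metadata file): the upstream's `Content-Length` describes the compressed payload, while aiohttp's default `auto_decompress=True` yields the decompressed bytes to the streaming iterator, so the two sizes disagree.

```python
import gzip

# Stand-in for a .metadata file body (hypothetical content, chosen only
# so the sizes clearly differ after gzip compression).
body = b"Metadata-Version: 2.1\nName: pip\n" * 40
compressed = gzip.compress(body)

content_length = len(compressed)  # what the proxied Content-Length header reports
streamed = len(body)              # what is actually streamed after auto-decompression

print(content_length, streamed)
assert content_length != streamed  # this disagreement is what uvicorn chokes on
```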
So, it's better to not proxy the headers except for a handful: Content-Type, Last-Modified, Etag, Cache-Control, Date, Age, Vary seem like a reasonable set - and rely on uvicorn to chunk the response. So in HttpResponseIterator I changed:
```python
iterator.status_code, iterator.headers = resp.status, resp.headers
# The first time that anext is called, set stauts_code and
```

to

```python
iterator.status_code = resp.status
proxy_headers = ["content-type", "last-modified", "etag", "cache-control", "date", "age", "vary"]
iterator.headers = {k: v for k, v in resp.headers.items() if k.lower() in proxy_headers}
# The first time that anext is called, set status_code and
```
The response headers from uvicorn were now:

```
< HTTP/1.1 200 OK
< date: Sat, 25 May 2024 10:55:24 GMT
< server: uvicorn
< content-type: application/octet-stream
< last-modified: Mon, 06 May 2024 20:49:10 GMT
< etag: "71912d8b4ad8713b7a44242dd311c57a"
< cache-control: max-age=365000000, immutable, public
< date: Sat, 25 May 2024 10:55:25 GMT
< age: 765745
< vary: Accept-Encoding
< transfer-encoding: chunked
```
and the metadata downloaded.
Thanks for the clear and reproducible example. This didn't get resolved with the move to httpx, and indeed the auto-decompression option appears not to exist in httpx (https://github.com/encode/httpx/discussions/2220#discussion-4063893).
The ideal approach is that we pass the original request headers through to the proxied request, and then don't tamper with the results. In this way, we will also support range requests correctly.
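That pass-through approach can be sketched as follows: forward everything from the original request except the hop-by-hop headers, which per RFC 9110 must not be propagated by a proxy. The helper name `forwardable_request_headers` is hypothetical, not the project's actual API.

```python
# RFC 9110 hop-by-hop headers (plus Host, which the HTTP client sets itself)
# must not be forwarded; everything else, notably Accept-Encoding and Range,
# is passed through so the upstream negotiates compression and range
# requests directly with the real client.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade", "host",
})

def forwardable_request_headers(headers: dict[str, str]) -> dict[str, str]:
    return {k: v for k, v in headers.items() if k.lower() not in HOP_BY_HOP}

out = forwardable_request_headers({
    "Accept-Encoding": "gzip",
    "Range": "bytes=0-1297",
    "Connection": "keep-alive",  # hop-by-hop: dropped
    "Host": "localhost:9191",    # dropped: the client library sets its own Host
})
print(sorted(out))  # ['Accept-Encoding', 'Range']
```

Because `Accept-Encoding` now reflects what the real client asked for, the upstream's `Content-Length` and `Content-Encoding` stay consistent with the bytes actually relayed.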
Should now be resolved in v0.6.0. It is resolved by passing the request headers down to the child request. I had to hack httpx to avoid it decoding the response in the stream (now thoroughly tested). Please let me know how it works out for you with this release! Closing for now, but don't hesitate to re-open if not fully resolved.
I'm running in docker and just proxying to pypi:
```
docker build -t simple . && docker run -it --rm -p 9191:8000 simple https://pypi.org/simple/
```
But pip errors out fetching metadata. Doing this by hand with curl you can see the metadata body does not stream:
Here's the log from the container
Curling the upstream directly (`curl -v https://files.pythonhosted.org/packages/1e/65/22725f8ba583376d0c300c3b9b52b9a67cfd93d786a80be73c167e45abc8/pip-24.1b1-py3-none-any.whl.metadata`) works just fine. I also looked at the size of the content returned from upstream vs. what's in their Content-Length header; it seems fine. Not sure what it's complaining about; I'll dig in further later.