ido50 / morgan

PyPI Mirror for Restricted/Offline Environments
Apache License 2.0
105 stars 7 forks source link

Decode request URLs before handling #4

Closed i077 closed 2 years ago

i077 commented 2 years ago

Variants of some packages (e.g. CPU-only builds of PyTorch) may have characters in their filenames that get percent-encoded. In the given example, + gets encoded to %2B. When pip asks for these filenames, (at least on my machine) it uses percent-encoded URLs in its requests to the morgan server. The server however does not decode these URLs, so it just returns a 404.

Pass the URL to urllib.parse.unquote before urllib.parse.unsplit to fix this. I'm not sure this is the best way to fix it, but it was the easiest. It might make more sense to instead unquote just the path portion of the URL after urlsplit instead of the entire thing.

Note: I did not use morgan itself to retrieve the packages that had those special characters since they were outside of PyPI, and I haven't really looked for an applicable example inside PyPI. I think the idea still applies though, since AFAIK pip will quote any URLs it sends.

ido50 commented 2 years ago

Looks good to me, tested with Python 3.7 and 3.10. Merged manually and released in v0.12.3. Thanks a lot!