Closed ryanking13 closed 1 year ago
For now, this is blocked by https://github.com/pypi/warehouse/issues/12214: some stale responses do not contain the CORS headers added in https://github.com/pypi/warehouse/pull/13222.
Thanks a lot for looking into this @ryanking13 !
Use devpi or pypi/warehouse to host wheels for test
See also the discussion in https://github.com/pyodide/pyodide/issues/3049, though I haven't really made much progress on it, aside from determining that warehouse is not suitable for this purpose according to its devs, as it's the code for PyPI and too complex to use for other applications.
some stale responses do not contain CORS headers added in Change micropip's default endpoint to pypi.org/simple
Would it be too much work to support both? I mean this would be great to support other hosting solutions that could use the Simple API even independently from PyPI. For instance,
to name a few.
Would it be too much work to support both?
Do you mean supporting both the JSON API and the Simple API? I think it is not that hard, but I think most private hosting solutions would use the Simple API (except for pypiserver), so I was thinking that it should be okay to support only the Simple API, which is now a standard.
I think it is not that hard, but I think most private hosting solutions would use the Simple API (except for pypiserver), so I was thinking that it should be okay to support only the Simple API, which is now a standard.
If that outdated cache issue has a workaround for PyPI, sure, we can keep only the Simple API.
But so are we talking about the HTML Simple API or the JSON Simple API? It's pretty horrible to have to parse HTML files to extract links, then parse other HTML files and extract more links. Maybe for native installers it doesn't matter, but on a web page every bit of overhead matters when loading the page. That's why it would probably be better if we can avoid making the HTML Simple API the default. Although I do understand that if third-party services use it, we have to support it.
If that outdated cache issue has a workaround for PyPI, sure, we can keep only the Simple API.
Right, if the cache issue is not resolved for a long time, we may need to provide a JSON API as a fallback... let me see how hard it would be to support both APIs.
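One way to picture such a fallback is sketched below. It is only an illustration under assumptions: `fetch` is a hypothetical callable supplied by the caller (it returns a `(headers, body)` pair and raises `OSError` when the request fails, for example because of a missing CORS header), and `fetch_project` is not an existing micropip function. The two URL templates are the public PyPI endpoints.

```python
# Public PyPI endpoints: the Simple API and the legacy JSON API.
SIMPLE_URL = "https://pypi.org/simple/{name}/"
LEGACY_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_project(name, fetch):
    """Try the Simple API first, then fall back to the legacy JSON API.

    `fetch(url, headers)` is a hypothetical callable returning (headers, body)
    and raising OSError on failure. Returns (source, headers, body), where
    source is "simple" or "legacy-json".
    """
    try:
        headers, body = fetch(
            SIMPLE_URL.format(name=name),
            {"Accept": "application/vnd.pypi.simple.v1+json"},
        )
        return "simple", headers, body
    except OSError:
        # The Simple API request failed; retry against the legacy endpoint.
        headers, body = fetch(LEGACY_JSON_URL.format(name=name), {})
        return "legacy-json", headers, body
```

Passing `fetch` in as a parameter keeps the ordering logic independent of the HTTP client, so it can be exercised with a fake in tests.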
But so are we talking about the HTML Simple API or the JSON Simple API?
Both. I found that there is already a good parser (https://github.com/brettcannon/mousebender) that can parse both the HTML and JSON APIs. We can add an Accept: application/vnd.pypi.simple.v1+json header that tells the server we prefer a JSON response, while still being able to handle an HTML response.
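A minimal sketch of that content negotiation, using only the Python standard library rather than mousebender: send an Accept header that prefers JSON, then branch on the Content-Type of the response. The names `ACCEPT`, `_AnchorCollector`, and `parse_simple_response` are hypothetical and not part of micropip or mousebender.

```python
import json
from html.parser import HTMLParser

# Prefer the PEP 691 JSON form, but still accept PEP 503 HTML.
ACCEPT = "application/vnd.pypi.simple.v1+json, text/html;q=0.1"


class _AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag on a PEP 503 project page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def parse_simple_response(content_type, body):
    """Return the file URLs listed in a Simple API project response."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/vnd.pypi.simple.v1+json":
        return [entry["url"] for entry in json.loads(body)["files"]]
    if media_type in ("text/html", "application/vnd.pypi.simple.v1+html"):
        collector = _AnchorCollector()
        collector.feed(body)
        return collector.urls
    raise ValueError(f"unsupported content type: {content_type}")
```

The JSON branch is a single `json.loads` call, while the HTML branch has to walk the markup, which is the overhead concern raised earlier in the thread.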
It seems that PyPI's JSON-based Simple API (PEP 691) now returns CORS headers correctly, while the HTML-based Simple API (PEP 503) still doesn't. PyPI probably purged all cached JSON responses recently because of PEP 658.
test script:
import requests
import time
import random

top_pypi_packages = "https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.min.json"
packages = requests.get(top_pypi_packages).json()
rows = packages["rows"]

for idx, package in enumerate(random.choices(rows, k=100)):
    name = package["project"]

    # PEP 691
    resp = requests.get(f"https://pypi.org/simple/{name}/", headers={"Accept": "application/vnd.pypi.simple.v1+json"})
    headers = resp.headers
    assert headers["Content-Type"] == "application/vnd.pypi.simple.v1+json"
    assert resp.ok
    if headers.get("Access-Control-Allow-Origin") != "*":
        print(f"({idx}) Fail (json): {name}")

    # PEP 503
    resp = requests.get(f"https://pypi.org/simple/{name}/", headers={"Accept": "text/html"})
    headers = resp.headers
    assert headers["Content-Type"] == "text/html"
    assert resp.ok
    if headers.get("Access-Control-Allow-Origin") != "*":
        print(f"({idx}) Fail (html): {name}")

    time.sleep(1)
Result:
(1) Fail (html): backoff
(2) Fail (html): better-exceptions
(3) Fail (html): pypdf2
(9) Fail (html): flask-swagger-ui
(23) Fail (html): sqlalchemy-mate
(25) Fail (html): scipy
(30) Fail (html): httpcore
(36) Fail (html): ngram
(37) Fail (html): ordered-set
(39) Fail (html): azure-mgmt-billing
(41) Fail (html): awscli-local
(43) Fail (html): azure-mgmt-datalake-analytics
(60) Fail (html): pamela
(65) Fail (html): ipympl
(78) Fail (html): pytzdata
(79) Fail (html): typish
(81) Fail (html): django-ckeditor
(85) Fail (html): pebble
(87) Fail (html): azure-common
(89) Fail (html): scandir
(91) Fail (html): oscrypto
(92) Fail (html): pydocstyle
(94) Fail (html): azure-mgmt-redhatopenshift
(95) Fail (html): cligj
(96) Fail (html): spacy-loggers
This is good news, and I think I can continue on #65, since we will avoid using the HTML API by default (though we will still need to support it and test it locally).
Closing as completed. I'll open a separate issue for PEP 658.
I've recently started working on changing micropip to use the Simple API (PEP 503, PEP 691) instead of the legacy JSON API. I think switching to the Simple API will help people create alternative package registries that follow the standard, and we can also benefit from other PyPA tools.