pyodide / micropip

A lightweight Python package installer for Pyodide
https://micropip.pyodide.org
Mozilla Public License 2.0

Plans for using Simple Repository API #62

Closed ryanking13 closed 1 year ago

ryanking13 commented 1 year ago

I've recently started working on changing micropip to use the Simple API (PEP 503, PEP 691) instead of the legacy JSON API. I think switching to the Simple API will help people create alternative package registries that follow the standard, and we can also benefit from other PyPA tools.

ryanking13 commented 1 year ago

For now, this is blocked by https://github.com/pypi/warehouse/issues/12214: some stale responses do not contain the CORS headers added in https://github.com/pypi/warehouse/pull/13222.

rth commented 1 year ago

Thanks a lot for looking into this @ryanking13 !

Use devpi or pypi/warehouse to host wheels for test

See also the discussion in https://github.com/pyodide/pyodide/issues/3049, though I haven't really made much progress on it, aside from determining that warehouse is not suitable for this purpose according to its devs: it's the code for PyPI and too complex to use for other applications.

rth commented 1 year ago

some stale responses does not contain CORS headers added in Change micropip's default endpoint to pypi.org/simple

Would it be too much work to support both? I mean, it would be great to support other hosting solutions that could use the Simple API even independently from PyPI. For instance,

to name a few.

ryanking13 commented 1 year ago

Would it be too much work to support both?

Do you mean supporting both the JSON API and the Simple API? I think it is not that hard, but I think most private hosting solutions would use the Simple API (except for pypiserver), so I was thinking that it should be okay to support only the Simple API, which is now a standard.

rth commented 1 year ago

I think it is not that hard, but I think most private hosting solutions would use Simple API (except for pypiserver), so I was thinking that it should be okay to support only simple API which is now a standard.

If there is a workaround for that outdated cache issue on PyPI, then sure, we can keep only the Simple API.

But so are we talking about the HTML Simple API or the JSON Simple API? It's pretty horrible to have to parse HTML files to extract links, then fetch other HTML files and parse more links. Maybe for native installers it doesn't matter, but on a web page every bit of overhead matters at load time. That's why it would probably be better if we can avoid making the HTML Simple API the default, although I do understand that if third-party services use it, we have to support it.

ryanking13 commented 1 year ago

If that outdated cache issue has a workaround for PyPI sure we can only keep Simple API.

Right, if the cache issue is not resolved for a long time, we may need to provide a JSON API as a fallback... let me see how hard it would be to support both APIs.
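To make the fallback idea concrete, here is a rough sketch of what "Simple API first, legacy JSON API second" could look like. This is not micropip's code: `list_files` and the `fetch_json` callable are hypothetical names, and the sketch simply assumes that a failed request (for example one blocked by a missing CORS header) surfaces as an `OSError`. Only the two PyPI URL patterns and the response shapes come from the real APIs.

```python
SIMPLE_URL = "https://pypi.org/simple/{name}/"
LEGACY_URL = "https://pypi.org/pypi/{name}/json"
JSON_V1 = "application/vnd.pypi.simple.v1+json"


def files_from_simple(payload):
    """Extract (filename, url) pairs from a PEP 691 JSON response."""
    return [(f["filename"], f["url"]) for f in payload["files"]]


def files_from_legacy(payload):
    """Extract (filename, url) pairs from a legacy JSON API response."""
    return [
        (f["filename"], f["url"])
        for release_files in payload["releases"].values()
        for f in release_files
    ]


def list_files(name, fetch_json):
    """Return (api_used, files) for a project.

    `fetch_json(url, headers)` is a hypothetical callable that returns the
    decoded JSON body and is assumed to raise OSError when the request fails.
    """
    try:
        payload = fetch_json(SIMPLE_URL.format(name=name), {"Accept": JSON_V1})
        return "simple", files_from_simple(payload)
    except OSError:
        # Assumption: a stale, CORS-less response shows up as a failed request.
        payload = fetch_json(LEGACY_URL.format(name=name), {})
        return "legacy", files_from_legacy(payload)
```

Passing the fetch function in as a parameter keeps the fallback order testable without touching the network.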

But so are we talking about HTML Simple API or Json Simple API?

Both. I found that there is already a good parser (https://github.com/brettcannon/mousebender) that can parse both the HTML and the JSON Simple API. We can add an Accept: application/vnd.pypi.simple.v1+json header that tells the server we prefer a JSON response, while still being able to handle an HTML response.
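As a rough illustration of "prefer JSON, but cope with HTML", the sketch below dispatches on the response's Content-Type. It deliberately uses only the standard library rather than mousebender, so it is a hand-rolled stand-in and not that library's API; `parse_simple_response` and `_AnchorCollector` are hypothetical names. The `ACCEPT` value follows the content negotiation scheme described in PEP 691, with the quality values chosen here as an example.

```python
import json
from html.parser import HTMLParser

JSON_V1 = "application/vnd.pypi.simple.v1+json"
HTML_V1 = "application/vnd.pypi.simple.v1+html"

# Header value a client could send: JSON preferred, HTML still acceptable.
ACCEPT = f"{JSON_V1}, {HTML_V1};q=0.2, text/html;q=0.01"


class _AnchorCollector(HTMLParser):
    """Collect (link text, href) pairs from the <a> tags of a PEP 503 page."""

    def __init__(self):
        super().__init__()
        self.files = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.files.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_simple_response(content_type, body):
    """Return a list of (filename, url) from a JSON or HTML project page."""
    mime = content_type.split(";")[0].strip().lower()
    if mime == JSON_V1:
        return [(f["filename"], f["url"]) for f in json.loads(body)["files"]]
    if mime in (HTML_V1, "text/html"):
        collector = _AnchorCollector()
        collector.feed(body)
        collector.close()
        return collector.files
    raise ValueError(f"unsupported content type: {content_type}")
```

A real client would also want the hashes, `requires-python`, and yanked markers, which is exactly the part an existing parser would save us from writing.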

ryanking13 commented 1 year ago

https://discuss.python.org/t/pep-658-is-now-live-on-pypi/26693

PEP 658 yay :)

ryanking13 commented 1 year ago

It seems that PyPI's JSON-based Simple API (PEP 691) now returns the CORS headers correctly, while the HTML-based Simple API (PEP 503) still doesn't. PyPI probably purged all cached JSON responses recently because of PEP 658.

test script:

import random
import time

import requests

# List of the most-downloaded PyPI packages over the last 30 days
top_pypi_packages = "https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.min.json"

packages = requests.get(top_pypi_packages).json()
rows = packages["rows"]
# Check 100 randomly chosen packages (random.choices samples with replacement)
for idx, package in enumerate(random.choices(rows, k=100)):
    name = package["project"]

    # PEP 691: request the JSON flavour of the Simple API
    resp = requests.get(f"https://pypi.org/simple/{name}/", headers={"Accept": "application/vnd.pypi.simple.v1+json"})
    headers = resp.headers

    assert headers["Content-Type"] == "application/vnd.pypi.simple.v1+json"
    assert resp.ok
    # Report responses that are missing the CORS header
    if headers.get("Access-Control-Allow-Origin") != "*":
        print(f"({idx}) Fail (json): {name}")

    # PEP 503: request the HTML flavour of the Simple API
    resp = requests.get(f"https://pypi.org/simple/{name}/", headers={"Accept": "text/html"})
    headers = resp.headers

    assert headers["Content-Type"] == "text/html"
    assert resp.ok
    if headers.get("Access-Control-Allow-Origin") != "*":
        print(f"({idx}) Fail (html): {name}")

    # Be polite to PyPI between iterations
    time.sleep(1)

Result:

(1) Fail (html): backoff
(2) Fail (html): better-exceptions
(3) Fail (html): pypdf2
(9) Fail (html): flask-swagger-ui
(23) Fail (html): sqlalchemy-mate
(25) Fail (html): scipy
(30) Fail (html): httpcore
(36) Fail (html): ngram
(37) Fail (html): ordered-set
(39) Fail (html): azure-mgmt-billing
(41) Fail (html): awscli-local
(43) Fail (html): azure-mgmt-datalake-analytics
(60) Fail (html): pamela
(65) Fail (html): ipympl
(78) Fail (html): pytzdata
(79) Fail (html): typish
(81) Fail (html): django-ckeditor
(85) Fail (html): pebble
(87) Fail (html): azure-common
(89) Fail (html): scandir
(91) Fail (html): oscrypto
(92) Fail (html): pydocstyle
(94) Fail (html): azure-mgmt-redhatopenshift
(95) Fail (html): cligj
(96) Fail (html): spacy-loggers

This is good news, and I think I can continue with #65, since we will avoid using the HTML API by default (though we will still need to support it and test it locally).

ryanking13 commented 1 year ago

Closing as completed. I'll open a separate issue for PEP 658.