koenvo / pyodide-http

Provides patches for widely used http libraries to make them work in Pyodide environments like JupyterLite
MIT License

AttributeError: module 'urllib.request' has no attribute 'HTTPSHandler' when using astropy #33

Closed ManonMarchand closed 5 months ago

ManonMarchand commented 1 year ago

Hello and thanks for this library!

I was unsure about where to post this issue but I'm wondering about why pyodide-http does not work with astropy.

Here is a minimal non-working example :

# do pyodide http magics like in the readme here
from astropy.coordinates import SkyCoord
SkyCoord.from_name("Crab Nebula")

In jupyterlite, the output is like this :

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 SkyCoord.from_name("M1")

File /lib/python3.10/site-packages/astropy/coordinates/sky_coordinate.py:2218, in SkyCoord.from_name(cls, name, frame, parse, cache)
   2183 """
   2184 Given a name, query the CDS name resolver to attempt to retrieve
   2185 coordinate information for that object. The search database, sesame
   (...)
   2213     Instance of the SkyCoord class.
   2214 """
   2216 from .name_resolve import get_icrs_coordinates
-> 2218 icrs_coord = get_icrs_coordinates(name, parse, cache=cache)
   2219 icrs_sky_coord = cls(icrs_coord)
   2220 if frame in ("icrs", icrs_coord.__class__):

File /lib/python3.10/site-packages/astropy/coordinates/name_resolve.py:170, in get_icrs_coordinates(name, parse, cache)
    167 for url in urls:
    168     try:
    169         resp_data = get_file_contents(
--> 170             download_file(url, cache=cache, show_progress=False)
    171         )
    172         break
    173     except urllib.error.URLError as e:

File /lib/python3.10/site-packages/astropy/utils/data.py:1509, in download_file(remote_url, cache, show_progress, timeout, sources, pkgname, http_headers, ssl_context, allow_insecure)
   1507 for source_url in sources:
   1508     try:
-> 1509         f_name = _download_file_from_source(
   1510             source_url,
   1511             timeout=timeout,
   1512             show_progress=show_progress,
   1513             cache=cache,
   1514             remote_url=remote_url,
   1515             pkgname=pkgname,
   1516             http_headers=http_headers,
   1517             ssl_context=ssl_context,
   1518             allow_insecure=allow_insecure,
   1519         )
   1520         # Success!
   1521         break

File /lib/python3.10/site-packages/astropy/utils/data.py:1293, in _download_file_from_source(source_url, show_progress, timeout, remote_url, cache, pkgname, http_headers, ftp_tls, ssl_context, allow_insecure)
   1290         else:
   1291             raise
-> 1293 with _try_url_open(
   1294     source_url,
   1295     timeout=timeout,
   1296     http_headers=http_headers,
   1297     ftp_tls=ftp_tls,
   1298     ssl_context=ssl_context,
   1299     allow_insecure=allow_insecure,
   1300 ) as remote:
   1301     info = remote.info()
   1302     try:

File /lib/python3.10/site-packages/astropy/utils/data.py:1205, in _try_url_open(source_url, timeout, http_headers, ftp_tls, ssl_context, allow_insecure)
   1201 # Always try first with a secure connection
   1202 # _build_urlopener uses lru_cache, so the ssl_context argument must be
   1203 # converted to a hashshable type (a set of 2-tuples)
   1204 ssl_context = frozenset(ssl_context.items() if ssl_context else [])
-> 1205 urlopener = _build_urlopener(
   1206     ftp_tls=ftp_tls, ssl_context=ssl_context, allow_insecure=False
   1207 )
   1208 req = urllib.request.Request(source_url, headers=http_headers)
   1210 try:

File /lib/python3.10/site-packages/astropy/utils/data.py:1179, in _build_urlopener(ftp_tls, ssl_context, allow_insecure)
   1176 if cert_chain:
   1177     ssl_context.load_cert_chain(**cert_chain)
-> 1179 https_handler = urllib.request.HTTPSHandler(context=ssl_context)
   1181 if ftp_tls:
   1182     urlopener = urllib.request.build_opener(_FTPTLSHandler(), https_handler)

AttributeError: module 'urllib.request' has no attribute 'HTTPSHandler'

You can have a look at it there in the notebook 04-sesame.ipynb :

https://cds-astro.github.io/jupyterlite/lab/index.html

From there, what I understand is that maybe urllib needs more patching in order to work with astropy? Or is it more an issue that I should post on their side of the story?

Thanks again!

(PS: the example uses a really cool function that outputs the coordinates of any objects for any of their registered names or designations :) )

koenvo commented 1 year ago

Thanks for the kind words!

Your example makes it easier to find the problem. pyodide-http does not patch urllib.request.HTTPSHandler at the moment.

Let me figure out if we can patch it and ignore the context argument ( https://github.com/astropy/astropy/blob/cc73b24619ce37f2af26a0140bbdda8015ac8265/astropy/utils/data.py#LL1170C49-L1170C56 )

ManonMarchand commented 1 year ago

That would be super cool. Do you need help?

Also, the example https://github.com/koenvo/pyodide-http/blob/main/examples/pyvo.html returns the same error because pyvo is using astropy.

koenvo commented 1 year ago

Still trying to reproduce the issue. The example pyvo.html needs a small change, since it requires pyodide-http>=0.2.1 to work in Firefox/Safari, but otherwise it works fine here.

Could it be a different pyodide version? When loading it shows version pyodide-0.22.1.

When I try this code it also works fine in Chrome, Firefox and Safari:

<html>
    <head>
        <link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
        <script defer src="https://pyscript.net/latest/pyscript.js"></script>
    </head>
    <body>
    <py-config>
        packages = ["ssl", "pyodide-http>=0.2.1", "astropy"]
    </py-config>

    <py-script>
        import pyodide_http
        pyodide_http.patch_all()

        from astropy.coordinates import SkyCoord
        res = SkyCoord.from_name("Crab Nebula")

        print(res)
    </py-script>
    </body>
</html>

output

<SkyCoord (ICRS): (ra, dec) in deg
    (83.6287, 22.0147)>

ManonMarchand commented 1 year ago

The example works for me too now with pyscript :sparkles: :crab: :star:

Then the issue might be more on jupyterlite side? @jtpio sorry to tag you but do you know what's happening?

On the question of the pyodide version, in the cds-astro/jupyterlite there is the jupyterlite-pyodide-kernel v0.0.8. It looks like they are using pyodide 0.23.2 ? : https://github.com/jupyterlite/pyodide-kernel

jtpio commented 1 year ago

Thanks both for looking into this!

> On the question of the pyodide version, in the cds-astro/jupyterlite there is the jupyterlite-pyodide-kernel v0.0.8. It looks like they are using pyodide 0.23.2 ? :

Right, jupyterlite-pyodide-kernel uses the latest stable release of Pyodide which is currently 0.23.2.

So it could indeed be related to the Pyodide version.

jtpio commented 1 year ago

Just checked with the Pyodide console directly and it is giving the same error. Although an extra micropip.install("ssl") seems to be required in the console: https://pyodide.org/en/stable/console.html

The console also runs 0.23.2.

rth commented 1 year ago

I think requests 2.30.0 that was released on May 3 broke pyodide-http monkeypatching. Using a previous version works.

>>> import micropip
>>> await micropip.install(["requests==2.29.0", "ssl", "pyodide-http>=0.2.1", "astropy"])
>>> import pyodide_http; pyodide_http.patch_all()
>>> from astropy.coordinates import SkyCoord
>>> res = SkyCoord.from_name("Crab Nebula")
>>> res
<SkyCoord (ICRS): (ra, dec) in deg
    (83.6287, 22.0147)>

So there should probably be a range of compatible requests versions specified with a given pyodide-http version?

koenvo commented 1 year ago

Hmm I tried to use requests 2.30.0 from the examples/pyvo.html and that works.

Pyodide 0.22.1 includes Python 3.10 and Pyodide 0.23.2 includes Python 3.11. It also seems that astropy doesn't use the requests library, right?

I suspect it to be something with a different python version instead of a different requests version.

[edit] The AttributeError: module 'urllib.request' has no attribute 'HTTPSHandler' exception seems to be related to the ssl package and not to pyodide_http.

urllib.request.HTTPSHandler is defined when http.client.HTTPSConnection exists: https://github.com/python/cpython/blob/3.11/Lib/urllib/request.py#L1381

http.client.HTTPSConnection is defined when ssl can be imported: https://github.com/python/cpython/blob/3.11/Lib/http/client.py#L1402
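The chain described above (ssl importable, then http.client.HTTPSConnection, then urllib.request.HTTPSHandler) can be checked directly in the failing environment. Below is a minimal diagnostic sketch using only the standard library; the function name `https_support_report` is hypothetical and not part of pyodide-http.

```python
import importlib.util
import http.client
import urllib.request


def https_support_report() -> dict:
    """Report each link of the chain ssl -> HTTPSConnection -> HTTPSHandler."""
    return {
        # find_spec only checks that the module can be located; it does not import it.
        "ssl_available": importlib.util.find_spec("ssl") is not None,
        # Defined by http.client only if `import ssl` succeeded when it was first imported.
        "HTTPSConnection": hasattr(http.client, "HTTPSConnection"),
        # Defined by urllib.request only if http.client.HTTPSConnection existed at import time.
        "HTTPSHandler": hasattr(urllib.request, "HTTPSHandler"),
    }


print(https_support_report())
```

If `ssl_available` is true while the other two entries are false, a likely (but unconfirmed) reading is that http.client was imported before ssl became available.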

rth commented 1 year ago

> It also seems that astropy doesn't use the requests library, right?

Yeah, you are right, it can't be that. I jumped to that conclusion too quickly :)

> http.client.HTTPSConnection is defined when ssl can be imported:

So indeed, in the notebook 04-sesame.ipynb (cds-astro.github.io/jupyterlite/lab/index.html), where this is reproducible, http.client.HTTPSConnection is not defined, while the Pyodide REPL, which has exactly the same Pyodide version, does have http.client.HTTPSConnection defined.

A plausible scenario is that http.client is imported somewhere in Jupyterlite directly or indirectly before the ssl module is loaded, leading to this. Maybe we should do some post-initialization step after SSL module is loaded to reload stdlib modules that depend on it (https://github.com/pyodide/pyodide/issues/3856). Or maybe you could do that in pyodide-http.

If I add,

from importlib import reload

import http.client
import urllib.request

reload(http.client)
reload(urllib.request)

to the above notebook, I would now get,

BadStatusLine: HTTP/1.1 0
```
---------------------------------------------------------------------------
BadStatusLine                             Traceback (most recent call last)
Cell In[5], line 1
----> 1 SkyCoord.from_name("M1")

File /lib/python3.11/site-packages/astropy/coordinates/sky_coordinate.py:2218, in SkyCoord.from_name(cls, name, frame, parse, cache)
   2183 """
   2184 Given a name, query the CDS name resolver to attempt to retrieve
   2185 coordinate information for that object. The search database, sesame
   (...)
   2213     Instance of the SkyCoord class.
   2214 """
   2216 from .name_resolve import get_icrs_coordinates
-> 2218 icrs_coord = get_icrs_coordinates(name, parse, cache=cache)
   2219 icrs_sky_coord = cls(icrs_coord)
   2220 if frame in ("icrs", icrs_coord.__class__):

File /lib/python3.11/site-packages/astropy/coordinates/name_resolve.py:170, in get_icrs_coordinates(name, parse, cache)
    167 for url in urls:
    168     try:
    169         resp_data = get_file_contents(
--> 170             download_file(url, cache=cache, show_progress=False)
    171         )
    172         break
    173     except urllib.error.URLError as e:

File /lib/python3.11/site-packages/astropy/utils/data.py:1509, in download_file(remote_url, cache, show_progress, timeout, sources, pkgname, http_headers, ssl_context, allow_insecure)
   1507 for source_url in sources:
   1508     try:
-> 1509         f_name = _download_file_from_source(
   1510             source_url,
   1511             timeout=timeout,
   1512             show_progress=show_progress,
   1513             cache=cache,
   1514             remote_url=remote_url,
   1515             pkgname=pkgname,
   1516             http_headers=http_headers,
   1517             ssl_context=ssl_context,
   1518             allow_insecure=allow_insecure,
   1519         )
   1520         # Success!
   1521         break

File /lib/python3.11/site-packages/astropy/utils/data.py:1293, in _download_file_from_source(source_url, show_progress, timeout, remote_url, cache, pkgname, http_headers, ftp_tls, ssl_context, allow_insecure)
   1290         else:
   1291             raise
-> 1293 with _try_url_open(
   1294     source_url,
   1295     timeout=timeout,
   1296     http_headers=http_headers,
   1297     ftp_tls=ftp_tls,
   1298     ssl_context=ssl_context,
   1299     allow_insecure=allow_insecure,
   1300 ) as remote:
   1301     info = remote.info()
   1302     try:

File /lib/python3.11/site-packages/astropy/utils/data.py:1211, in _try_url_open(source_url, timeout, http_headers, ftp_tls, ssl_context, allow_insecure)
   1208 req = urllib.request.Request(source_url, headers=http_headers)
   1210 try:
-> 1211     return urlopener.open(req, timeout=timeout)
   1212 except urllib.error.URLError as exc:
   1213     reason = exc.reason

File /lib/python3.11/site-packages/pyodide_http/_urllib.py:58, in urlopen_self_removed(self, url, *args, **kwargs)
     57 def urlopen_self_removed(self, url, *args, **kwargs):
---> 58     return urlopen(url, *args, **kwargs)

File /lib/python3.11/site-packages/pyodide_http/_urllib.py:53, in urlopen(url, *args, **kwargs)
     41 response_data = (
     42     b"HTTP/1.1 "
     43     + str(resp.status_code).encode("ascii")
   (...)
     49     + resp.body
     50 )
     52 response = HTTPResponse(FakeSock(response_data))
---> 53 response.begin()
     54 return response

File /lib/python311.zip/http/client.py:318, in HTTPResponse.begin(self)
    316 # read until we get a non-100 response
    317 while True:
--> 318     version, status, reason = self._read_status()
    319     if status != CONTINUE:
    320         break

File /lib/python311.zip/http/client.py:306, in HTTPResponse._read_status(self)
    304     status = int(status)
    305     if status < 100 or status > 999:
--> 306         raise BadStatusLine(line)
    307 except ValueError:
    308     raise BadStatusLine(line)

BadStatusLine: HTTP/1.1 0
```

which is still unclear but at least different from the original error. No idea how the status return code can be 0.
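The parsing step in the traceback can be reproduced outside Pyodide. The sketch below feeds a hand-built status line to `http.client.HTTPResponse` through a stand-in socket, roughly as the traceback shows pyodide_http doing; `FakeSocket` and `parse_status` are hypothetical names, and the response bytes are simplified. One plausible explanation for the 0, which is an assumption and not confirmed at this point in the thread, is that XMLHttpRequest reports status 0 when a request never completes, for example because the browser blocked it.

```python
import io
from http.client import BadStatusLine, HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: HTTPResponse only needs makefile()."""

    def __init__(self, data: bytes):
        self._data = data

    def makefile(self, mode="rb", *args, **kwargs):
        return io.BytesIO(self._data)


def parse_status(status_code: int) -> int:
    """Build a minimal raw response and let http.client parse its status line."""
    raw = f"HTTP/1.1 {status_code} Reason\r\n\r\n".encode("ascii")
    response = HTTPResponse(FakeSocket(raw))
    response.begin()  # raises BadStatusLine if the status is outside 100..999
    return response.status


print(parse_status(200))
try:
    parse_status(0)
except BadStatusLine as exc:
    print("BadStatusLine:", exc)
```

Any status outside the range 100 to 999 is rejected by `http.client`, so a status of 0 coming back from the browser surfaces as `BadStatusLine` instead of a more descriptive network error.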

jobovy commented 1 year ago

Because galpy has a similar from_name function as astropy, I'm quite interested in this issue. I actually tried the fix @rth suggested and when I run

import micropip
await micropip.install(["ssl", "pyodide-http>=0.2.1", "astropy"])
import pyodide_http; pyodide_http.patch_all()
from astropy.coordinates import SkyCoord
res = SkyCoord.from_name("Crab Nebula")

in the stable pyodide REPL, I get

pyodide.ffi.JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/A?Crab%20Nebula'.

with the full error message:

```python
Traceback (most recent call last):
  File "", line 1, in
  File "/lib/python3.11/site-packages/astropy/coordinates/sky_coordinate.py", line 2218, in from_name
    icrs_coord = get_icrs_coordinates(name, parse, cache=cache)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/astropy/coordinates/name_resolve.py", line 170, in get_icrs_coordinates
    download_file(url, cache=cache, show_progress=False)
  File "/lib/python3.11/site-packages/astropy/utils/data.py", line 1509, in download_file
    f_name = _download_file_from_source(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/astropy/utils/data.py", line 1293, in _download_file_from_source
    with _try_url_open(
         ^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/astropy/utils/data.py", line 1211, in _try_url_open
    return urlopener.open(req, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pyodide_http/_urllib.py", line 58, in urlopen_self_removed
    return urlopen(url, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pyodide_http/_urllib.py", line 31, in urlopen
    resp = send(request)
           ^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pyodide_http/_core.py", line 121, in send
    xhr.send(to_js(request.body))
pyodide.ffi.JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/A?Crab%20Nebula'.
```

The console actually reveals that this is an error from mixing HTTP content on an HTTPS site:

pyodide.asm.js:9 Mixed Content: The page at 'https://pyodide.org/en/latest/console.html' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/A?Crab%20Nebula'. This request has been blocked; the content must be served over HTTPS.

In Chrome, one can allow insecure content and doing that, the code runs fine. However, in jupyterlite, even with the from importlib import reload... fix, this still doesn't work, perhaps because jupyterlite runs in a webworker? The same mixed-content error keeps appearing even when allowing insecure content.

So I think we should then just upstream fix this by making an HTTPS request here in astropy: https://github.com/astropy/astropy/blob/cc73b24619ce37f2af26a0140bbdda8015ac8265/astropy/coordinates/name_resolve.py#L30-39 because I believe those URLs work as https:// ones.
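The mixed-content condition described in this comment can be tested before a request is sent. Here is a minimal sketch with the standard library only; `is_mixed_content` and `upgrade_to_https` are hypothetical helper names, and the check is deliberately simplified (real browsers apply further exceptions, for instance for local addresses).

```python
from urllib.parse import urlsplit, urlunsplit


def is_mixed_content(page_url: str, request_url: str) -> bool:
    """True when a page served over https requests a plain http resource."""
    return (
        urlsplit(page_url).scheme == "https"
        and urlsplit(request_url).scheme == "http"
    )


def upgrade_to_https(request_url: str) -> str:
    """Return the same URL with the scheme switched from http to https."""
    parts = urlsplit(request_url)
    if parts.scheme != "http":
        return request_url  # leave https and other schemes untouched
    return urlunsplit(parts._replace(scheme="https"))


page = "https://pyodide.org/en/latest/console.html"
target = "http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/A?Crab%20Nebula"
if is_mixed_content(page, target):
    print(upgrade_to_https(target))
```

Rewriting the scheme only helps when the server actually answers over https, which is why the fix proposed here is to change the URLs in astropy itself.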

ManonMarchand commented 1 year ago

Thanks for looking! This PR open in astropy is exactly to change the link to sesame to its https version (and other CDS things too) 🙂

https://github.com/astropy/astropy/pull/14681

But what about data that don't have an https address, for example from some old NASA missions? Will we never be able to query them through jupyterlite?

koenvo commented 1 year ago

I tried to summarise the issue to better understand what's going on, and came to this summary:

  1. In some cases the ssl module isn't loaded in time, which causes urllib.request.HTTPSHandler to be unavailable
  2. By default pyodide_http version 0.2.0 is used, which passes the User-Agent header to XMLHttpRequest. This causes the browser to reject the request, as the User-Agent header is not allowed (in some browsers)
  3. The source data for astropy is requested over http while the page is hosted over https. This causes a Mixed Content exception and results in a failed request in Python
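Item 2 amounts to dropping headers the browser refuses before they reach XMLHttpRequest. The sketch below illustrates the idea only; it is not pyodide-http's actual code, and `HEADERS_TO_SKIP` and `filter_headers` are hypothetical names.

```python
# Hypothetical list: only the header named in the summary above.
HEADERS_TO_SKIP = frozenset({"user-agent"})


def filter_headers(headers: dict) -> dict:
    """Return a copy of headers without the ones the browser may reject."""
    return {
        name: value
        for name, value in headers.items()
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() not in HEADERS_TO_SKIP
    }


print(filter_headers({"User-Agent": "astropy", "Accept": "*/*"}))
```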

Curious if you come to the same summary.

From these issues there are some possible solutions/fixes. When I look at the possibilities at the pyodide_http side, I see the following options:

  1. Inform the user when they are mixing content - request over http while page is served over https. Optionally try to request the content over https instead of http. This is related to https://github.com/koenvo/pyodide-http/issues/26
  2. Reload http.client and the modules that depend on it when http.client.HTTPSConnection isn't available
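Option 2 could look roughly like the sketch below, which reloads the two stdlib modules only when the handler is missing and ssl can be located. `ensure_https_handler` is a hypothetical name, not an existing pyodide-http function, and a later comment in this thread reports that reloading alone was not enough in JupyterLite.

```python
import importlib
import importlib.util
import http.client
import urllib.request


def ensure_https_handler() -> bool:
    """Try to make urllib.request.HTTPSHandler available; return whether it is."""
    if hasattr(urllib.request, "HTTPSHandler"):
        return True  # nothing to do
    if importlib.util.find_spec("ssl") is None:
        return False  # reloading cannot help while ssl itself is missing
    # Order matters: urllib.request checks http.client.HTTPSConnection at import time.
    importlib.reload(http.client)
    importlib.reload(urllib.request)
    return hasattr(urllib.request, "HTTPSHandler")


print(ensure_https_handler())
```

Code that imported names from these modules before the reload (for example `from urllib.request import urlopen`) keeps its old references, so this step would have to run before any patching.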

To answer @ManonMarchand's question about old data: the source needs to support both https and CORS headers to make it work in JupyterLite. Would it be possible to host the data somewhere else? Another option is to proxy the data, but that can become quite costly.

jobovy commented 9 months ago

Hi all,

It seems like the fix that I proposed earlier in this thread stopped working, because the CORS error that I got here (that we resolved through changing the URL to https://) reared its head again as

pyodide.asm.js:9 Mixed Content: The page at 'https://jupyterlite.github.io/demo/extensions/@jupyterlite/pyodide-kernel-extension/static/568.621d55d3f28fca39d88b.js?v=621d55d3f28fca39d88b' 
was loaded over HTTPS, but attempted to connect to the insecure WebSocket endpoint 'ws://cds.unistra.fr:443/'. 
This request has been blocked; this endpoint must be available over WSS.

So it seems that with newer versions of jupyterlite/pyodide, a websocket is used instead of an XMLHttpRequest, and the websocket is insecure even if the starting URL was secure. I can't figure out where this change happened. I wonder whether it has to do with jupyterlite switching to running in a Service Worker.

rth commented 9 months ago

I'm pretty sure it didn't happen in Pyodide, so you probably should report this to JupyterLite.

jobovy commented 9 months ago

> I'm pretty sure it didn't happen in Pyodide, so you probably should report this to JupyterLite.

~Actually, this happens in the pyodide REPL as well (both stable and latest), so I think it must be in pyodide?~ EDIT: Nevermind, it does work, see below.

ManonMarchand commented 8 months ago

It works in pyodide repl with :

>>> import micropip
>>> await micropip.install(["ssl", "pyodide-http>=0.2.1", "astropy"])
>>> from astropy.coordinates import SkyCoord
>>> import pyodide_http; pyodide_http.patch_all()
>>> SkyCoord.from_name('NGC3256')
<SkyCoord (ICRS): (ra, dec) in deg
    (156.9636833, -43.9037639)>

But I cannot find any combination of load/reload/patch that makes it work in jupyterlite. So I guess we should open an issue there?

jobovy commented 8 months ago

I found that the issue in jupyterlite is that ssl wasn't installed when http.client and urllib.request are first imported (these are likely imported in their IPython setup) and for some reason reloading them doesn't fix the issue in jupyterlite as it does in pure pyodide. So I fixed the issue by just installing ssl before anything else is installed in the initialization of jupyterlite's pyodide-kernel, which is enough to fix this issue here: https://github.com/jupyterlite/pyodide-kernel/pull/79.

A new version of pyodide-kernel that includes this fix has been released. You can check it at https://jupyterlite-pyodide-kernel.readthedocs.io/en/latest/_static/ and, of course, you still need the pyodide-http patching, so try this example:

>>> import micropip
>>> await micropip.install(["ssl", "pyodide-http>=0.2.1", "astropy"])
>>> from astropy.coordinates import SkyCoord
>>> import pyodide_http; pyodide_http.patch_all()
>>> SkyCoord.from_name('NGC3256')

I think this issue can therefore be closed now.