jupyterlite / demo

JupyterLite demo deployed to GitHub Pages 🚀
https://jupyterlite.github.io/demo

Not possible to call out to external websites #141

Open markwilkinson opened 6 months ago

markwilkinson commented 6 months ago

Description

In both my own JupyterLite deployment and the demo JupyterLite, it is not possible to call out to external websites. Every attempt results in an error related to insecure requests. This happens with all URLs that I have tested, and happens whether or not the request call includes a "validate=true/false" flag.

Reproduce

  1. Code block:
import requests

def download_file_into_memory(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        print(f"Failed to download file. Status code: {response.status_code}")
        return None

file_content = download_file_into_memory("https://cnn.com")
  2. Run

  3. See error:

/lib/python3.11/site-packages/urllib3/connectionpool.py:1101: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cnn.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
---------------------------------------------------------------------------
JsException                               Traceback (most recent call last)
File /lib/python3.11/site-packages/urllib3/contrib/emscripten/fetch.py:380, in send_request(request)
    378         js_xhr.setRequestHeader(name, value)
--> 380 js_xhr.send(to_js(request.body))
    382 headers = dict(Parser().parsestr(js_xhr.getAllResponseHeaders()))

JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://cnn.com/'.

During handling of the above exception, another exception occurred:

jtpio commented 6 months ago

@markwilkinson Could it be because cnn.com redirects to edition.cnn.com? Using https://edition.cnn.com/ directly in the code seems to be working fine:

image

markwilkinson commented 6 months ago

I don't think that's the problem... It seems that https://edition.cnn.com is the exception to the rule! I have added the auto-redirect flag and that doesn't solve the problem for any of the URLs that I want to use. I have also tried using https://github.com and https://google.ca and https://www.cbgp.upm.es (this last one I know for sure does not redirect). I have also tried in two browsers.

None of these work.

So I think the problem is real!

markwilkinson commented 6 months ago

I have also tried connecting directly to my server rather than the https reverse proxy (http://....) and that also throws an error (different error), but I have a feeling that Jupyter doesn't allow insecure connections anyway, so that might not be informative...??

markwilkinson commented 6 months ago

Have you had any further thoughts on this? I am still unable to resolve any URL, using the demo jupyterlite, other than the one you discovered that worked (edition.cnn.com). I have also tried starting from a new notebook, running %pip install requests and then trying to reach any website... same problem in all cases.

markwilkinson commented 4 months ago

Hi again! Have you (or anyone) found a work-around for this? I'm so excited to use jupyterlite, but all of the projects I need it for will be downloading their data from the Web, so... this is a real show-stopper for me!

Advice very welcome!

epugh commented 4 months ago

Have you tried using fetch... so, this isn't to an external site, but check out these examples of notebooks that I run in jupyterlite: https://github.com/o19s/quepid-jupyterlite/blob/main/jupyterlite/files/examples/Multiple%20Raters%20Analysis.ipynb

Maybe because "fetch" is javascript???

markwilkinson commented 4 months ago

Thanks for the suggestion! Unfortunately, that didn't work either, and with ~identical symptoms. The "await fetch" call fails with "JsException: TypeError: Failed to fetch" for all URLs other than the one we identified at the top of this issue report (https://edition.cnn.com).

So... unless I am interested in what CNN has to say (I'm not), I continue to be out of luck! ;-)

mrkvn commented 4 months ago

I believe this is because of CORS. I'm not sure, but I think there's no way around it: it's a browser security feature. You can hit a valid API endpoint, though. For what you are trying to do, you'd need a server; your server would then be the one that sends the HTTP request to the endpoint you want to hit. You might want to read this issue: https://github.com/jupyterlite/jupyterlite/issues/729#issue-1299865672
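
The proxy idea described above could look roughly like the following. This is only a minimal sketch using the Python standard library, not anything JupyterLite ships: the names `extract_target`, `CorsProxyHandler`, and `make_server`, the `?url=` query parameter, and port 8000 are all hypothetical choices made for illustration. The server fetches the target URL itself (servers are not subject to CORS) and relays the body with an `Access-Control-Allow-Origin` header so the browser accepts it.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen

# Assumption: only plain web URLs should be forwarded.
ALLOWED_SCHEMES = {"http", "https"}


def extract_target(path):
    """Return the upstream URL from a path like /?url=<encoded URL>, or None."""
    query = parse_qs(urlparse(path).query)
    target = query.get("url", [None])[0]
    if target is None or urlparse(target).scheme not in ALLOWED_SCHEMES:
        return None
    return target


class CorsProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: fetch ?url=... server-side and relay it with a CORS header."""

    def do_GET(self):
        target = extract_target(self.path)
        if target is None:
            self.send_error(400, "missing or unsupported ?url= parameter")
            return
        try:
            with urlopen(target, timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get(
                    "Content-Type", "application/octet-stream"
                )
        except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
            self.send_error(502, "upstream request failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        # This header is what lets a page on another origin read the response.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) the proxy; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), CorsProxyHandler)
```

A notebook would then request something like `http://127.0.0.1:8000/?url=` followed by the percent-encoded target URL instead of the target itself. A proxy that forwards arbitrary URLs should not be exposed publicly without restricting which hosts it will fetch.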

markwilkinson commented 3 months ago

Interesting! In most cases, I run the servers that I need to talk to from Jupyter, so I will try reconfiguring them to accept all origins in CORS. For the other cases, I will try your proxy ideas.
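
For a server you control, "accepting all in CORS" amounts to sending permissive `Access-Control-Allow-*` headers on every response and answering preflight `OPTIONS` requests. The sketch below shows one way to do that with Python's built-in file server. It is an assumption-laden illustration, not the configuration actually used in this thread: `CORS_HEADERS` and `CorsRequestHandler` are hypothetical names, and a production server (nginx, Apache, a web framework) would set the same headers through its own configuration.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Permissive settings for illustration; "*" allows any origin to read responses.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, HEAD, OPTIONS",
    "Access-Control-Allow-Headers": "*",
}


class CorsRequestHandler(SimpleHTTPRequestHandler):
    """Hypothetical static-file handler that adds CORS headers to every response."""

    def end_headers(self):
        # end_headers() runs for every response, so the headers are always included.
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()

    def do_OPTIONS(self):
        # Browsers send a preflight OPTIONS request before some cross-origin calls.
        self.send_response(204)
        self.end_headers()


def make_cors_server(port=8080):
    """Build (but do not start) the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), CorsRequestHandler)
```

Allowing every origin is the simplest setting to test with. Listing only the origin that hosts the JupyterLite site is the more conservative choice once things work.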

Thanks!! If this is the problem, then I suspect it's going to be hard to fix in jupyterlite itself... which is sad! But a proxy is fine.

I'll report back here if this solves the problem. Thanks for the suggestion @mrkvn !

markwilkinson commented 3 months ago

@mrkvn this did solve the problem. It was necessary also to explicitly install support for https. Now it's all good! Thanks!