pyodide / pyodide

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
https://pyodide.org/en/stable/
Mozilla Public License 2.0

Write http.client in terms of Web APIs #140

Open mdboom opened 5 years ago

mdboom commented 5 years ago

This might not even be possible given blocking issues.

However, if we could write http.client in terms of Web APIs, we might be able to get things like pip partially working. As it stands, Python libraries built on top of raw sockets don't (can't) work.
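As a rough illustration of what "http.client in terms of Web APIs" could look like from the caller's side, here is a minimal sketch. It keeps the request()/getresponse() shape of http.client.HTTPConnection but hands the actual I/O to an injected, synchronous transport callable. FetchHTTPConnection, FetchResponse, fake_transport and the transport signature are hypothetical names for this example, not Pyodide or standard-library APIs. In a browser the transport would have to be backed by some Web API, and that is exactly where the blocking problem comes in.

```python
import io
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport signature: (method, url, headers, body) -> (status, reason, headers, body).
# In a browser this would wrap a Web API; here it is just a plain callable.
Transport = Callable[
    [str, str, Dict[str, str], Optional[bytes]],
    Tuple[int, str, Dict[str, str], bytes],
]


class FetchResponse:
    """Tiny stand-in for http.client.HTTPResponse: status, headers and a readable body."""

    def __init__(self, status: int, reason: str, headers: Dict[str, str], body: bytes) -> None:
        self.status = status
        self.reason = reason
        self._headers = {k.lower(): v for k, v in headers.items()}
        self._body = io.BytesIO(body)

    def getheader(self, name: str, default: Optional[str] = None) -> Optional[str]:
        # Header lookup is case-insensitive, as in http.client.
        return self._headers.get(name.lower(), default)

    def read(self, amt: Optional[int] = None) -> bytes:
        return self._body.read(amt)


class FetchHTTPConnection:
    """Hypothetical http.client-style connection that hands requests to a transport."""

    def __init__(self, host: str, transport: Transport, scheme: str = "https") -> None:
        self.host = host
        self.scheme = scheme
        self._transport = transport
        self._pending: Optional[Tuple[str, str, Dict[str, str], Optional[bytes]]] = None

    def request(self, method: str, url: str, body: Optional[bytes] = None,
                headers: Optional[Dict[str, str]] = None) -> None:
        # http.client separates sending from receiving; this sketch just records the request.
        self._pending = (method, f"{self.scheme}://{self.host}{url}", dict(headers or {}), body)

    def getresponse(self) -> FetchResponse:
        if self._pending is None:
            raise RuntimeError("request() must be called before getresponse()")
        method, full_url, headers, body = self._pending
        self._pending = None
        # The transport call must block until the response is available.
        return FetchResponse(*self._transport(method, full_url, headers, body))


def fake_transport(method, url, headers, body):
    """Stand-in transport that echoes the request instead of touching the network."""
    return 200, "OK", {"Content-Type": "text/plain"}, f"{method} {url}".encode()


conn = FetchHTTPConnection("example.org", fake_transport)
conn.request("GET", "/index.html")
resp = conn.getresponse()
print(resp.status, resp.read())  # 200 b'GET https://example.org/index.html'
```

Injecting the transport keeps the http.client-facing surface separate from the question of how a browser can perform the request synchronously.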

anshuldutt21 commented 4 years ago

Hi, I would like to contribute to this issue.

rth commented 3 years ago

If I'm not mistaken, @kikocorreoso mentioned that another possibility could be to look at the work done in Brython, where some of these standard library modules might be reimplemented in JavaScript.

I quickly checked: according to the git commit messages, Brython's http.client, for instance, is identical to upstream, while its _socket.py doesn't do anything, but maybe I'm missing something. Also, https://github.com/brython-dev/brython/issues/1032#issuecomment-460934098 suggests that there are no replacements for low-level network connectivity. I also couldn't find anything about it in Skulpt.

Anyway, it would indeed be a good idea to read earlier discussions on this subject in these projects' issue trackers.

kikocorreoso commented 3 years ago

@rth my comment was more in the vein of using JS libs instead of Python libs when it makes sense. Some of them may already be implemented in Brython, Batavia, Skulpt, etc., and could be reused in some way.

For instance, re was very slow in Brython because it was implemented in pure Python, so @PierreQuentel reimplemented the functionality in JS: https://github.com/brython-dev/brython/issues/1519

I suppose re is compiled to WASM in Pyodide, so maybe this example is not very useful. I was thinking more of pure-Python libs that have been rewritten in JS to adapt some behaviour to the browser, for performance reasons, etc.

I don't know if this could help in terms of "DoNotReinventTheWheel", performance,...

rth commented 3 years ago

this could help in terms of "DoNotReinventTheWheel"

Yes, absolutely. Thanks for your comment! We should definitely look at what could be used/adapted in JS before implementing stuff :)

hoodmane commented 3 years ago

The main issue we have to deal with is that http.client is a synchronous API, while the relevant Web APIs are asynchronous. I implemented a small piece of the http.client API on one of my comlink/syncifiers branches as a proof of concept, and I think it works quite well.
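The synchronous-over-asynchronous mismatch can be illustrated on ordinary CPython. In the sketch below, a synchronous caller gets a blocking interface to an async function because the coroutine runs on an event loop in another thread while the caller waits for the result. This is only an analogy for the worker-based approach described above: in the browser, the waiting side would typically block with Atomics.wait on a SharedArrayBuffer rather than on a Python future. SyncBridge and fake_fetch are hypothetical names.

```python
import asyncio
import threading
from typing import Any, Awaitable, Callable


class SyncBridge:
    """Run coroutines on a background event loop and block the caller until they finish."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, func: Callable[..., Awaitable[Any]], *args: Any, timeout: float = 10.0) -> Any:
        # Schedule the coroutine on the other thread's loop, then block this thread on its result.
        future = asyncio.run_coroutine_threadsafe(func(*args), self._loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        # Stop the loop from outside its thread, wait for the thread, then release the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_fetch(url: str) -> bytes:
    """Hypothetical async 'web API' that pretends to do network I/O."""
    await asyncio.sleep(0.01)
    return f"body of {url}".encode()


bridge = SyncBridge()
data = bridge.call(fake_fetch, "https://example.org/")  # looks synchronous to the caller
print(data)  # b'body of https://example.org/'
bridge.close()
```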

However, I think it may be possible to use Emscripten pthreads instead. If we can get it working with pthreads, I think that would reduce how much code we need and potentially also lead to other useful features, though it would also allow much less fine-grained control than my comlink approach. In particular, Emscripten will take care of creating the web workers and deciding where the work should be done.

I'm really curious whether and how pthreads can make Python threading work. To do that, I guess the backing memory of the wasm module needs to be stored in a SharedArrayBuffer and all the Python interpreter code needs to be copied into multiple workers. Seems complicated, but maybe Emscripten pthreads does it...


datakurre commented 3 years ago

@hoodmane I'd like to check out and try the piece of the http.client API you implemented, but I was unable to find the correct branch. Could you link your version here?

hoodmane commented 3 years ago

Yeah, the actual partial http_client implementation is here: https://github.com/hoodmane/pyodide/blob/comlink-demo/src/pyodide-py/pyodide/http_client.py

It uses a comlink fork which I have here: https://github.com/hoodmane/pyodide/tree/comlink-demo/comlink

The actual demo is here: https://github.com/hoodmane/pyodide/tree/comlink-demo/demos/syncio

I can't remember how well this stuff works. My plan is to work on the comlink port in this separate repository: https://github.com/hoodmane/synclink I suppose it would be good to make a comparable demo that uses that repo for the comlink fork.

datakurre commented 3 years ago

@hoodmane Thank you for the links. Unfortunately, it ended up being too much for me to get working within the time I had, but at least it taught me how to build a working Pyodide from source :muscle:

rth commented 2 years ago

Interesting work @hoodmane! So what do you think the next steps should be? Threading (#237) doesn't look that far away; I think most major browsers now support it.

For this comlink demo, I think it would help to make it a bit more visible. Maybe move some of it to the pyodide org?

Otherwise, taking a different approach: isn't there some proxy that could change the MIME type of a response from binary to text/plain, so that we could still fetch it with pyodide.open_url? Either an external proxy or even a service worker? Though I guess the latter, even if it works, is not very different in complexity from running a web worker.
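For reference, a common client-side variant of this trick avoids the proxy: calling XMLHttpRequest.overrideMimeType("text/plain; charset=x-user-defined") makes a synchronous request expose binary data as text. Per the WHATWG Encoding Standard, the x-user-defined decoder maps bytes 0x00 to 0x7F to the same code points and bytes 0x80 to 0xFF to U+F780 to U+F7FF, so the original bytes can be recovered by keeping the low 8 bits of each code point. The sketch below simulates that decoder in Python to show that the round trip is lossless, which decoding the same bytes as plain UTF-8 text would not be. x_user_defined_decode and recover_bytes are hypothetical helper names.

```python
def x_user_defined_decode(data: bytes) -> str:
    """Simulate the browser's x-user-defined decoder (WHATWG Encoding Standard)."""
    return "".join(chr(b) if b < 0x80 else chr(0xF780 + b - 0x80) for b in data)


def recover_bytes(text: str) -> bytes:
    """Undo x-user-defined decoding: the low 8 bits of each code point are the original byte."""
    return bytes(ord(ch) & 0xFF for ch in text)


payload = bytes(range(256))  # every possible byte value
as_text = x_user_defined_decode(payload)
assert recover_bytes(as_text) == payload
print(len(as_text), hex(ord(as_text[0xFF])))  # 256 0xf7ff
```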

ricardoprins commented 2 years ago

I'm just too lazy to read everything - I confess.

Since this (and consequently #398) has a significant impact on PyScript (I'm surfing the hype as well), I want to help get this done. So, what are the steps needed to finish this task (and thereby indirectly solve the requests issue)?

I wanna help, but I want to understand the "bigger picture" first.

hoodmane commented 2 years ago

@ricardoprins can we set up a meeting?

ricardoprins commented 2 years ago

Sure, that would be great.

hoodmane commented 2 years ago

It's weird that GitHub has no DM feature. I guess you could use private repos for that purpose as a hack.

iuriguilherme commented 2 years ago

I have a question. Why make it blocking when there's aiohttp?

rth commented 2 years ago

Because there are a lot of libraries that are synchronous and use http.client (either directly or via requests, etc.) and won't be able to use aiohttp or another async library as a replacement. Unless #2664 is implemented, but that would take time.

For the cases where async use is possible, we have added pyodide.http.pyfetch, which has a somewhat similar API (but without the session context manager).

rtpg commented 1 year ago

Serious question: while synchronicity is a problem for JS because of the interaction model with the top level, if Pyodide's code evaluation entry points were all async (that is to say, runCode is also async), then the Python-level code could all be synchronous and things could be papered over with Asyncify at the Python/C FFI layer, maybe? After all, CPython itself already has a similar yielding concept in place.

Given there is in theory full control of the Python VM here I want to believe there is a way forward that doesn't involve too much pain.

hoodmane commented 1 year ago

In order for Asyncify to work, all C, C++, Rust, Fortran, etc. code would have to be linked with it, both in the main module and in side modules. I think the performance cost would be significant, and we would probably have to find and fix bugs in Asyncify. If someone does this and profiling shows the performance is okay, we might consider it. But I think the costs are too high.

twinsant commented 1 year ago

So, what's the progress?

ross-spencer commented 1 year ago

Can I ask a question about security?

My understanding currently is that if you try and POST data you'll get an error such as:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/lib/python3.10/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 23] Host is unreachable>

If this feature is implemented, will it be entirely up to the underlying library to "promise" not to forward data loaded into the browser to another source? Are there other controls on this type of issue?

rth commented 1 year ago

Can I ask a question about security?

Sure. That error doesn't mean you cannot make that POST request, only that you can't make it with urllib. Making it with pyodide.http.pyfetch (or via JS functions) would work. So in general, libraries can make arbitrary network connections both when running on host Python and in the browser.
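A minimal sketch of such a POST with pyodide.http.pyfetch follows. It only works when running inside Pyodide, and if the server is on a different origin it has to allow the request via CORS for the browser to hand back the response. The URL and payload are placeholders, and post_json is a hypothetical helper; pyfetch passes keyword arguments such as method, headers and body along to the JavaScript fetch() options.

```python
import json
from typing import Any


async def post_json(url: str, payload: dict) -> Any:
    # Imported lazily: pyodide.http only exists when running inside Pyodide.
    from pyodide.http import pyfetch

    response = await pyfetch(
        url,
        method="POST",
        headers={"Content-Type": "application/json"},
        body=json.dumps(payload),
    )
    if not response.ok:
        raise RuntimeError(f"POST to {url} failed with HTTP {response.status}")
    return await response.json()


# In the Pyodide console, where top-level await is supported:
# result = await post_json("https://example.org/api", {"hello": "world"})
```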

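You can restrict which domains the page is allowed to connect to in the browser with a Content Security Policy (the connect-src directive); CORS, by contrast, is controlled by headers sent by the server being requested.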

alekssamos commented 1 year ago

Since PHP and WordPress can run in the browser, maybe you can make sockets work too?

I found a project that compiles PHP and SQLite to WebAssembly and runs WordPress: https://wordpress.wasmlabs.dev/

I think PHP can use sockets somehow; otherwise, how does the browser interact with this PHP? Maybe you could still add sockets to Pyodide? Then libraries like urllib, requests, aiohttp and httpx would work. Or is it impossible, so that it can only be done through JS? And could WebSockets (ws) be supported?

What do you think about it?

Yes, I read the FAQ (1, 2, 3) where this was mentioned, but since I found PHP + WordPress, I wanted to ask again. I searched here, and there have already been similar topics. So, are these limitations of the WebAssembly virtual machine itself, of the browser, or only of Pyodide?

ArzelaAscoIi commented 1 month ago

any update on this ? :)

hoodmane commented 1 month ago

Well, once JSPI (JavaScript Promise Integration) is available by default in V8, we'll be able to add support for this in V8-based runtimes like Node and Chrome. Firefox and Safari support will have to wait until they also implement JSPI.