pyodide / pyodide

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
https://pyodide.org/en/stable/
Mozilla Public License 2.0

IO in a webworker without asyncio? #1219

Open hoodmane opened 3 years ago

hoodmane commented 3 years ago

I'm building a web page where students can edit Python code, then click "Run" to run it in pyodide. I started with all of this running in the main browser thread, which worked pretty well. As part of this, I added some glue code so that input() in Python became a prompt() in the browser.

However, when pyodide runs in the main thread, my code cannot update the display (e.g., show output), and controls are locked, so I can't have a "Stop" button to kill Python code that runs out of control. So I am working on a version where pyodide runs as a web worker. I've got that working for most of what I want, except for getting input back from the user.

Specifically, I send messages from the worker to the main browser thread to launch the prompt, then send a message back from the main thread to the worker with the result. The problem is, I can't see any way to get the Python code to wait for the return message. I think that may be possible in general, if I write all-async code. The problem is, the incoming code is not written for an async environment (e.g., the students will use input(), not await input()). So even if I patch input(), my new version has to be a sync function to give a result back to the main code. So then my patched input() can't await an async function, e.g., a JavaScript promise that resolves when the message comes back from the main browser thread.
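The constraint described above can be seen in plain JavaScript, without Pyodide involved. This is only a sketch: requestInputFromMainThread, syncInput and asyncInput are hypothetical names, and the timer stands in for the real round trip to the main thread.

```javascript
// Hypothetical stand-in for "ask the main thread and wait for its reply".
function requestInputFromMainThread() {
  return new Promise((resolve) => setTimeout(() => resolve("user text"), 0));
}

// A synchronous wrapper cannot unwrap the promise; it can only hand it back.
function syncInput() {
  const result = requestInputFromMainThread();
  return result; // still a pending Promise, not a string
}

// Only an async caller can get at the value, and student code is not async.
async function asyncInput() {
  return await requestInputFromMainThread();
}
```

A patched input() is in the position of syncInput: by the time it has to return, the reply has not arrived yet.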

I tried using asyncio.get_running_loop().run_until_complete() to await the promise, but this doesn't do anything, as noted above. I've investigated using emscripten's Asyncify framework, and I'm pretty sure I can get that to create a synchronous C function that will await a JavaScript promise. But I'm not sure if I can call the C function from Python in a way that is compatible with the Asyncify system. And even if I can, it will be pretty kludgy.

I'm now thinking about using shared memory between the browser thread and the worker, then running a busy loop in the worker that polls for a response via the shared memory (since I don't think sleep is possible). But I'd rather not use such a narrow solution and peg the processor that way.

So I just wanted to check, is there currently any path to get synchronous Python code to wait for a JavaScript promise or async function?

Originally posted by @mfripp in https://github.com/iodide-project/pyodide/issues/1158#issuecomment-775632060

hoodmane commented 3 years ago

Your understanding of everything is pretty much accurate.

Would you consider just asking your students to add async and await everywhere? I understand it is a notational pollution that could be avoided using native IO and may be hard for students to understand. I guess it may be pedagogically bad to make students keep track of two different types of functions and two different calling conventions... But it's unfortunate that Python has a language builtin tool to help you deal with this situation and you can't use it. And these days there are many native IO libraries that use async/await rather than select.

I've investigated using emscripten's Asyncify framework, and I'm pretty sure I can get that to create a synchronous C function that will await a JavaScript promise. But I'm not sure if I can call the C function from Python in a way that is compatible with the Asyncify system. And even if I can, it will be pretty kludgy.

I think this might be the least worst solution.

I'm now thinking about using shared memory between the browser thread and the worker, then running a busy loop in the worker that polls for a response via the shared memory.

Please no! This would surely work, but it is pretty sad...

So I just wanted to check, is there currently any path to get synchronous Python code to wait for a JavaScript promise or async function?

One remaining route you could try: use Atomics.wait and Atomics.notify. The docs say that this should allow you to block a worker thread while waiting for a response from a different thread. I tried to get this to work before but it didn't seem to do anything. Note that you can only use Atomics.wait on an Int32Array (or a BigInt64Array in newer engines); for some bizarre reason it is disabled on other TypedArray types.

Note also that Atomics.wait has no Safari support. It has two years of support on Chrome and six months of support on Firefox: https://caniuse.com/mdn-javascript_builtins_atomics_wait

mfripp commented 3 years ago

@hoodmane, thanks a lot for the confirmation and suggestions.

I've done some tests using emscripten's Asyncify with a C function, but the news is not good, at least for me. I was hoping to add a synchronous C function to the module, then call it from Python as if it were a synchronous JavaScript function, then have it pause (and run the event loop) until the promise was resolved. However, it turns out that the description of Asyncify is not quite right (although the name is). Asyncify effectively turns the C function into an async function. It doesn't let you call async functions synchronously and wait for the result. Instead, if you call a C function from JavaScript, and the C function uses Asyncify to await JavaScript code, then the C function is split and returns 0 immediately. Later, when the call is finished, Asyncify rewinds the C function and runs the portion after the handleAsync call. This is very similar to how an async JavaScript function works.

I was hoping that Asyncify somehow paused the C function and processed the JavaScript event loop for a while, before returning control back to the C function, which could then return a value to whatever called it. I think Python's run_until_complete() does something like that, but I haven't seen anything similar for JavaScript. In retrospect I should have guessed that wasn't happening here, since the Asyncify documentation talks about rewinding the stack, which is more like what async/await does.

So to use Asyncify for this, I think I would need to build pyodide myself using the Asyncify flags. Then I'd need to use Asyncify.handleAsync to call async JavaScript code from Python. I think I would also need to tell Asyncify the name of the C function that calls the JavaScript code -- I couldn't figure out where that happens after a few minutes of searching. (It would be really cool to add a built-in behavior in pyodide that uses Asyncify.handleAsync whenever pyodide is running synchronously and the user calls an async JavaScript function. If that only ever happened in one special-purpose function, maybe that would reduce the performance problems?)

That's a little more than I can take on for now. I'll take a look at the other options, or maybe just have students presupply inputs in a textbox. That's good enough for this; I was just hoping to get the nice back-and-forth input dialog working.

hoodmane commented 3 years ago

Have you looked into Atomics.wait? It sounds extremely promising. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics/wait

mfripp commented 3 years ago

I was a little scared off by your comment that "I tried to get this to work before but it didn't seem to do anything." And I'm anxious about narrowing the field of supported browsers so much. But I could give it a quick look, at least for proof of concept.

hoodmane commented 3 years ago

This recent blog post seems to say that Atomics.wait works for what you want here: https://jasonformat.com/javascript-sleep/

hoodmane commented 3 years ago

They have a demo here: https://sleep-sw.glitch.me/ and the source code: https://glitch.com/edit/#!/sleep-sw?path=worker.js%3A1%3A0

hoodmane commented 3 years ago

I got blocking IO working with Atomics.wait and Atomics.notify here: https://hoodmane.github.io/worker-pyodide-console/ https://github.com/hoodmane/worker-pyodide-console It also handles keyboard interrupts. The github link only works in Chromium-based browsers because it isn't served with the correct cross origin policy, but served from an appropriate server it would work in Firefox too.
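For readers who want to see the shape of the Atomics.wait / Atomics.notify handshake, here is a minimal sketch. It is not taken from worker-pyodide-console: the buffer layout and the names createChannel, writeMessage and readMessageBlocking are assumptions made for illustration. In a browser, the SharedArrayBuffer would be created once and posted to the worker, the page would have to be cross-origin isolated, and only the worker would be allowed to block in Atomics.wait.

```javascript
// Assumed layout of the shared buffer (not a Pyodide API):
//   Int32 slot 0: status flag (0 = empty, 1 = message ready)
//   Int32 slot 1: message length in bytes
//   byte 8 onward: UTF-8 message data
const HEADER_BYTES = 8;

function createChannel(capacity = 1024) {
  const buffer = new SharedArrayBuffer(HEADER_BYTES + capacity);
  return {
    header: new Int32Array(buffer, 0, 2),
    data: new Uint8Array(buffer, HEADER_BYTES, capacity),
  };
}

// Main-thread side: publish the user's text and wake the waiting worker.
function writeMessage(channel, text) {
  const encoded = new TextEncoder().encode(text);
  if (encoded.length > channel.data.length) {
    throw new RangeError("message too long for channel");
  }
  channel.data.set(encoded);
  Atomics.store(channel.header, 1, encoded.length);
  Atomics.store(channel.header, 0, 1);
  Atomics.notify(channel.header, 0);
}

// Worker side: block until a message is ready, or return null on timeout.
function readMessageBlocking(channel, timeoutMs = Infinity) {
  // Returns "not-equal" immediately if the flag is already nonzero.
  const outcome = Atomics.wait(channel.header, 0, 0, timeoutMs);
  if (outcome === "timed-out") return null;
  const length = Atomics.load(channel.header, 1);
  // Copy out of shared memory first; some TextDecoder implementations
  // refuse views backed by a SharedArrayBuffer.
  const copy = channel.data.slice(0, length);
  Atomics.store(channel.header, 0, 0); // mark the channel empty again
  return new TextDecoder().decode(copy);
}
```

A patched stdin in the worker would post a "need input" message to the main thread and then call readMessageBlocking; the main thread would call writeMessage once the user has typed something.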

joemarshall commented 3 years ago

Hey, I needed this too, and I've got some code in progress that throws an exception to jump out of python to JavaScript then recreates the python stack to go right back to the original exception location. It seems to work okay. I have students with iOS who have to use Safari, so I couldn't use web workers.

What this means is that a) you can jump out to receive JavaScript messages any time you want. b) you can do delays or io waits in JavaScript without spin locking anything.

https://github.com/joemarshall/unthrow/blob/main/unthrow/unthrow.pyx

joemarshall commented 3 years ago

It should work in wasm but I haven't built it in pyodide yet.

joemarshall commented 3 years ago

@mfripp not sure if you're following this issue, but I'm doing something similar to you, creating a consistent online python environment for students to use, and the stuff I posted above about checkpointing python so JavaScript can run might be of use to you too.

wushufen commented 3 years ago

function run(code) {
  // async input
  pyodide.globals.input = async function () {
    return new Promise((r) => setTimeout((_) => r('async_input')))
  }

  // +await
  code = code.replace(/\binput\s*[(]/g, 'await $&')

  // run
  pyodide
    .runPythonAsync(code)
    .then((output) => output && console.log(output))
    .catch((err) => {
      console.log(err)
    })
}

hoodmane commented 3 years ago

As an update, I think @joemarshall's unthrowables are a good idea. Hopefully we can get some version of them added in v0.18.

hoodmane commented 3 years ago

Closed in favor of #1503.

alexmojaki commented 3 years ago

I'm also trying to port an educational platform (http://futurecoder.io/) to pyodide. Nice to have so much company!

unthrow looks very interesting, but I don't feel great about it. It also seems to rely on sys.settrace() which prevents other debuggers like pdb from working, so this goes against #550.

Atomics.wait() seems to work great in your demo, and I'm not too bothered about telling Safari users to use Chrome and not supporting mobile.

But that blog post https://jasonformat.com/javascript-sleep/ suggests another ingenious idea: synchronous XHR intercepted by a service worker. I managed to get this working, see https://replit.com/@alexmojaki/pyodide-service-worker-input for source and https://pyodide-service-worker-input.alexmojaki.repl.co/ to try it out. Open the dev console to see when pyodide is ready. Then for example you could first run this code:

print(1)
print(input() * 2)
print(2)
print(input() * 3)
print(3)

and then 'Run' two arbitrary strings which will go into input().

Here's how it works. When you first click Run, it posts a message to the web worker:

worker.postMessage(code.value)

The web worker listens for messages and passes the code to pyodide:

pyodide.globals.get("run_code")(e.data)

The python function run_code calls exec(code) which eventually hits input(), calling sys.stdin.readline, which is patched by the custom function get_input(), which calls the JS function getInput() in the web worker:

function getInput(output) {
  // Tell the browser thread about input printed so far, and that user is waiting for input()
  postMessage({awaitingInput: true, output: output});

  const request = new XMLHttpRequest();
  request.open('GET', '/get_input/', false);  // `false` makes the request synchronous
  request.send(null);
  return request.responseText;
}

The request is intercepted by the service worker:

let resolver;

addEventListener('fetch', e => {
  const u = new URL(e.request.url);
  if (u.pathname === '/get_input/') {
    e.respondWith(new Promise(r => resolver = r));
  }
});

addEventListener('message', event => {
  resolver(new Response(event.data,{status:200}));
});

The service worker responds with a promise that is resolved when it receives a message from the browser thread. Meanwhile back in the browser thread, the message is received from the worker's getInput() so that it can display the output so far and note that python's input() is waiting. Therefore it knows to send the next message to the service worker instead of the web worker, which will eventually be received in python's input().
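The routing decision in the browser thread can be sketched as a small state holder. This is an illustration, not the code from the repl: createInputRouter, workerPort, serviceWorkerPort and showOutput are hypothetical names, and the two ports are any objects with a postMessage method (in a page they would be the Worker and, presumably, navigator.serviceWorker.controller).

```javascript
function createInputRouter(workerPort, serviceWorkerPort, showOutput) {
  let awaitingInput = false;
  return {
    // Call this from the web worker's "message" handler.
    handleWorkerMessage(message) {
      if (message.output) showOutput(message.output);
      awaitingInput = Boolean(message.awaitingInput);
    },
    // Call this when the user clicks Run.
    submit(text) {
      if (awaitingInput) {
        // Python is blocked inside input(): answer the pending /get_input/ request.
        awaitingInput = false;
        serviceWorkerPort.postMessage(text);
      } else {
        // Nothing is waiting: treat the text as new code to run.
        workerPort.postMessage(text);
      }
    },
  };
}
```

The flag is cleared as soon as an answer is sent, so the next click goes to the web worker unless Python asks for input again.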

I'm very new to all of this stuff so I may be missing something critical, but it seems to work fine. I think service workers may be killed randomly so getInput() will probably need to make requests in a loop.
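A retry loop along those lines might look like the sketch below. It assumes that a killed or restarted service worker shows up either as a thrown network error or as a non-200 status; that behaviour, and the names getInputWithRetry, requestOnce and syncXhrRequest, are assumptions rather than anything verified in the repl.

```javascript
// requestOnce is any function that performs one blocking request and
// returns { status, body }; it is injected so the loop can be tested.
function getInputWithRetry(requestOnce, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const { status, body } = requestOnce();
      if (status === 200) return body;
    } catch (err) {
      // Network error, e.g. the service worker went away mid-request; retry.
    }
  }
  throw new Error(`no input received after ${maxAttempts} attempts`);
}

// Browser-only implementation of requestOnce, mirroring getInput() above.
function syncXhrRequest() {
  const request = new XMLHttpRequest();
  request.open("GET", "/get_input/", false); // false = synchronous
  request.send(null);
  return { status: request.status, body: request.responseText };
}
```

In the web worker this would be called as getInputWithRetry(syncXhrRequest). The main thread would also need to resend the user's text if the service worker lost its resolver, which this sketch does not cover.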

hoodmane commented 3 years ago

It also seems to rely on sys.settrace() which prevents other debuggers like pdb from working, so this goes against #550.

I have a way to fix this, the low level work for this is already underway: https://github.com/pyodide/pyodide/blob/main/cpython/patches/0001-Add-pyodide-callback.patch

hoodmane commented 3 years ago

synchronous XHR intercepted by a service worker

Cool idea. So now we have three possible approaches:

  1. Unthrow (works on any browser without any thread communication, even on main thread. Have to be careful about C stack frames though.)
  2. Atomics (requires SharedArrayBuffer, https, webworker, no support on safari)
  3. Sync XHR (requires service worker, webworker so https, probably would work on Safari)

Ideally we could use unthrow for development (setting up a service worker can be a bit of a pain, and it breaks the expectation that refreshing the page gives a completely fresh start; a normal worker at least has no persistent state, but it still slows development).

Then once logic is more or less ready, we could switch to 2 or 3 for release.

Have to be careful to make it so we can swap out these different techniques, but I have ideas for that.
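One way to keep the three techniques swappable is to pick one at startup from feature checks. The sketch below is an assumption about how that could look, not a design from this thread: chooseSyncStrategy and its return values are hypothetical, and env is passed in so the function can be exercised outside a browser.

```javascript
// env is expected to look like the page's globalThis.
function chooseSyncStrategy(env) {
  const hasSharedMemory =
    typeof env.SharedArrayBuffer === "function" &&
    env.crossOriginIsolated === true;
  if (hasSharedMemory) return "atomics"; // approach 2
  if (env.navigator && "serviceWorker" in env.navigator) {
    return "sync-xhr"; // approach 3
  }
  return "unthrow"; // approach 1, needs no thread communication
}
```

On the main thread of a page this would be called as chooseSyncStrategy(globalThis), and the result used to decide which input backend the worker is started with.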

alexmojaki commented 2 years ago

I've written a library https://github.com/alexmojaki/sync-message which helps with synchronous IO with a worker using either SharedArrayBuffer or a service worker. I'm using it in my own https://github.com/alexmojaki/futurecoder and I've also integrated it into https://github.com/dodona-edu/papyros. It has no dependencies and is meant to fit any use case, not just Pyodide. I think it would fit well in https://github.com/hoodmane/synclink.

bslatkin commented 2 years ago

Poking this thread, it appears that Atomics.wait and SharedArrayBuffer are now both available in Safari on Desktop and Mobile (since December 2021):

https://caniuse.com/mdn-javascript_builtins_atomics_wait https://caniuse.com/sharedarraybuffer

Can anyone else confirm that? Maybe I'm reading it wrong.

So perhaps the right approach is for Pyodide to provide a Promise shim for this so the stdout/stderr/stdin callbacks passed to loadPyodide can be async functions? Then the whole service worker part of this can be avoided. The same shim could be used for other streams too (e.g., webrtc, websockets).

alexmojaki commented 2 years ago

Yes, SharedArrayBuffer has been available on Safari for a while, but it requires cross origin isolation on any browser, and that's often a problem on its own. In futurecoder it messed with a few things so I chose to drop that path and only use service workers, but still with sync-message.

joemarshall commented 2 years ago

If you're happy for your site to be served with cross-origin isolation enabled, then it is possible to enable isolation using a service worker. This means you can write everything with atomics, which works great and feels like the right way until the stack switching proposals are implemented in browsers. And you can still host it anywhere, e.g. GitHub Pages.

Enabling isolation with a service worker doesn't work in incognito mode, but neither do service-worker-based solutions, so no loss there.
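The service worker trick amounts to re-serving every response with the two isolation headers added. The sketch below shows the idea under assumptions: addIsolationHeaders and isolatingFetchHandler are hypothetical names, and a real deployment also needs the page to reload once after the service worker takes control.

```javascript
// headers is anything with a set(name, value) method, e.g. a Headers object.
function addIsolationHeaders(headers) {
  headers.set("Cross-Origin-Opener-Policy", "same-origin");
  headers.set("Cross-Origin-Embedder-Policy", "require-corp");
  return headers;
}

// Fetch handler for a service worker; only meaningful where fetch events exist.
function isolatingFetchHandler(event) {
  event.respondWith(
    fetch(event.request).then((response) => {
      // Opaque responses (status 0) cannot be rebuilt, so pass them through.
      if (response.status === 0) return response;
      const headers = addIsolationHeaders(new Headers(response.headers));
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers,
      });
    })
  );
}

// In the service worker file:
//   addEventListener("fetch", isolatingFetchHandler);
```

Once the page is served this way, SharedArrayBuffer becomes available and the atomics approach can be used on static hosts that do not let you set response headers.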

elilambnz commented 1 year ago

For those looking for a solution using service workers and synchronous XHR requests, I've added support for this in my library react-py, runnable example here.

Of course you don't have to use React, the code can be found on GitHub - look for the workers and providers directories.

Tested with Vite, Next.js and Docusaurus + GitHub pages (docs site).

Hephaistos7 commented 8 months ago

Is there any conclusion on a recommended way to implement Pyodide in a web worker for IO input?

It seems that there have been new developments in this area elsewhere, so it would be kinda cool to have an updated overview. @alexmojaki has done a lot of work on this: sync-message, pyodide-worker-runner. @hoodmane has also done work on it: synclink.

@hoodmane also started the following issue: Improved Webworker ease of use. @jtpio started the issue: Matplotlib backend in a webworker.

We should make an overview of packages and their use cases in a webworker.

I imagine all these functionalities should be built and documented inside Pyodide at some point. Right now, we use plugins for it. It'd be really helpful to know exactly which plugins to use.

For debugging, there seems to be the following two packages: snoop, birdseye. @alexmojaki I see your contributions on Pyodide all over the net. Are you associated or in contact with the Pyodide developers in any way? I imagine they could really use your work.

I myself am replacing Skulpt with Pyodide for the following website XLogo. And I really want to accomplish that task in a stable and maintainable way.

alexmojaki commented 8 months ago

Here's a rough guide:

  1. Try enabling cross-origin isolation (COI) on your site. There's a good chance this will cause something to stop working, so test carefully.
  2. Try using sync-message in a very basic way just to confirm that you can create and use a channel, whether with COI or a service worker.
  3. If COI is working, try using synclink to compare. @hoodmane should confirm if this is actually recommended. I haven't tried it myself and I don't understand what little documentation there is - why does the example seem to show synchronously blocking in the main thread instead of in the worker?
  4. If you would rather use sync-message than synclink, then the next question is: Are you fine with using Comlink for the async communication with your worker? If so, then try using https://github.com/alexmojaki/comsync.
  5. If you can use comsync, then try using https://github.com/alexmojaki/pyodide-worker-runner.

For a full example of pyodide-worker-runner in action, see:

@alexmojaki I see your contributions on Pyodide all over the net. Are you associated or in contact with the Pyodide developers in any way? I imagine they could really use your work.

I had a chat with @hoodmane and there was some interest in using sync-message at the core of synclink, but then nothing happened.

Ultimately IMO it would make sense to use sync-message as a foundation everywhere to abstract away the nitty-gritty technical details of synchronous communication and allow easily switching between COI and service workers. I'm curious as to why @elilambnz and @ojeytonwilliams chose to reimplement this stuff themselves.

For debuggers:

  1. If you've correctly patched stdin to work synchronously, then pdb (the default used by breakpoint()) should just work.
  2. If you're using pyodide-worker-runner combined with python_runner (as you may be doing already to take care of all the sync input stuff) then python_runner makes it easy to use snoop.
  3. birdseye is more complicated, and really has nothing to do with the rest of this thread, but it can be done and you can see futurecoder as an example. It uses a branch of birdseye that removes the server and database so that everything can run with static files in the browser.

nairboon commented 3 months ago

I myself am replacing Skulpt with Pyodide for the following website XLogo. And I really want to accomplish that task in a stable and maintainable way.

@Hephaistos7 are you still working on porting XLogo? Is your work on this public?

Hephaistos7 commented 3 months ago

I myself am replacing Skulpt with Pyodide for the following website XLogo. And I really want to accomplish that task in a stable and maintainable way.

@Hephaistos7 are you still working on porting XLogo? Is your work on this public?

Unfortunately, the work isn't public and my time actively working on it is over. But the port worked and we are using Pyodide successfully, using COI and alexmojaki's libraries: sync-message, pyodide-worker-runner and comsync (as linked in the previous answer)