vmware-labs / webassembly-language-runtimes

Wasm Language Runtimes provides popular language runtimes (Ruby, Python, …) precompiled to WebAssembly that are tested for compatibility and kept up to date when new versions of upstream languages are released
Apache License 2.0

Python.wasm roadmap #46

Open assambar opened 1 year ago

assambar commented 1 year ago

This is a loose list of next steps that we think would improve the python.wasm binary that we build.

Any feedback in comments is greatly appreciated.

brettcannon commented 1 year ago

Three questions and a request.

Question one: do you know about https://discuss.python.org/c/webassembly/28 ? It's where some announcements and discussions happen around Python and WebAssembly.

Question two: do you know about the #Python channel on the WebAssembly Discord? It's the other place some of us participate around Python and WebAssembly.

Question three: Is anyone from VMware looking to come to PyCon US this year? There may be a WebAssembly summit on Thursday, April 20 for a select number of people (space will be limited).

The request: please feel free to make suggestions upstream to improve how WASI is built, packaged, etc.! I want to get WASI to tier 2 support for CPython, and part of that will be creating WASI builds as part of releases. Trying to make sure those work as best as possible for the community would be great, so help would be appreciated. Discussions around that can happen at discuss.python.org or on a GitHub issue at https://github.com/python/cpython .

cohix commented 1 year ago

@assambar this is all excellent. I would love to see documentation/examples of how to do bi-directional function calls between the Wasm guest and the host. Specifically, calling a python function from the outside, and allowing python to call imported host functions other than those provided by WASI.

assambar commented 1 year ago

@brettcannon

Questions 1 and 2: I didn't know about either discussion channel. Subscribed to both, many thanks! Question 3: I don't know of anyone going, but I will ask around among the Wasm enthusiasts here and let you know if anyone is.

The request: please feel free to make suggestions upstream to improve how WASI is built, packaged, etc.! I want to get WASI to tier 2 support for CPython, and part of that will be creating WASI builds as part of releases. Trying to make sure those work as best as possible for the community would be great, so help would be appreciated. Discussions around that can happen at discuss.python.org or on a GitHub issue at https://github.com/python/cpython .

Would love to. Suggestions will definitely come up as we see how people use python.wasm and what more they need. For one, we are currently also working on streamlining the building and publishing of reproducible, versioned WASI builds of popular libraries like libuuid, libz, libsqlite, etc., and will add more as the need arises. Our idea is to publish those as release assets, so that it will be easier to build CPython with more modules out of the box. I will let you know when we have something, if you're interested.

assambar commented 1 year ago

@assambar this is all excellent. I would love to see documentation/examples of how to do bi-directional function calls between the Wasm guest and the host. Specifically, calling a python function from the outside, and allowing python to call imported host functions other than those provided by WASI.

Thanks for the feedback @cohix. Added this to the list of things to do at the beginning of this thread. Note that it's not a prioritized list.

brettcannon commented 1 year ago

I will let you know when we have something, if you're interested.

Yes please! My hope is we can get a "fat" binary build for WASI distributed on python.org with as much statically linked as possible so people can take that python.wasm file and run it anywhere they want.

gzurl commented 1 year ago

@brettcannon that feedback is very useful. Indeed, we just released two different flavors for PHP 8.2.0. A standard build with common extensions and a slimmed-down version with minimal size. I think we could follow a similar approach with Python.

Also, I believe a specific build focused on ML, including the popular packages (NumPy, Pandas, Scikit, etc.), makes a lot of sense for running inference at the edge.

PS: We are waitlisted for a "charla" (talk) at PyCon.

codefromthecrypt commented 1 year ago

Is it possible to move some of the blog content into a README alongside the python directories here? While I play devil's advocate sometimes, I'm really interested in what you are doing.

For example, I think you fairly mention in the blog that pip currently isn't usable, which accordingly reduces the size of Docker images. Some people look at image size alone and miss that, first, size isn't the main goal and, second, such a comparison isn't fair. Your blog is far more balanced, yet those points aren't visible in this repo.

https://wasmlabs.dev/articles/python-wasm32-wasi/

So, how about putting some of that here? The part about pip especially is, I think, really important for a README, but I bet folks in the Python community have their own favorite points to add, too.

bivald commented 1 year ago

NumPy would be awesome to have in this :) Let me know if I can test anything out when you get closer.

assambar commented 1 year ago

I will let you know when we have something, if you're interested.

Yes please! My hope is we can get a "fat" binary build for WASI distributed on python.org with as much statically linked as possible so people can take that python.wasm file and run it anywhere they want.

Hey @brettcannon, we published a "fat" binary with the latest release, relying on wasi-vfs, which turned out to be pretty easy. Take a look at python-3.11.1.wasm and python-3.11.1-wasmedge.wasm in the python/3.11.1+20230217-15dfbed release.

Basically, you need to:

Take a look at scripts/build-helpers/wasi_vfs.sh and how the two functions from there are used in python/v3.11.1/wl-build.sh

brettcannon commented 1 year ago

relying on wasi-vfs

Are you referring to https://github.com/kateinoigakukun/wasi-vfs ? Or are you referring to something else?

  • run wasi-vfs CLI to pack it with the folders you want

So you're trying to use wasi-vfs to ship files with the WASI binary so it's as self-contained as possible (short of the runtime)? I assume this only works with pure Python files and there isn't some magical dlopen() support in there? And the big bonus compared to freezing the code in with the binary is avoiding the compile step and instead relying on the wasi-vfs CLI to do the joining?

assambar commented 1 year ago

Are you referring to https://github.com/kateinoigakukun/wasi-vfs ?

Yep. That one.

So you're trying to use wasi-vfs to ship files with the WASI binary so it's as self-contained as possible (short of the runtime)?

Yep. We just packaged the usr/local/libs folder at / and we got an "all-in-one" python.wasm

I assume this only works with pure Python files and there isn't some magical dlopen() support in there?

Exactly. I'm still looking into how we could do this with modules that have C extensions. Without doing the uncharted dlopen support, I'm currently thinking of compiling those extension files to static wasm libs and then linking them along with libpython into a monolithic python.wasm, then packaging that with wasi-vfs. Naturally, a dlopen approach would be the best option, but I'm not sure how much time it will take to get it right. And it will require a host function.

And the big bonus compared to freezing the code in with the binary is avoiding the compile step and instead relying on the wasi-vfs CLI to do the joining?

I'd rather say the ease of use. It seems people get confused when they have to pre-open the standard library in order to use it.

brettcannon commented 1 year ago

I'm still looking into how we could do this with modules that have C extensions. Without doing the uncharted dlopen support, I'm currently thinking of compiling those extension files to static wasm libs and then linking them along with libpython into a monolithic python.wasm

This sounds similar to something we are starting to explore a little for VS Code: create an embedded interpreter scenario where we use https://docs.python.org/3/c-api/import.html#c.PyImport_AppendInittab to make extension modules act as built-in modules.

I'd rather say the ease of use. It seems people get confused when they have to pre-open the standard library in order to use it.

For some future WASI release on python.org, I've been thinking of freezing the stdlib into the binary and then letting people mount the stdlib if they want it for tracebacks. That way the simple, easy-to-deploy solution is available there, while you and the rest of the community innovate on nicer, fancier WASI solutions.

assambar commented 1 year ago

... create an embedded interpreter scenario where we use https://docs.python.org/3/c-api/import.html#c.PyImport_AppendInittab to make extension modules act as built-in modules.

That's exactly what I plan on doing.

For some future WASI release on python.org, I've been thinking of freezing the stdlib into the binary and then letting people mount the stdlib if they want it for tracebacks.

That would be awesome to have.

brettcannon commented 1 year ago

That's exactly what I plan on doing.

If you get to do the work in the open, do let us know and we can potentially coordinate or at least share notes (we are still just playing around, so no code to share, but it's being done in the open so we can talk about it, etc.).

zifeo commented 1 year ago

Great job for the fat binary, works like a charm!

Add documentation/examples of how to do bi-directional function calls between the Wasm guest and the host.

Are there already some pointers or tests I could use for that? I am especially interested to see how the host (e.g. WasmEdge via Rust) can run the Python guest continuously and trigger specific Python function on "event". Is there already something on that end (instead of relaunching _start with different arguments)?

brettcannon commented 1 year ago

I am especially interested to see how the host (e.g. WasmEdge via Rust) can run the Python guest continuously and trigger specific Python function on "event".

Depending on how it's compiled, you could use Python's C API to accomplish this. So basically you could embed the Python interpreter and then call it that way from your own code.

zifeo commented 1 year ago

@brettcannon Thanks for the insight, but it's not yet 100% clear in my mind. How would Rust call the Python C API running inside the Wasm runtime? Which name/part of the C API should I be able to call from Rust?

codefromthecrypt commented 1 year ago

To make a comparison to other languages: in Rust and TinyGo, you can export functions so that they can be called outside the scope of WASI. Is there a way to export functions in Python (or any interpreter)? Otherwise, my guess is users will have to use some sort of busy loop and pass inputs and outputs via stdio or something.

brettcannon commented 1 year ago

@zifeo You can call any part of the C API from Rust via unsafe or using something like PyO3. But I think you're after something more dynamic/external than compiling all of Python into your Rust code.

zifeo commented 1 year ago

@brettcannon I am not looking into integrating Python code into a Rust app, but rather into using WasmEdge and interacting with a running Python VM. This means that I would need a way to call host functions from within the runtime, but I am not sure which path to take. Should I use Python ctypes? But to load what in the runtime?

assambar commented 1 year ago

@zifeo we worked with Suborbital on something similar to what you want (I think). It does what @brettcannon suggested.

Take a look at this example where the Python interpreter is wrapped by a Wasm module (written in C) and we have end to end host-to-python and python-to-host functions - https://github.com/vmware-labs/webassembly-language-runtimes/tree/bindings/experiments/se2-bindings. Of course this requires some translation in the Wasm module (which was written in C in this example).

If you want your Wasm module to behave like a full-blown Python interpreter on top of functionality like the above, it's as easy as calling Py_Main or Py_BytesMain in the main method, after initializing your "glue" module for Python-to-host calls (called sdk in the above example). The important line here is PyImport_AppendInittab(SDK_MODULE, &PyInit_SdkModule), which ensures that sdk is a "builtin" module for the Python interpreter.

zifeo commented 1 year ago

@assambar Awesome, thanks, exactly what I was looking for. I am close to reproducing this example with a Rust wrapper. I have encountered 2 issues so far:

Happy also to move the discussion elsewhere if you feel it does not belong here.

brettcannon commented 1 year ago

... create an embedded interpreter scenario where we use https://docs.python.org/3/c-api/import.html#c.PyImport_AppendInittab to make extension modules act as built-in modules.

That's exactly what I plan on doing.

It turns out that @kesmit13 has already tested this and got it working with an example extension for Singlestore Labs' WASI build of CPython! He's currently trying to get NumPy to work but running into a circular import issue.

assambar commented 1 year ago
  • using Rust as host, it seems difficult to patch for the sock_accept export. I understand this is a compatibility issue between WasmEdge and the WASI standard that later came, but I am unsure what steps are needed (or Github issue I should track/open) for that to be solved?

This is the WasmEdge issue: https://github.com/WasmEdge/WasmEdge/issues/2056. Until they address it, here's what you can quickly do to make your code run on WasmEdge (however, sock_accept will be broken): patch_wasmedge_wat_sock_accept.sh. Just note that this works only on optimized binaries (given the fast and ugly approach we took for the script). Alternatively, you could just try using Wasmtime as a host. It already has sock_accept, so you don't need to provide it as a host function.

  • using V8 as host, it seems that the stack call size has to be significantly increased to pass the Python initialization. Is this known/expect or shall it be tracked somewhere?

I have not explored this. Maybe just log a separate issue (ideally with how you run V8 and what you import in Python). Please note that there's already a known issue with the static libpython that we built for the example, https://github.com/vmware-labs/webassembly-language-runtimes/issues/79, and I am looking at that first, as a priority.

assambar commented 1 year ago

It turns out that @kesmit13 has already tested this and got it working with an example extension for Singlestore Labs' WASI build of CPython! He's currently trying to get NumPy to work but running into a circular import issue.

This is great! I've been following your discussion since last week, but don't have the cycles to join in the debugging yet.

zifeo commented 1 year ago

@assambar Thanks for the answer. I managed to get a wrapper working in Rust and will soon release a repo with a full example!

Regarding the network, I have tried importing requests but all attempts using network failed so far (even IP only).

brettcannon commented 1 year ago

Regarding the network, I have tried importing requests but all attempts using network failed so far (even IP only).

Outbound networking isn't supported in WASI preview 1, so that will very likely come down to whether your WASI runtime has support for outbound networking.

zifeo commented 1 year ago

@brettcannon @assambar Here is the example: https://github.com/metatypedev/python-wasi-reactor. I managed to make everything work thanks to your advice 🙏. I am now waiting on #71 to add some more tests and to experiment further with async using Tokio. Happy to get your feedback.

assambar commented 1 year ago

@zifeo this looks like a good piece of work! I was happy to learn about Metatype. #71 is now fixed, and you can get an official WasmLabs build of libpython from https://github.com/vmware-labs/webassembly-language-runtimes/releases/tag/python%2F3.11.3%2B20230428-7d1b259. Note that for a full-blown Python application to work you need a decent stack size. The suggested C linker options can be found in lib/wasm32-wasi/pkgconfig/libpython311.pc inside the tarball.

zifeo commented 1 year ago

@assambar Thanks. We are now updating the reactor to support more than just registering lambdas.

With WasmEdge release 0.12.1, it seems that networking is finally available. Could you elaborate on what is required for pip to work, or at least for a way to install vanilla Python dependencies (it would also be great to have a few hints on how to compile native libraries)?

brettcannon commented 1 year ago

@zifeo the command to have pip install pure Python dependencies only can be found in https://snarky.ca/testing-a-project-using-the-wasi-build-of-cpython-with-pytest/ , but you will probably need to do that outside of WebAssembly (I don't know how WasmEdge has "networking", but my guess is it isn't complete enough for CPython's socket support to work).

As for native dependencies, you will need to compile that into your Python binary as built-in extensions.