Make scala-wasm suitable for standalone Wasm VMs

tanishiking commented 4 months ago

Why

Currently, the Wasm modules generated by scala-wasm depend on helper functions in JavaScript, making it possible to run them on JavaScript engines like V8, but not on standalone Wasm runtimes such as wasmtime or wasmedge.

While WebAssembly was initially designed to run in web browsers, it has recently been adopted in various environments due to its portability, fast loading and execution times, and security benefits.

For example, WebAssembly is being used in:

Plugins/Extensions (Envoy, UDF in TiDB)
Serverless/FaaS (Fastly, Fermyon Spin, Cloudflare Workers)
Containers/k8s (runwasi, containerd-shims)
Cloud Apps (wasmCloud, Wasm Workers Server)
Smart Contracts (Ethereum Wasm, NEAR Protocol)

(I personally believe that the most significant benefit of compiling Scala to WebAssembly lies in the ability to run it in these environments, as well as in the Component Model.)

However, to leverage WebAssembly in these environments, the binaries need to be executable on standalone Wasm runtimes, which is not possible with the current state of the generated binaries.

How

To achieve this, two major tasks are required (and the latter will likely be a significant challenge):

(1) Support WASI (WebAssembly System Interface) (at least WASI preview 1 for now). This would enable access to the file system and other resources, regardless of the host environment.
- For more information on WASI, see the short write-up I posted as a gist.
(2) Remove dependencies on JavaScript (when targeting pure WASI).
- Implementations of strings, closures, and other features currently rely on JavaScript, but these would need to be implemented in pure WebAssembly.
- Math functions and RegExp implementations would also need to be handled on the WebAssembly side. Could the implementations from Scala Native be reused somehow? (It seems that the Kotlin/Wasm RegExp engine reuses the implementation from Kotlin/Native.)
- Any JavaScript dependencies, including js.Dynamic, would become unavailable (some of them might be replaceable with WASI?)
- Third-party libraries that depend on JavaScript might need to be re-implemented for standalone WebAssembly targets. (It should be considered whether to have separate builds for standalone Wasm or introduce annotations to switch implementations at link time, similar to Scala Native.)

State of WasmGC support on standalone Wasm runtimes

✅ wasmedge 0.14.0
🚧 wasmtime
🚧 wamr
🤔 wasmer

tanishiking commented 3 months ago

WASI exchanges data via Wasm linear memory

First of all, WASI (both preview 1 and preview 2) exchanges data between the Wasm module and the host environment via Wasm linear memory. For example, the fd_write function requires us to place data in linear memory and pass the corresponding addresses and lengths as arguments. That means we would need to allocate regions of WebAssembly linear memory and read/write data in those regions.
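To make the fd_write case concrete: at the Wasm level, WASI preview 1 declares fd_write as taking four i32 values (a file descriptor, a pointer to an array of ciovec structs, the number of entries in that array, and a pointer where the host stores the number of bytes written) and returning an errno. Each ciovec is 8 bytes: a little-endian u32 buffer address followed by a u32 length. The sketch below only performs the memory-layout step. It assumes a java.nio.ByteBuffer as a stand-in for the module's linear memory (in scala-wasm these stores would have to compile to Wasm memory instructions), and the helper name `layoutFdWrite` and the chosen addresses are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

object FdWriteLayout {
  // Stand-in for the module's linear memory: one 64 KiB Wasm page.
  // Wasm memory is little-endian, so the buffer uses that byte order.
  val memory: ByteBuffer =
    ByteBuffer.allocate(65536).order(ByteOrder.LITTLE_ENDIAN)

  /** Hypothetical helper: writes `payload` at address `base`, then one ciovec
    * describing it, and returns the (iovs, iovs_len, nwritten) arguments that
    * would be passed to fd_write after the file descriptor.
    */
  def layoutFdWrite(payload: Array[Byte], base: Int): (Int, Int, Int) = {
    // 1. Copy the bytes to be written into linear memory.
    payload.indices.foreach(i => memory.put(base + i, payload(i)))
    // 2. Place a single ciovec { buf: u32, buf_len: u32 } after the payload,
    //    rounded up to 4-byte alignment.
    val iovsPtr = (base + payload.length + 3) & ~3
    memory.putInt(iovsPtr, base)
    memory.putInt(iovsPtr + 4, payload.length)
    // 3. Reserve 4 bytes in which the host stores the number of bytes written.
    val nwrittenPtr = iovsPtr + 8
    (iovsPtr, 1, nwrittenPtr)
  }
}
```

For example, `FdWriteLayout.layoutFdWrite("hello\n".getBytes(StandardCharsets.UTF_8), 1024)` returns `(1032, 1, 1040)`; a module would then call `fd_write(1, 1032, 1, 1040)` to write to stdout and read the byte count back from address 1040.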

Direct ByteBuffer to allocate memory?

We've talked a bit about using (Direct)ByteBuffer to allocate off-heap memory on WebAssembly linear memory. Emulating a Direct ByteBuffer in pure WebAssembly would involve using WebAssembly's linear memory, which seems reasonable.

However, a Direct ByteBuffer relies on the JVM's garbage collector to deallocate its memory, and I don't think there's a way to hook into WasmGC to detect when an object is collected and free the corresponding memory in a similar way.

We would need to find a workaround if we want to support Direct ByteBuffer in pure WebAssembly at some point, but for now, Direct ByteBuffer seems like overkill just for supporting WASI 🤔

Our own Memory Allocator (for wasm linear memory)

Instead of supporting Direct ByteBuffer for linear memory allocation, I'm thinking about a simple linear memory allocator (we would need an allocator for Direct ByteBuffer anyway), something like:

withScopedMemoryAllocator { allocator =>
  val ptr = allocator.alloc(byteSize)
  ptr.writeBytes(bytes)
  ...
} // when we exit the scope, all memory allocated in it is freed.

We can start with a simple memory allocation algorithm, since these allocations should be short-lived and used only to exchange data between the Wasm module and the host environment (in the WASI context).
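One way the proposed `withScopedMemoryAllocator` could work is a bump-pointer (arena) allocator: allocating just advances a pointer, and leaving a scope resets the pointer to its value on entry, freeing everything allocated inside at once. The sketch below assumes that strategy and simulates linear memory with a java.nio.ByteBuffer. `Allocator`, `Ptr`, `readBytes`, `currentTop`, and the 8-byte alignment are hypothetical details, not an existing scala-wasm API.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object ScopedMemory {
  // Stand-in for Wasm linear memory (assumption: one 64 KiB page, little-endian).
  val memory: ByteBuffer =
    ByteBuffer.allocate(65536).order(ByteOrder.LITTLE_ENDIAN)

  // Bump pointer: addresses below `top` are currently allocated.
  private var top: Int = 0
  def currentTop: Int = top

  /** Hypothetical pointer into linear memory. */
  final case class Ptr(addr: Int) {
    def writeBytes(bytes: Array[Byte]): Unit =
      bytes.indices.foreach(i => memory.put(addr + i, bytes(i)))
    def readBytes(len: Int): Array[Byte] =
      Array.tabulate(len)(i => memory.get(addr + i))
  }

  /** Hypothetical allocator handed to the scope body. */
  final class Allocator {
    /** Bump-allocates `size` bytes at an 8-byte-aligned address. */
    def alloc(size: Int): Ptr = {
      require(size >= 0, "negative size")
      val start = (top + 7) & ~7
      if (start + size > memory.capacity())
        throw new OutOfMemoryError("linear memory exhausted")
      top = start + size
      Ptr(start)
    }
  }

  /** Runs `body`, then frees everything it allocated by restoring the
    * bump pointer, even if `body` throws.
    */
  def withScopedMemoryAllocator[A](body: Allocator => A): A = {
    val saved = top
    try body(new Allocator)
    finally top = saved
  }
}
```

Because the pointer is simply restored on exit, scopes must nest strictly: a `Ptr` must not escape its scope, and an outer scope's allocator should not be used inside an inner scope (its allocations would be discarded when the inner scope exits). That restriction seems acceptable for buffers that only live for the duration of one WASI call.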

tarsa commented 3 months ago

note that java 22 adds support for explicitly deallocated foreign (i.e. off-heap) memory: https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/foreign/package-summary.html . you could partially implement that api, throwing exceptions when the user wants implicitly deallocated memory. otoh, that api is maybe too fresh and complex to replicate straight away, and your simple api would be best for the purpose of writing glue code.
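To illustrate the idea of partially implementing that API: the sketch below is a hypothetical, heavily simplified Scala mimic of two `java.lang.foreign.Arena` factory methods. It is not the JDK API, and the names `MiniArena` and `Segment` are invented here. `ofConfined()` returns an explicitly closed arena backed by a bump allocator over a simulated linear memory (a java.nio.ByteBuffer, as an assumption). `ofAuto()`, which in the JDK means deallocation managed by the garbage collector, simply throws.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical mimic of a tiny subset of java.lang.foreign.Arena; NOT the JDK API.
object MiniArena {
  // Stand-in for Wasm linear memory (64 KiB, little-endian).
  private val memory: ByteBuffer =
    ByteBuffer.allocate(1 << 16).order(ByteOrder.LITTLE_ENDIAN)
  // Bump pointer shared by all arenas.
  private var top: Int = 0

  /** A slice of linear memory owned by an arena (loosely like MemorySegment). */
  final case class Segment(address: Int, byteSize: Int, owner: Arena) {
    def setInt(offset: Int, value: Int): Unit = {
      check(offset, 4); memory.putInt(address + offset, value)
    }
    def getInt(offset: Int): Int = {
      check(offset, 4); memory.getInt(address + offset)
    }
    // Reject accesses after the arena is closed or outside the segment.
    private def check(offset: Int, n: Int): Unit = {
      if (!owner.isAlive) throw new IllegalStateException("arena already closed")
      if (offset < 0 || offset + n > byteSize)
        throw new IndexOutOfBoundsException(s"offset $offset out of bounds")
    }
  }

  /** Explicitly deallocated arena: close() frees everything it allocated. */
  final class Arena private[MiniArena] (base: Int) extends AutoCloseable {
    private var alive = true
    def isAlive: Boolean = alive
    def allocate(byteSize: Int): Segment = {
      if (!alive) throw new IllegalStateException("arena already closed")
      require(byteSize >= 0, "negative size")
      val start = (top + 7) & ~7 // 8-byte alignment
      if (start + byteSize > memory.capacity())
        throw new OutOfMemoryError("linear memory exhausted")
      top = start + byteSize
      Segment(start, byteSize, this)
    }
    // Resetting the bump pointer frees this arena's memory (LIFO order assumed).
    override def close(): Unit =
      if (alive) { alive = false; top = base }
  }

  def ofConfined(): Arena = new Arena(top)

  // GC-managed deallocation would need a collection hook that WasmGC does not
  // appear to offer, so this mimic refuses it.
  def ofAuto(): Arena =
    throw new UnsupportedOperationException("implicitly deallocated memory is not supported")
}
```

Because every arena shares one bump pointer, arenas in this sketch must be closed in LIFO order; the real API has no such restriction.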

tanishiking commented 3 months ago

Thanks @tarsa I didn't know about that new API, and it gave me an inspiration!

tarsa commented 3 months ago

note that graalvm guys are working on [GR-18218] Full support for the Foreign Function & Memory API in Native Image (oracle/graal issue#8113). after some iterations, their design probably could be a good inspiration to implement that api in scala-wasm and/or scala-native.

edit: i've edited the reference to be unclickable, to prevent looking at gpl-licensed code.

(added later) p.s. probably my wording was misleading. by their design i've meant the abstract mechanisms (ideas) and their configuration, not the graalvm-specific gpl-encumbered implementation.

sjrd commented 3 months ago

We cannot look at OpenJDK nor Graal nor any other GPL code. All we can look at is the public JavaDoc.

tarsa commented 3 months ago


ok, i've edited the link to be unclickable.

the native-image configuration mechanisms are probably described not only in javadocs but also in manuals (and other documentation). i guess you should be allowed to look at graalvm manuals without any license-related problems.

tanishiking / scala-wasm