rust-lang / rust

Support for WebAssembly externref in non-web environment #103516

Open kwerner8 opened 2 years ago

kwerner8 commented 2 years ago

My goal is to compile a function written in Rust to WebAssembly that takes an externref as input and returns an externref as output. I then want to call this function as an export in Wasmtime.

The WebAssembly Code in the wat format should look similar to this:

(func (export "func") (param externref) (result externref)
... WebAssembly Code ... )

I have seen that there exists wasm-bindgen for browser hosts.

Is there a way to produce externref as an input and output for exports of functions written in Rust that can be called in non-web environments like Wasmtime?

CryZe commented 2 years ago

I believe this is essentially blocked by LLVM support: https://reviews.llvm.org/D122215

kwerner8 commented 2 years ago

Okay. Thank you very much! Are there plans to enable this feature in the future?

misalcedo commented 1 year ago

@CryZe now that the changes you linked are merged, are there any plans to add this to Rust?

jyn514 commented 1 year ago

I am not familiar with WASM or externref. What does this feature do, and what support is needed from Rust to enable it? Will it need language changes or a new extern "foo" ABI?

CryZe commented 1 year ago

An externref is an opaque object that the WebAssembly engine can pass into wasm functions from the outside. In a browser, those are actual JavaScript objects. WebAssembly can pass them around, store them in a dedicated table, and later retrieve them from that table, for example to call an external function that takes them as arguments (such as a DOM function that operates on JavaScript objects). Importantly, externrefs must stay fully opaque, so it's impossible to write them into linear memory; otherwise you could observe their bytes and possibly modify them. You also wouldn't know how big they are anyway. One would have to check how LLVM treats them and design a language concept around that.
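To make the "opaque object behind a table" idea concrete, here is a minimal sketch in plain Rust. It only simulates the pattern on the host side and does not use real externref values. The names `ExternTable` and `Handle` are hypothetical and not part of any existing API. The guest-facing `Handle` carries nothing but an index, so the object's bytes and size are never visible to the code holding it.

```rust
use std::any::Any;

/// Hypothetical opaque handle: just an index, never the object's bytes.
#[derive(Debug, PartialEq, Eq)]
struct Handle(usize);

/// Hypothetical host-side table that owns the real objects.
struct ExternTable {
    slots: Vec<Option<Box<dyn Any>>>,
}

impl ExternTable {
    fn new() -> Self {
        ExternTable { slots: Vec::new() }
    }

    /// Store a host object and hand out an opaque handle to it.
    fn insert<T: Any>(&mut self, value: T) -> Handle {
        self.slots.push(Some(Box::new(value)));
        Handle(self.slots.len() - 1)
    }

    /// Host functions resolve a handle back to the object they expect.
    /// Returns None if the slot is empty or holds a different type.
    fn get<T: Any>(&self, handle: &Handle) -> Option<&T> {
        self.slots.get(handle.0)?.as_ref()?.downcast_ref::<T>()
    }

    /// Drop the object behind a handle, leaving the slot empty.
    fn remove(&mut self, handle: Handle) -> Option<Box<dyn Any>> {
        self.slots.get_mut(handle.0)?.take()
    }
}

fn main() {
    let mut table = ExternTable::new();
    let handle = table.insert(String::from("host object"));
    // The "guest" can only pass the handle back; the host looks it up.
    println!("{:?}", table.get::<String>(&handle));
    table.remove(handle);
}
```

A real externref differs in that the engine itself keeps the reference opaque; the index-based table here is only an approximation of that behaviour.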

TethysSvensson commented 1 year ago

Yes, this will probably require changing the ABI, but I don't think it will require adding a new one.

A bit of context

The reference types proposal was accepted into the wasm spec in February 2021.

LLVM support was added later in 2021. Clang also seems to mostly have support at this point.

The reference types proposal adds a few new things. The most relevant for this discussion are:

An externref is a completely opaque pointer, which is managed by the host. It cannot be used for anything except being passed around or given to a host function.

A funcref is a pointer to a function created either by the host or by wasm itself. It can be called from wasm.

See this Clang RFC for a more in-depth description of how LLVM implements this proposal.

Rust implementation

Before this proposal, there was only a single global table, which contained all of the function pointers. It was not possible to extract a function pointer from the table into a stack or global value.

This had a direct effect on the ABI. Because you could not represent function pointers directly, and because there was only a single table, function pointers were represented as indices into this table.

As an example this code

fn add(left: usize, right: usize) -> usize {
    left + right
}

#[no_mangle]
pub fn add_ptr() -> fn(usize, usize) -> usize {
    add
}

currently compiles to this wasm module:

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (type (;1;) (func (result i32)))
  (func $_ZN3foo3add17hc001cc2609bca236E (type 0) (param i32 i32) (result i32)
    local.get 1
    local.get 0
    i32.add)
  (func $add_ptr (type 1) (result i32)
    i32.const 1)
  (table (;0;) 2 2 funcref)
  (memory (;0;) 16)
  (global $__stack_pointer (mut i32) (i32.const 1048576))
  (global (;1;) i32 (i32.const 1048576))
  (global (;2;) i32 (i32.const 1048576))
  (export "memory" (memory 0))
  (export "add_ptr" (func $add_ptr))
  (export "__data_end" (global 1))
  (export "__heap_base" (global 2))
  (elem (;0;) (i32.const 1) func $_ZN3foo3add17hc001cc2609bca236E))

Specifically, the pointer to the add function is put at the location 1 in the table, and add_ptr returns this index.
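As a rough illustration of that index-based representation, the sketch below models the table in plain Rust. `function_table` and `call_indirect` are hypothetical helpers written for this example; they stand in for the module's `(table 2 2 funcref)` and for an indirect call through it. A real engine traps on a null or out-of-range entry, which is modelled here as `None`.

```rust
fn add(left: usize, right: usize) -> usize {
    left + right
}

type BinOp = fn(usize, usize) -> usize;

/// Stand-in for the two-entry table: slot 0 is empty, slot 1 holds `add`,
/// matching the `elem` segment that starts at offset 1.
fn function_table() -> [Option<BinOp>; 2] {
    [None, Some(add as BinOp)]
}

/// Mirrors the compiled `add_ptr`: the "function pointer" is the index 1.
fn add_ptr() -> u32 {
    1
}

/// Rough analogue of an indirect call through the table.
fn call_indirect(table: &[Option<BinOp>], index: u32, a: usize, b: usize) -> Option<usize> {
    let f = (*table.get(index as usize)?)?;
    Some(f(a, b))
}

fn main() {
    let table = function_table();
    println!("{:?}", call_indirect(&table, add_ptr(), 2, 3));
}
```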

In my opinion, this should ideally be changed. With the reference types proposal, it would be more appropriate to instead return a funcref value.

jyn514 commented 1 year ago

Ok. It sounds like this is purely an implementation concern - we can change the IR we send to LLVM and it will result in a speedup, without changes necessary to user code?

CryZe commented 1 year ago

without changes necessary to user code?

No, it definitely is a language concern as an externref is a totally new kind of type with all sorts of corner cases that need to be resolved:

void foo(__externref_t x) {
    &x;
    struct { __externref_t y; } z = { .y = x };
}

<source>:2:5: error: cannot take address of WebAssembly reference
    &x;
    ^~
<source>:3:28: error: field has sizeless type '__externref_t'
    struct { __externref_t y; } z = { .y = x };
                           ^

Compiler Explorer

So it's a !Sized + !Ref type. The latter concept doesn't even exist, and I'm not even sure !Sized is the correct concept, since you can store it in variables directly and assign it just fine.

jyn514 commented 1 year ago

This probably needs some sort of RFC in that case.

TimNN commented 1 year ago

I have written a pre-RFC for WebAssembly Heap Type Support in Rust: https://internals.rust-lang.org/t/pre-rfc-webassembly-heap-types/19774

adetaylor commented 1 year ago

Hi everyone, we just proposed an RFC for arbitrary self types. The primary motivation is that for C++, Python and other interop we need smart pointer types which do not obey normal Rust semantics.

This doesn't help with the fundamental LLVM-based opaque thingy that you need for __externref_t but it might well help you wrap such things up into first-class smart pointer objects. Or, it might provide a nicer interim solution for wasm-bindgen:

pub struct WasmExternRef<T> {
   index_into_wasm_bindgens_big_table: usize,
   _phantom: std::marker::PhantomData<T>
}

impl<T> std::ops::Receiver for WasmExternRef<T> {   // the new bit as enabled by the RFC linked above
   type Target = T;
}

// Generated by wasm-bindgen
struct JSApi;
impl JSApi {
  fn js_method(self: WasmExternRef<Self>) {   // note the weird self type here
    // wasm-bindgen calls to javascript by accessing the index from the big table
  }
}

fn main() {
  let my_js_ref: WasmExternRef<JSApi> = todo!(); // obtain somehow
  my_js_ref.js_method();  // new bit enabled by RFC
}

I'm not sure how much of this is already possible with wasm-bindgen as I've never used it. And it's probably not as good as first-class __externref_t LLVM support. But maybe it enables wasm-bindgen-generated bindings to be more ergonomic.
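For comparison, here is a minimal sketch of how a similar shape can be approximated on stable Rust without the RFC, by putting the methods on the wrapper type instead of on `JSApi`. This is an assumption-laden illustration, not what wasm-bindgen generates: the field name `index`, the constructor `from_index`, and the placeholder body of `js_method` are all hypothetical.

```rust
use std::marker::PhantomData;

/// Wrapper around an index into a (hypothetical) table of JS objects.
pub struct WasmExternRef<T> {
    index: usize,
    _phantom: PhantomData<T>,
}

impl<T> WasmExternRef<T> {
    /// Hypothetical constructor; a real binding layer would hand these out.
    pub fn from_index(index: usize) -> Self {
        WasmExternRef { index, _phantom: PhantomData }
    }
}

/// Marker type naming the JavaScript API being wrapped.
pub struct JSApi;

// Without arbitrary self types, the method lives on the wrapper itself.
impl WasmExternRef<JSApi> {
    pub fn js_method(&self) -> usize {
        // Placeholder: a real binding would use the index to reach the JS object.
        self.index
    }
}

fn main() {
    let my_js_ref = WasmExternRef::<JSApi>::from_index(3);
    println!("{}", my_js_ref.js_method());
}
```

The downside is that every method has to be declared on `WasmExternRef<JSApi>` rather than on `JSApi`, which is the ergonomic gap the RFC is meant to close.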

Thanks to pachi for spotting the possible link.