rustwasm / wasm-bindgen

Facilitating high-level interactions between Wasm modules and JavaScript
https://rustwasm.github.io/docs/wasm-bindgen/
Apache License 2.0
7.81k stars 1.08k forks source link

Add an unsafe method `Uint8Array::view_mut_raw(ptr: *mut u8, length: usize)` so that JS can efficiently initialize a wasm buffer #1643

Open mstange opened 5 years ago

mstange commented 5 years ago

Motivation

I am using the following pattern to efficiently initialize a large buffer inside wasm memory from JavaScript without copies (see #1079 for full motivation):

#[wasm_bindgen]
pub struct WasmMemBuffer {
    buffer: Vec<u8>,
}

#[wasm_bindgen]
impl WasmMemBuffer {
    #[wasm_bindgen(constructor)]
    pub fn new(byte_length: u32, f: &js_sys::Function) -> Self {
        let buffer = vec![0; byte_length as usize];
        unsafe {
            let array = js_sys::Uint8Array::view(&buffer);
            f.call1(&JsValue::NULL, &JsValue::from(array))
                .expect("The callback function should not throw");
        }
        Self { buffer }
    }
}

fn compute_buffer_hash_impl(data: &[u8]) -> u32 { ... }

#[wasm_bindgen]
pub fn compute_buffer_hash(data: &WasmMemBuffer) -> u32 {
    compute_buffer_hash_impl(&data.buffer)
}
let buffer = new WasmMemBuffer(100000, array => {
  // "array" wraps a piece of wasm memory. Fill it with some values.
  for (let i = 0; i < array.length; i++) {
    array[i] = Math.floor(Math.random() * 256);
  }
});
let hash = compute_buffer_hash(buffer); // No buffer copy when passing this argument. Yay!
buffer.free();
console.log(hash);

There are two problems with this:

  1. There's a mutation here that the rust compiler doesn't know about. Uint8Array::view takes an immutable slice ref, but then I go ahead and mutate the contents. Lying to the compiler is dangerous.
  2. The vec is zero-initialized before its contents are overwritten, which wastes almost 400ms in my use case: https://perfht.ml/2K0rMKw

If I wanted to eliminate the zero initialization, I could do that using reserve + set_len on the vec. However, rust forbids wrapping uninitialized values in a slice - you have to use raw pointers when initializing uninitialized memory. But the current signature of Uint8Array::view requires me to make a slice.

Proposed Solution

I propose adding a new method to Uint8Array (and maybe to the other typed array variants):

pub unsafe fn view_mut_raw(ptr: *mut u8, length: usize)

This would let me change my rust code to the following:

#[wasm_bindgen]
pub struct WasmMemBuffer {
    buffer: Vec<u8>,
}

#[wasm_bindgen]
impl WasmMemBuffer {
    #[wasm_bindgen(constructor)]
    pub fn new(byte_length: u32, f: &js_sys::Function) -> Self {
        let mut buffer: Vec<u8> = Vec::new();
        buffer.reserve(byte_length as usize);
        unsafe {
            let array =
                js_sys::Uint8Array::view_mut_raw(buffer.as_mut_ptr(),
                                                 byte_length as usize);
            f.call1(&JsValue::NULL, &JsValue::from(array))
                .expect("The callback function should not throw");
            buffer.set_len(byte_length as usize);
        }
        Self { buffer }
    }
}

It solves the problems: We've eliminated the zero initialization, we're not constructing a slice around uninitialized memory, and we're not lying to the compiler about the mutation anymore.

Alternatives

I'm not aware of any obvious alternatives.

alexcrichton commented 5 years ago

Seems reasonable to me to add! I'm not 100% convinced we need this for mutability reasons, it seems a long stretch for anything bad to happen. Having this for unininitialized memory though is more compelling to me and seems pretty reasonable!

That3Percent commented 5 years ago

In the same vein, I'd like to see a version of copy_to for typed arrays that can take a &mut MaybeUninit<[T]>. Right now the only way to access the memory in a typed array as a slice is through the copy_to function, but it doesn't make sense to have to initialize the memory first. Should that be opened as a separate issue?

That3Percent commented 5 years ago

Oops, I think I meant &mut [MaybeUninit<T>], sorry.

lshlyapnikov commented 5 years ago

view_mut_raw has been merged. Do we want the copy_to addressed as part of the same issue?

alexcrichton commented 5 years ago

Seems reasonable to me to add as well!

lshlyapnikov commented 5 years ago

but it doesn't make sense to have to initialize the memory first. Should that be opened as a separate issue?

@That3Percent Do you mean you want to be able to pass the uninitialized MaybeUninit to the copy_to_xxx and let it deal with the allocation?

lshlyapnikov commented 5 years ago

is there any way to check if MaybeUninit has been initialized?

That3Percent commented 5 years ago

@ibaryshnikov

Do you mean you want to be able to pass the uninitialized MaybeUninit to the copy_to_xxx and let it deal with the allocation?

MaybeUninit deals with memory that is already allocated, but not yet initialized. So, copy_to_xxx would not have to deal with any allocation. It would merely fulfill the contract of initializing the memory by writing valid data to it.

is there any way to check if MaybeUninit has been initialized? No there is not, but for the purposes of this task that would not be necessary because it would be the responsibility of copy_to_xxx to initialize the memory by copying the buffer.

Here's the workflow as it exists today and how it could be improved by the issue.

Task: Given some TypedArray owned by the JavaScript runtime, make it's contents readable as a &[T]

Today:

  1. Allocate some buffer with the same length as the TypedArray. This can be done with eg: Vec::.with_capacity or similar
  2. Initialize the memory by writing something to it. Usually by zeroing. (Steps 1 and 2 can be done in a single call, but are still separate steps as far as the program is concerned)
  3. Pass the initialized memory to copy_to_xxx, which overwrites all the zeroes that were just written

Step 2 is what we want to get rid of. Why write 0 when it's going to be overridden immediately? But, it's not possible without invoking Undefined Behavior, because it's not valid to even have a &[u32] over uninitialized memory, even if unused.

Proposed:

  1. Allocate some buffer with the same length as the TypedArray. This can be done with eg: second-stack, RawVec or similar.
  2. Using MaybeUninit, pass the allocated but uninitialized slice to copy_to_xxx, which initializes the memory by virtue of writing to it on the other side of the FFI boundary.
That3Percent commented 5 years ago

@lshlyapnikov

I just took a look at #1855 . I think that what you built there is a decent step toward a more performant version of the existing to_vec method. If you look at what that method does today, it executes steps 1-3 that I outlined - allocate, initialize, overwrite. Whereas your code does the more performant version. The API for copy_to_maybe_uninit does not make a great deal of sense though. (eg: why take an MaybeUninit at all if the method internally allocates a Vec?). I think you should refactor the code you have into to_vec so that everyone can enjoy the performance gains, without the usage of MaybeUninit at all. The copy_to_maybe_uninit method should be modified to accept a (I think) &mut [MaybeUninit<T>] to move control of the allocation/destination to the caller.

lshlyapnikov commented 4 years ago

@That3Percent does this match your use case? https://github.com/rustwasm/wasm-bindgen/pull/1855/files#diff-a4985f313137ee0bff2488dce289c41cR238-R251

MaybeUninit is pointing to an uninitialized Vec. sut is copy_to_unsafe

It is not exactly what you asked for, the signature of the function is:

pub unsafe fn copy_to_unsafe(&self, dst: &mut mem::MaybeUninit<&mut [$ty]>) {
lshlyapnikov commented 4 years ago

@That3Percent I think you can use the existing copy_to with uninitialized memory. Why do you want another method? The below works with the existing js_sys::Uint8Array::copy_to and uninitialized slice:

js_sys::Uint8Array::copy_to(&js_arr, *(mvec.as_mut_ptr()));`

Do I miss anything? Here is the entire test case:

    let expected: Vec<u8> = (0..10).collect();
    let js_arr = js_sys::Uint8Array::from(expected.as_slice());

    let mut vec: Vec<u8> = Vec::with_capacity(expected.len());
    unsafe {
        vec.set_len(expected.len());
    }
    let mut mvec: mem::MaybeUninit<&mut [u8]> = mem::MaybeUninit::new(&mut vec);

    unsafe {
        js_sys::Uint8Array::copy_to(&js_arr, *(mvec.as_mut_ptr()));
        assert_eq!(*mvec.as_ptr(), expected.as_slice());
        assert_eq!(mvec.assume_init(), expected.as_slice());
    }
    assert_eq!(vec, expected);
lshlyapnikov commented 4 years ago

Can anyone review the copy_to_mem_raw https://github.com/rustwasm/wasm-bindgen/pull/1855#issuecomment-623852925

adrian17 commented 2 years ago

However, rust forbids wrapping uninitialized values in a slice - you have to use raw pointers when initializing uninitialized memory. But the current signature of Uint8Array::view requires me to make a slice.

This touches more APIs, for example: glReadPixels's point is to fill a possibly uninitialized buffer, but given the wasm-bindgen WebGL signature:

    pub fn read_pixels_with_opt_u8_array(
        this: &WebGlRenderingContext,
        (...)
        pixels: Option<&mut [u8]>,
    ) -> Result<(), JsValue>;

Currently there's no (legal and sound from Rust's POV) way of avoiding zero-initialization without changing this API.