Low performance compared to stdweb

sonovice commented 6 years ago

I have implemented the same algorithm (some audio feature extraction) 3 different times:

pure JS (online)
Rust using stdweb (online)
Rust using wasm-bindgen (online)

(see code here)

In terms of performance, the stdweb solution is by far the fastest. I can't get my head around why wasm-bindgen is so much slower in comparison down to the point where it renders the algorithm unusable on my phone.

The only difference in the code--apart form the crate-dependent overhead--is the type of the passed parameter. For stdweb I had to use:

fn process_audio(buf: TypedArray<f32>) -> Vec<f32>```

And for wasm-bindgen it's :

pub fn process_audio(buf: Vec<f32>) -> Vec<f32>

Any ideas on that?

Pauan commented 6 years ago

@sonovice How big are these Vec<f32>?

In stdweb a TypedArray is a reference type, which means that it actually exists entirely in JavaScript: Rust only contains a pointer to it (which is just an i32). So when you call the process_audio function it only needs to pass an i32 to Rust (which is extremely fast and constant time).

But with wasm-bindgen, you're receiving a Vec<f32>, so it needs to do a full copy of the entire Vec, which is a linear operation. So depending on how big the Vec is (and how frequently you call process_audio), it will be a lot slower.

Pauan commented 6 years ago

Oh, nevermind, I just noticed that you call buf.to_vec(), which also does a linear copy. So I'm fresh out of ideas. 😅

alexcrichton commented 6 years ago

Are you sure you used wasm-bindgen with optimizations? The instructions in your README compile with release but use the debug binary

sonovice commented 6 years ago

@alexcrichton You were right, I overlooked that somehow. Now the performance is comparable.

Is there any way (in the wasm-bindgen version) to actually improve performance even more? For me it seams that a lot of initialization (loading the same constant filterbank, creating the FFT processor) is done over and over again at every call to the function. Plus a lot of memory is copied as far as I can see.

alexcrichton commented 6 years ago

@sonovice certainly! Instead of something like:

#[wasm_bindgen]
pub fn foo(a: Vec<T>) -> Vec<U> { ... }

you may want to instead do:

#[wasm_bindgen]
pub struct Context { ... }

#[wasm_bindgen]
impl Context {
    pub fn new() -> Context { ... }

    pub fn foo(&self, a: Vec<T>) -> Vec<U> { ... }
}

and that way you can share state between all invocations on Context

Copying memory is certainly also very important to consider with wasm performance! The main way to solve this is to ensure that everything stays in wasm's linear memory the entire time. For example you could avoid returning a Vec<u32> by doing something like:

impl Context {
    pub fn foo(&self, a: Vec<T>) /* no return value here */ { ... }

    pub fn data_ptr(&self) -> *const U { ... }
    pub fn data_len(&self) -> usize { ... }
}

And that means that you're always returning views into wasm memory although JS will have to slice it on its side. Ideally we'd do something like:

impl Context {
    pub fn data(&self) -> &[U] { ... }
}

but unfotunately that's not implemented.

Hope that helps!

sonovice commented 6 years ago

@alexcrichton Thank you so much for the input! I am still having a hard time to use it from the JS side. What would I have to change to make use of the Context struct?

Current invocation in JS (not working):

const {Context} = wasm_bindgen;
wasm_bindgen('./drift_meter_bg.wasm').then(function (driftMeter) {
    let foo = new Context();
    foo.process_audio(...);
});

Link to current Rust code

alexcrichton commented 6 years ago

@sonovice once the wasm is loaded you'd do something like:

let x = wasm_bindgen.Context.new();

and then later on you'd call:

const result = x.process(&data);

and I think that'd work?

alexcrichton commented 6 years ago

@sonovice as an example here's a whitespace-ignoring diff of what I applied locally which avoids recreating RFft1D and avoids moving the results array in and out of wasm memory

sonovice commented 6 years ago

@alexcrichton Wow, that answer just highly exceeded my hopes. Thanks for doing all the heavy lifting. CPU usage dropped by almost 30%!

qm3ster commented 6 years ago

Sorry for necroposting, but this is one of the few threads I found on bindgen vs stdweb perf. In your case, is the TypedArray staying inside JS? Isn't there a way to transfer it into the wasm program all at once using Memory or something?

alexcrichton commented 6 years ago

@qm3ster the wasm_bindgen::memory function can be used to get a reference to wasm's own memory which can then go through various typed array bindigns to get subslices and such.

rustwasm / wasm-bindgen

Low performance compared to stdweb #391