Closed · sonovice closed this issue 6 years ago
@sonovice How big are these `Vec<f32>`?

In stdweb a `TypedArray` is a reference type, which means that it actually exists entirely in JavaScript: Rust only contains a pointer to it (which is just an `i32`). So when you call the `process_audio` function it only needs to pass an `i32` to Rust (which is extremely fast and constant time).

But with wasm-bindgen, you're receiving a `Vec<f32>`, so it needs to do a full copy of the entire `Vec`, which is a linear operation. So depending on how big the `Vec` is (and how frequently you call `process_audio`), it will be a lot slower.
Oh, nevermind, I just noticed that you call `buf.to_vec()`, which also does a linear copy. So I'm fresh out of ideas. 😅
Are you sure you used wasm-bindgen with optimizations? The instructions in your README compile with release but use the debug binary.
@alexcrichton You were right, I overlooked that somehow. Now the performance is comparable.
Is there any way (in the wasm-bindgen version) to actually improve performance even more? It seems to me that a lot of initialization (loading the same constant filterbank, creating the FFT processor) is done over and over again at every call to the function, and a lot of memory is copied as far as I can see.
@sonovice certainly! Instead of something like:
```rust
#[wasm_bindgen]
pub fn foo(a: Vec<T>) -> Vec<U> { ... }
```
you may want to instead do:
```rust
#[wasm_bindgen]
pub struct Context { ... }

#[wasm_bindgen]
impl Context {
    pub fn new() -> Context { ... }
    pub fn foo(&self, a: Vec<T>) -> Vec<U> { ... }
}
```
and that way you can share state between all invocations on `Context`.
Copying memory is certainly also very important to consider with wasm performance! The main way to solve this is to ensure that everything stays in wasm's linear memory the entire time. For example you could avoid returning a `Vec<u32>` by doing something like:
```rust
impl Context {
    pub fn foo(&self, a: Vec<T>) /* no return value here */ { ... }
    pub fn data_ptr(&self) -> *const U { ... }
    pub fn data_len(&self) -> usize { ... }
}
```
And that means that you're always returning views into wasm memory although JS will have to slice it on its side. Ideally we'd do something like:
```rust
impl Context {
    pub fn data(&self) -> &[U] { ... }
}
```

but unfortunately that's not implemented.
Hope that helps!
@alexcrichton Thank you so much for the input! I am still having a hard time using it from the JS side. What would I have to change to make use of the `Context` struct?
Current invocation in JS (not working):
```js
const {Context} = wasm_bindgen;
wasm_bindgen('./drift_meter_bg.wasm').then(function (driftMeter) {
    let foo = new Context();
    foo.process_audio(...);
});
```
@sonovice once the wasm is loaded you'd do something like:
```js
let x = wasm_bindgen.Context.new();
```
and then later on you'd call:
```js
const result = x.process(data);
```
and I think that'd work?
@sonovice as an example here's a whitespace-ignoring diff of what I applied locally which avoids recreating `RFft1D` and avoids moving the `results` array in and out of wasm memory
@alexcrichton Wow, that answer just highly exceeded my hopes. Thanks for doing all the heavy lifting. CPU usage dropped by almost 30%!
Sorry for necroposting, but this is one of the few threads I found on bindgen vs stdweb perf.
In your case, is the `TypedArray` staying inside JS? Isn't there a way to transfer it into the wasm program all at once using `Memory` or something?
@qm3ster the `wasm_bindgen::memory` function can be used to get a reference to wasm's own memory, which can then go through various typed array bindings to get subslices and such.
I have implemented the same algorithm (some audio feature extraction) 3 different times:
(see code here)
In terms of performance, the stdweb solution is by far the fastest. I can't get my head around why wasm-bindgen is so much slower in comparison, to the point where it renders the algorithm unusable on my phone.
The only difference in the code, apart from the crate-dependent overhead, is the type of the passed parameter. For stdweb I had to use:
And for wasm-bindgen it's:
Any ideas on that?