rustwasm / wasm-bindgen

Facilitating high-level interactions between Wasm modules and JavaScript
https://rustwasm.github.io/docs/wasm-bindgen/
Apache License 2.0
7.47k stars 1.03k forks

Bench `wasm-bindgen-futures` against `spawn64` #3957

Closed Bajix closed 1 month ago

Bajix commented 2 months ago

I authored spawn64 as an optimized alternative to `wasm_bindgen_futures::spawn_local`. If benchmarks are ever added for `wasm_bindgen_futures::spawn_local`, it would be nice if they could be compared against spawn64.

63b8e7lmk1jv9c0ovus4hqkcasqgp0ed commented 2 months ago

It's performing worse for me.

use std::{future::Future, time::Duration};

use chrono::{TimeDelta, Utc};
use wasm_bindgen::prelude::wasm_bindgen;

fn async_generator() -> impl Future<Output = ()> {
    async { async_std::task::sleep(Duration::from_secs(1)).await }
}

fn simple_bench<O: Future<Output = ()>, K: Fn() -> O, F: Fn(O)>(
    bench_fn: F,
    generator: K,
) -> chrono::TimeDelta {
    let start = Utc::now();

    for _ in 0..200_000 {
        bench_fn(generator())
    }

    Utc::now() - start
}

fn delta_format(v: TimeDelta) -> String {
    format!("{}.{}s", v.num_seconds(), v.num_milliseconds() % 1000)
}

#[wasm_bindgen]
pub async fn example() {
    let r2 = simple_bench(spawn64::spawn_local, async_generator);
    let r1 = simple_bench(wasm_bindgen_futures::spawn_local, async_generator);

    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {}, spawn64: {}",
            delta_format(r1),
            delta_format(r2)
        )
        .into(),
    );
}

Output:

wasm_bindgen_futures: 0.18s, spawn64: 3.401s
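An aside on reading these numbers: `delta_format` above does not zero-pad the millisecond remainder, so values are ambiguous below 100 ms; for example 18 ms prints as `0.18s`. A padded variant (a sketch using plain `i64` milliseconds so it runs without `chrono`) would be:

```rust
// Sketch only: `delta_format` in the benchmark above prints
// `v.num_milliseconds() % 1000` unpadded, so 18 ms renders as "0.18s".
// Zero-padding the fractional part with `{:03}` removes the ambiguity.
fn delta_format_fixed(total_ms: i64) -> String {
    format!("{}.{:03}s", total_ms / 1000, total_ms % 1000)
}

fn main() {
    assert_eq!(delta_format_fixed(18), "0.018s"); // unpadded version prints "0.18s"
    assert_eq!(delta_format_fixed(180), "0.180s");
    assert_eq!(delta_format_fixed(3401), "3.401s");
}
```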
Bajix commented 2 months ago

Benchmarking how long it takes to enqueue 200k futures doesn't make sense: it omits differences in time to poll, and in a browser environment there are never going to be that many futures in flight. spawn64 has the advantage that tasks don't need to be wrapped in `Rc`, and for workloads of fewer than 64 tasks there are no non-task heap allocations. The reason it does badly on this specific benchmark is that for each task it checks N >> 6 linked slabs for free slots, where N is the number of previously enqueued tasks; however, I have a free-list optimization WIP that makes this lookup constant time.
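The linear lookup described above can be sketched with plain bitmask scanning (an illustration only, not spawn64's actual code): with 64 slots per slab tracked in a `u64` occupancy word, finding a free slot means walking slabs until one has a zero bit, which costs on the order of N / 64 word checks once N earlier tasks have filled the leading slabs.

```rust
// Sketch (not spawn64's real implementation): each slab's occupancy is a u64
// bitmask. A spawn scans slabs in order until one has a zero bit, so after N
// tasks have filled the leading slabs, each spawn costs O(N / 64) word checks.
fn find_free_slot(slabs: &[u64]) -> Option<(usize, u32)> {
    for (slab_idx, &occupied) in slabs.iter().enumerate() {
        if occupied != u64::MAX {
            // The count of trailing one-bits is the first free slot index.
            return Some((slab_idx, occupied.trailing_ones()));
        }
    }
    None
}

fn main() {
    // Three full slabs, then one with bits 0 and 1 set: slot 2 is free.
    let slabs = [u64::MAX, u64::MAX, u64::MAX, 0b011];
    assert_eq!(find_free_slot(&slabs), Some((3, 2)));
    // Every slab full: no free slot at all.
    assert_eq!(find_free_slot(&[u64::MAX; 4]), None);
}
```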

63b8e7lmk1jv9c0ovus4hqkcasqgp0ed commented 2 months ago

If the async functions are not doing very much, the difference is less than 300 ms over 200,000 tests. In comparison, the startup delay is at least 3 s worse, with the last call to sleep being delayed to 4 s. (Time to start matters when waiting on sleeping tasks.)

fn async_send_on_complete_gen() -> (impl Future<Output = ()>, Receiver<i32>) {
    let (s, after_test_receiver) = async_channel::unbounded();
    let test_future = async move {
        let result = 0;

        let _ = s.send(result).await;
    };

    (test_future, after_test_receiver)
}

fn create_summary(times: &[TimeDelta]) -> String {
    let mut times = times.to_vec();
    times.sort();

    let micro_times: Vec<i64> = times
        .iter()
        .map(|v| v.num_microseconds().unwrap())
        .collect();

    let median = micro_times[micro_times.len() / 2];

    let avg: f64 = (micro_times.iter().sum::<i64>() as f64) / (micro_times.len() as f64);
    let avg = (avg * 100f64).floor() / 100.0;

    let test_count = micro_times.len();

    let total_time: f64 = (micro_times.iter().sum::<i64>() as f64 / 1000.0) / 1000.0;

    format!(
        "best: {} µs, worst: {} µs, avg: {avg} µs, median: {median} µs, total_time: {total_time} s, test_count: {test_count}",
        micro_times[0],
        micro_times[micro_times.len() - 1]
    )
}

async fn simple_bench_2<
    T,
    LOCALSPAWN: Future<Output = ()>,
    GENERATOR: Fn() -> (LOCALSPAWN, Receiver<T>),
    BENCH: Fn(LOCALSPAWN),
>(
    bench_fn: BENCH,
    test_generator: GENERATOR,
) -> Vec<TimeDelta> {
    let mut times = vec![];

    for _ in 0..200_000 {
        let start = Utc::now();
        let (fut, on_complete) = test_generator();

        bench_fn(fut);
        let _ = on_complete.recv().await;

        times.push(Utc::now() - start)
    }

    times.sort();

    times
}

#[wasm_bindgen]
pub async fn on_complete_bench() {
    let spawn_64 = simple_bench_2(spawn64::spawn_local, async_send_on_complete_gen).await;

    let wasm_bindgen = simple_bench_2(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;

    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );

    web_sys::console::log_1(&"Waiting 10 seconds then trying again.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;

    let spawn_64 = simple_bench_2(spawn64::spawn_local, async_send_on_complete_gen).await;

    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );

    web_sys::console::log_1(&"Waiting 10 seconds before final test.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;

    let wasm_bindgen = simple_bench_2(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;

    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
}

Output:

wasm_bindgen_futures: "best: 0 µs, worst: 16000 µs, avg: 8.21 µs, median: 0 µs, total_time: 1.643 s, test_count: 200000"
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 7.11 µs, median: 0 µs, total_time: 1.423 s, test_count: 200000" 
Waiting 10 seconds then trying again. 
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 7.1 µs, median: 0 µs, total_time: 1.421 s, test_count: 200000" 
Waiting 10 seconds before final test. 
wasm_bindgen_futures: "best: 0 µs, worst: 6000 µs, avg: 7.9 µs, median: 0 µs, total_time: 1.581 s, test_count: 200000" 

There were the same number of tests in both runs.

| Spawn function | Test 1: time to start | Test 2: time to complete | Total |
| --- | --- | --- | --- |
| spawn64 | 3.401s | 1.421s | 4.822s |
| wasm_bindgen_futures | 0.18s | 1.581s | 1.761s |
| Results | 🔴 +3.221s | 🟢 −0.16s | 🔴 +3.061s |
63b8e7lmk1jv9c0ovus4hqkcasqgp0ed commented 2 months ago

I also tested workloads of size 63.

(`async_send_on_complete_gen` and `create_summary` are identical to the versions in the previous comment.)

async fn simple_bench_3<
    T,
    LOCALSPAWN: Future<Output = ()>,
    GENERATOR: Fn() -> (LOCALSPAWN, Receiver<T>),
    BENCH: Fn(LOCALSPAWN),
>(
    bench_fn: BENCH,
    test_generator: GENERATOR,
) -> Vec<TimeDelta> {
    let mut times = vec![];

    for _ in 0..200_000 {
        let mut work = vec![];

        for _ in 0..63 {
            let (fut, item) = test_generator();
            let fut = Some(fut);

            work.push((fut, item));
        }

        let start = Utc::now();

        for i in 0..63 {
            bench_fn(work[i].0.take().unwrap());
        }

        for i in 0..63 {
            let _ = work[i].1.recv().await;
        }

        times.push(Utc::now() - start)
    }

    times.sort();

    times
}

#[wasm_bindgen]
pub async fn no_queue_work() {
    let spawn_64 = simple_bench_3(spawn64::spawn_local, async_send_on_complete_gen).await;

    let wasm_bindgen = simple_bench_3(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;

    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );

    web_sys::console::log_1(&"Waiting 10 seconds then trying again.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;

    let spawn_64 = simple_bench_3(spawn64::spawn_local, async_send_on_complete_gen).await;

    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );

    web_sys::console::log_1(&"Waiting 10 seconds before final test.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;

    let wasm_bindgen = simple_bench_3(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;

    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
}

Output:

wasm_bindgen_futures: "best: 0 µs, worst: 15000 µs, avg: 13.4 µs, median: 0 µs, total_time: 2.681 s, test_count: 200000" 
spawn64::spawn_local: "best: 0 µs, worst: 8000 µs, avg: 11.28 µs, median: 0 µs, total_time: 2.256 s, test_count: 200000" 
Waiting 10 seconds then trying again. 
spawn64::spawn_local: "best: 0 µs, worst: 8000 µs, avg: 11.92 µs, median: 0 µs, total_time: 2.384 s, test_count: 200000" 
Waiting 10 seconds before final test. 
wasm_bindgen_futures: "best: 0 µs, worst: 8000 µs, avg: 13.19 µs, median: 0 µs, total_time: 2.638 s, test_count: 200000" 
Bajix commented 2 months ago

OK, so this definitely shows that spawn64's tagged-pointer raw-waker approach outperforms the alternative of polling `Rc`-wrapped tasks. I've got a really solid algorithm change in mind for supporting a large number of futures, which should beat the wasm_bindgen_futures time to start.
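For reference, the tagged-pointer raw-waker idea can be sketched with std's `RawWaker` API (a minimal native illustration, not spawn64's actual internals): the task's slot index is packed into the waker's data pointer, so `clone` copies a single word instead of bumping an `Rc` refcount.

```rust
use std::sync::Mutex;
use std::task::{RawWaker, RawWakerVTable, Waker};

// Sketch (not spawn64's real code): the task's slot index is encoded directly
// in the RawWaker data pointer. clone/drop are trivial word copies -- no Rc
// refcount traffic -- and wake just records which slot should be polled next.
static WOKEN: Mutex<Vec<usize>> = Mutex::new(Vec::new());

unsafe fn vt_clone(data: *const ()) -> RawWaker {
    RawWaker::new(data, &VTABLE) // copy the tagged word; nothing to refcount
}
unsafe fn vt_wake(data: *const ()) {
    WOKEN.lock().unwrap().push(data as usize); // "schedule" slot `data`
}
unsafe fn vt_drop(_data: *const ()) {} // nothing owned, nothing to free

static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_wake, vt_wake, vt_drop);

fn index_waker(index: usize) -> Waker {
    // Safety: the data pointer is never dereferenced, only read back as an index.
    unsafe { Waker::from_raw(RawWaker::new(index as *const (), &VTABLE)) }
}

fn main() {
    let w = index_waker(42);
    let w2 = w.clone(); // no allocation, no refcount increment
    w.wake_by_ref();
    w2.wake();
    assert_eq!(*WOKEN.lock().unwrap(), vec![42, 42]);
}
```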

Bajix commented 1 month ago

0.2.2-beta.0 has an initial free list implementation here: https://github.com/Bajix/spawn64/tree/free-list

63b8e7lmk1jv9c0ovus4hqkcasqgp0ed commented 1 month ago
Test #1 
Time to start test 
wasm_bindgen_futures: 0.8s, spawn64: 0.10s 
---------------------------------------------------- 
Test #2 
Time to complete test 
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 8.77 µs, median: 0 µs, total_time: 1.755 s, test_count: 200000" 
spawn64::spawn_local: "best: 0 µs, worst: 766000 µs, avg: 12.3 µs, median: 0 µs, total_time: 2.46 s, test_count: 200000" 
Waiting 10 seconds then trying again. 
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 8.44 µs, median: 0 µs, total_time: 1.689 s, test_count: 200000" 
Waiting 10 seconds before final test. 
wasm_bindgen_futures: "best: 0 µs, worst: 6000 µs, avg: 8.71 µs, median: 0 µs, total_time: 1.743 s, test_count: 200000" 
---------------------------------------------------- 
Test #3 
Workload unjoined 
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 14.52 µs, median: 0 µs, total_time: 2.904 s, test_count: 200000" 
spawn64::spawn_local: "best: 0 µs, worst: 15000 µs, avg: 11.02 µs, median: 0 µs, total_time: 2.205 s, test_count: 200000" 
Waiting 10 seconds then trying again. 
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 11.08 µs, median: 0 µs, total_time: 2.216 s, test_count: 200000" 
Waiting 10 seconds before final test. 
wasm_bindgen_futures: "best: 0 µs, worst: 8000 µs, avg: 14.81 µs, median: 0 µs, total_time: 2.963 s, test_count: 200000" 
---------------------------------------------------- 
Test #4 
Workload joined 
spawn_local: "best: 0 µs, worst: 15000 µs, avg: 12.66 µs, median: 0 µs, total_time: 2.532 s, test_count: 200000" 
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 18.26 µs, median: 0 µs, total_time: 3.652 s, test_count: 200000" 
---------------------------------------------------- 
Test #5 
Large stack allocation 
Waiting 5 seconds to start the test (garbage collector issues) 
wasm_bindgen_futures: "best: 7000 µs, worst: 13000 µs, avg: 7850 µs, median: 8000 µs, total_time: 0.785 s, test_count: 100" 
panicked at src\lib.rs:506:79:
called `Result::unwrap()` on an `Err` value: RecvError

The last test fails for spawn64. If I remove the earlier tests, it hangs for spawn64 instead of failing. I don't really know what's wrong; it looks like the future is being dropped. I think wasm_bindgen_futures will also drop futures if browser memory gets too high. The difference is that by lowering the workload, wasm_bindgen_futures can still complete Test 5, whereas spawn64 just hangs.
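The `RecvError` is consistent with the spawned future (and with it, the channel's `Sender`) being dropped before it sends: once every sender is gone, `recv` returns an error instead of waiting forever. The same behavior can be seen with std's sync channels (a std-only analogue, not the async_channel code above):

```rust
use std::sync::mpsc;

// Analogue of the failure mode: if the task holding the Sender is dropped
// before sending, the receiver gets RecvError instead of a value.
fn main() {
    let (sender, receiver) = mpsc::channel::<i32>();
    drop(sender); // stands in for the executor dropping the spawned future
    assert!(receiver.recv().is_err());
}
```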

Large rust copy and paste mess (`bench1`–`bench3` wrap the code already posted above; only `bench4`, `bench5`, and the test driver are new):

```rust
// bench1, bench2 and bench3 wrap the code already posted in earlier comments:
// - bench1::time_to_start     == `example` from the first benchmark
// - bench2::time_to_complete  == `on_complete_bench`
// - bench3::workload_unjoined == `no_queue_work`

mod bench4 {
    use async_channel::Receiver;
    use chrono::{TimeDelta, Utc};
    use std::future::Future;

    fn async_send_on_complete_gen() -> (impl Future<Output = ()>, Receiver<i32>) {
        let (s, after_test_receiver) = async_channel::unbounded();
        let test_future = async move {
            let result = 0;

            let _ = s.send(result).await;
        };

        (test_future, after_test_receiver)
    }

    // create_summary: identical to the version in the earlier comments.

    async fn simple_bench_4<
        T,
        LOCALSPAWN: Future<Output = ()>,
        GENERATOR: Fn() -> (LOCALSPAWN, Receiver<T>),
        BENCH: Fn(LOCALSPAWN),
    >(
        bench_fn: BENCH,
        test_generator: GENERATOR,
    ) -> Vec<TimeDelta> {
        let mut times = vec![];

        for _ in 0..200_000 {
            let mut work = vec![];
            let workload_size = 63;

            for _ in 0..workload_size {
                let (fut, item) = test_generator();
                work.push((Some(fut), Some(item)));
            }

            let workload = work
                .iter_mut()
                .map(|(_, ref mut receiver)| {
                    let f = receiver.take().unwrap();
                    async move {
                        let f = f;
                        f.recv().await
                    }
                })
                .collect::<Vec<_>>();

            let start = Utc::now();

            for i in 0..workload_size {
                bench_fn(work[i].0.take().unwrap());
            }

            let _result = futures::future::join_all(workload).await;

            times.push(Utc::now() - start)
        }

        times.sort();

        times
    }

    pub async fn workload_joined() {
        use std::panic;
        panic::set_hook(Box::new(console_error_panic_hook::hook));

        let spawn_local = simple_bench_4(spawn64::spawn_local, async_send_on_complete_gen).await;
        web_sys::console::log_1(
            &format!("spawn_local: {:#?} ", create_summary(&spawn_local)).into(),
        );

        let wasm_bindgen = simple_bench_4(
            wasm_bindgen_futures::spawn_local,
            async_send_on_complete_gen,
        )
        .await;
        web_sys::console::log_1(
            &format!("wasm_bindgen_futures: {:#?} ", create_summary(&wasm_bindgen)).into(),
        );
    }
}

mod bench5 {
    use async_channel::{unbounded, Receiver, Sender};
    use chrono::{TimeDelta, Timelike, Utc};
    use std::future::Future;

    fn async_send_on_complete_gen() -> (
        impl Future<Output = ()>,
        Receiver<([i32; 10_000], Sender<()>)>,
    ) {
        let (tell_me_when_over, waiting) = unbounded();
        let (s, after_test_receiver) = async_channel::unbounded();
        let test_future = async move {
            let mut result = [10000; 10_000];
            result[0] = Utc::now().minute() as i32;

            let _ = s.send((result, tell_me_when_over)).await;
            let _ = waiting.recv().await;
        };

        (test_future, after_test_receiver)
    }

    // create_summary: identical to the version in the earlier comments.

    async fn simple_bench_3<
        T,
        LOCALSPAWN: Future<Output = ()>,
        GENERATOR: Fn() -> (LOCALSPAWN, Receiver<(T, Sender<()>)>),
        BENCH: Fn(LOCALSPAWN),
    >(
        bench_fn: BENCH,
        test_generator: GENERATOR,
    ) -> Vec<TimeDelta> {
        let mut times = vec![];

        for _ in 0..100 {
            let mut store: Vec<Sender<()>> = vec![];
            let mut work: Vec<(Option<LOCALSPAWN>, Receiver<(T, Sender<()>)>)> = vec![];

            for _ in 0..63 {
                let (fut, item) = test_generator();
                work.push((Some(fut), item));
            }

            let start = Utc::now();

            for i in 0..63 {
                bench_fn(work[i].0.take().unwrap());
            }

            for i in 0..63 {
                let (_result, notify_when_test_over) = work[i].1.recv().await.unwrap();
                store.push(notify_when_test_over);
            }

            times.push(Utc::now() - start);

            let _results = futures::future::join_all(
                store.into_iter().map(|s| async move { s.send(()).await }),
            )
            .await;
        }

        times.sort();

        times
    }

    pub async fn large_stack_allocation() {
        web_sys::console::log_1(
            &"Waiting 5 seconds to start the test (garbage collector issues)".into(),
        );

        let wasm_bindgen = simple_bench_3(
            wasm_bindgen_futures::spawn_local,
            async_send_on_complete_gen,
        )
        .await;
        web_sys::console::log_1(
            &format!("wasm_bindgen_futures: {:#?} ", create_summary(&wasm_bindgen)).into(),
        );

        let spawn_64 = simple_bench_3(spawn64::spawn_local, async_send_on_complete_gen).await;
        web_sys::console::log_1(
            &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64)).into(),
        );
    }
}

use wasm_bindgen::prelude::wasm_bindgen;

#[wasm_bindgen]
pub async fn example() {
    web_sys::console::log_1(&"----------------------------------------------------".into());
    web_sys::console::log_1(&"Test #1".into());
    web_sys::console::log_1(&"Time to start test".into());
    bench1::time_to_start().await;

    web_sys::console::log_1(&"----------------------------------------------------".into());
    web_sys::console::log_1(&"Test #2".into());
    web_sys::console::log_1(&"Time to complete test".into());
    bench2::time_to_complete().await;

    web_sys::console::log_1(&"----------------------------------------------------".into());
    web_sys::console::log_1(&"Test #3".into());
    web_sys::console::log_1(&"Workload unjoined".into());
    bench3::workload_unjoined().await;

    web_sys::console::log_1(&"----------------------------------------------------".into());
    web_sys::console::log_1(&"Test #4".into());
    web_sys::console::log_1(&"Workload joined".into());
    bench4::workload_joined().await;

    web_sys::console::log_1(&"----------------------------------------------------".into());
    web_sys::console::log_1(&"Test #5".into());
    web_sys::console::log_1(&"Large stack allocation".into());
    bench5::large_stack_allocation().await;
}
```
Bajix commented 1 month ago

I'll have to think this over some more. The free-list implementation has type recursion, so it wouldn't compile unless I limited the level of nesting; right now it's limited to 16777216 futures, but I can increase this to 1073741824 with an extra level of nesting. The bigger issue is that tasks only deallocate when polled to completion, so if the last raw waker is dropped, the task won't deallocate. If I add reference counting, it will prevent memory leaking when a task can no longer be polled, and this can be done without `Rc`/`Arc` by using tagged pointers. If I iterate some more on this idea, I think there's a possibility of a spawn64 release that's consistently 30% faster than wasm_bindgen_futures and capable of handling large numbers of futures without any memory leaks.
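The constant-time free list mentioned above can be sketched as an intrusive list threaded through the vacant slots (an illustration of the idea, not the actual spawn64 code): each free slot stores the index of the next free slot, so both allocation and release are O(1) instead of a linear scan.

```rust
// Sketch of the constant-time idea (not spawn64's actual free-list code):
// every vacant slot stores the index of the next vacant slot, so allocating
// and freeing are both O(1) pops/pushes on an intrusive list.
enum Slot<T> {
    Occupied(T),
    Free { next: Option<usize> },
}

struct Slab<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<usize>, // index of the first vacant slot, if any
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { slots: Vec::new(), free_head: None }
    }

    // O(1): reuse the head of the free list, or grow the backing Vec.
    fn insert(&mut self, value: T) -> usize {
        match self.free_head {
            Some(idx) => {
                let next = match self.slots[idx] {
                    Slot::Free { next } => next,
                    Slot::Occupied(_) => unreachable!("free list points at occupied slot"),
                };
                self.free_head = next;
                self.slots[idx] = Slot::Occupied(value);
                idx
            }
            None => {
                self.slots.push(Slot::Occupied(value));
                self.slots.len() - 1
            }
        }
    }

    // O(1): push the vacated slot onto the front of the free list.
    fn remove(&mut self, idx: usize) -> T {
        let old = std::mem::replace(&mut self.slots[idx], Slot::Free { next: self.free_head });
        self.free_head = Some(idx);
        match old {
            Slot::Occupied(v) => v,
            Slot::Free { .. } => panic!("slot {idx} was already free"),
        }
    }
}

fn main() {
    let mut slab = Slab::new();
    let a = slab.insert("task-a");
    let b = slab.insert("task-b");
    assert_eq!(slab.remove(a), "task-a");
    // The freed slot is reused in O(1) before the Vec grows.
    assert_eq!(slab.insert("task-c"), a);
    assert_eq!(slab.remove(b), "task-b");
}
```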