Closed · Bajix closed this issue 1 month ago
It's performing worse for me.
```rust
fn async_generator() -> impl Future<Output = ()> {
    async { async_std::task::sleep(Duration::from_secs(1)).await }
}

fn simple_bench<O: Future<Output = ()>, K: Fn() -> O, F: Fn(O)>(
    bench_fn: F,
    generator: K,
) -> chrono::TimeDelta {
    let start = Utc::now();
    for _ in 0..200_000 {
        bench_fn(generator())
    }
    Utc::now() - start
}

fn delta_format(v: TimeDelta) -> String {
    format!("{}.{}s", v.num_seconds(), v.num_milliseconds() % 1000)
}

#[wasm_bindgen]
pub async fn example() {
    let r2 = simple_bench(spawn64::spawn_local, async_generator);
    let r1 = simple_bench(wasm_bindgen_futures::spawn_local, async_generator);
    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {}, spawn64: {}",
            delta_format(r1),
            delta_format(r2)
        )
        .into(),
    );
}
```
Output:

```
wasm_bindgen_futures: 0.18s, spawn64: 3.401s
```
Benchmarking how long it takes to enqueue 200k futures isn't very meaningful: it omits differences in time to poll, and in a browser environment there usually won't be many futures in flight. spawn64 has the advantage that tasks don't need to be wrapped in `Rc`, and for workloads of fewer than 64 tasks there are no non-task heap allocations. The reason it performs badly on this specific benchmark is that for each task it scans `N >> 6` linked slabs looking for a free slot, where `N` is the number of previously enqueued tasks; I have a free-list optimization in progress that makes this lookup constant time.
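For illustration, a minimal sketch of what a constant-time free-slot claim over one 64-entry slab level could look like, using a `u64` occupancy bitmask and `trailing_ones` (the `SlotMask` type and its methods are hypothetical, not spawn64's actual internals):

```rust
/// Hypothetical occupancy mask for one 64-slot slab level.
/// A set bit marks an occupied slot.
struct SlotMask(u64);

impl SlotMask {
    fn new() -> Self {
        SlotMask(0)
    }

    /// Claim the lowest free slot in O(1) via `trailing_ones`,
    /// or None if all 64 slots are taken.
    fn claim(&mut self) -> Option<u8> {
        let idx = self.0.trailing_ones();
        if idx == 64 {
            return None;
        }
        self.0 |= 1 << idx;
        Some(idx as u8)
    }

    /// Free a slot so it can be claimed again.
    fn release(&mut self, idx: u8) {
        self.0 &= !(1 << idx);
    }
}

fn main() {
    let mut m = SlotMask::new();
    assert_eq!(m.claim(), Some(0));
    assert_eq!(m.claim(), Some(1));
    m.release(0);
    // A released slot is reused before fresh slots.
    assert_eq!(m.claim(), Some(0));
}
```

The O(N >> 6) cost described above comes from walking slab levels to find one with a free bit; a free list of non-full slabs avoids the walk entirely.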
If the async functions are not doing very much, the difference is less than 300 ms over 200,000 tests. In comparison, the delay before the tasks start running is at least 3 s worse, with the last call to sleep delayed to 4 s. (Time to start matters when waiting on sleeping tasks.)
```rust
fn async_send_on_complete_gen() -> (impl Future<Output = ()>, Receiver<i32>) {
    let (s, after_test_receiver) = async_channel::unbounded();
    let test_future = async move {
        let result = 0;
        let _ = s.send(result).await;
    };
    (test_future, after_test_receiver)
}

fn create_summary(times: &Vec<TimeDelta>) -> String {
    let mut times = times.clone();
    times.sort();
    let micro_times: Vec<i64> = times
        .iter()
        .map(|v| v.num_microseconds().unwrap())
        .collect();
    let median = micro_times[micro_times.len() / 2];
    let avg: f64 = (micro_times.iter().sum::<i64>() as f64) / (micro_times.len() as f64);
    let avg = (avg * 100f64).floor() / 100.0;
    let test_count = micro_times.len();
    let total_time: f64 = (micro_times.iter().sum::<i64>() as f64 / 1000.0) / 1000.0;
    format!(
        "best: {} µs, worst: {} µs, avg: {avg} µs, median: {median} µs, total_time: {total_time} s, test_count: {test_count}",
        micro_times[0],
        micro_times[micro_times.len() - 1]
    )
}

async fn simple_bench_2<
    T,
    LOCALSPAWN: Future<Output = ()>,
    GENERATOR: Fn() -> (LOCALSPAWN, Receiver<T>),
    BENCH: Fn(LOCALSPAWN),
>(
    bench_fn: BENCH,
    test_generator: GENERATOR,
) -> Vec<TimeDelta> {
    let mut times = vec![];
    for _ in 0..200_000 {
        let start = Utc::now();
        let (fut, on_complete) = test_generator();
        bench_fn(fut);
        let _ = on_complete.recv().await;
        times.push(Utc::now() - start)
    }
    times.sort();
    times
}

#[wasm_bindgen]
pub async fn on_complete_bench() {
    let spawn_64 = simple_bench_2(spawn64::spawn_local, async_send_on_complete_gen).await;
    let wasm_bindgen = simple_bench_2(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;
    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );
    web_sys::console::log_1(&"Waiting 10 seconds then trying again.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;
    let spawn_64 = simple_bench_2(spawn64::spawn_local, async_send_on_complete_gen).await;
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );
    web_sys::console::log_1(&"Waiting 10 seconds before final test.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;
    let wasm_bindgen = simple_bench_2(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;
    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
}
```
Output:

```
wasm_bindgen_futures: "best: 0 µs, worst: 16000 µs, avg: 8.21 µs, median: 0 µs, total_time: 1.643 s, test_count: 200000"
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 7.11 µs, median: 0 µs, total_time: 1.423 s, test_count: 200000"
Waiting 10 seconds then trying again.
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 7.1 µs, median: 0 µs, total_time: 1.421 s, test_count: 200000"
Waiting 10 seconds before final test.
wasm_bindgen_futures: "best: 0 µs, worst: 6000 µs, avg: 7.9 µs, median: 0 µs, total_time: 1.581 s, test_count: 200000"
```
Both runs executed the same number of tests.
| Spawn Function | Test 1: Time to start | Test 2: Time to complete | Total |
|---|---|---|---|
| spawn64 | 3.401s | 1.421s | 4.822s |
| wasm_bindgen_futures | 0.18s | 1.581s | 1.761s |
| Results | 🔴 +3.221s | 🟢 −0.16s | 🔴 +3.061s |
I also tested workloads of size 63:
```rust
async fn simple_bench_3<
    T,
    LOCALSPAWN: Future<Output = ()>,
    GENERATOR: Fn() -> (LOCALSPAWN, Receiver<T>),
    BENCH: Fn(LOCALSPAWN),
>(
    bench_fn: BENCH,
    test_generator: GENERATOR,
) -> Vec<TimeDelta> {
    let mut times = vec![];
    for _ in 0..200_000 {
        let mut work = vec![];
        for _ in 0..63 {
            let (fut, item) = test_generator();
            let fut = Some(fut);
            work.push((fut, item));
        }
        let start = Utc::now();
        for i in 0..63 {
            bench_fn(work[i].0.take().unwrap());
        }
        for i in 0..63 {
            let _ = work[i].1.recv().await;
        }
        times.push(Utc::now() - start)
    }
    times.sort();
    times
}

#[wasm_bindgen]
pub async fn no_queue_work() {
    let spawn_64 = simple_bench_3(spawn64::spawn_local, async_send_on_complete_gen).await;
    let wasm_bindgen = simple_bench_3(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;
    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );
    web_sys::console::log_1(&"Waiting 10 seconds then trying again.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;
    let spawn_64 = simple_bench_3(spawn64::spawn_local, async_send_on_complete_gen).await;
    web_sys::console::log_1(
        &format!("spawn64::spawn_local: {:#?} ", create_summary(&spawn_64),).into(),
    );
    web_sys::console::log_1(&"Waiting 10 seconds before final test.".into());
    async_std::task::sleep(Duration::from_secs(10)).await;
    let wasm_bindgen = simple_bench_3(
        wasm_bindgen_futures::spawn_local,
        async_send_on_complete_gen,
    )
    .await;
    web_sys::console::log_1(
        &format!(
            "wasm_bindgen_futures: {:#?} ",
            create_summary(&wasm_bindgen),
        )
        .into(),
    );
}
```
Output:

```
wasm_bindgen_futures: "best: 0 µs, worst: 15000 µs, avg: 13.4 µs, median: 0 µs, total_time: 2.681 s, test_count: 200000"
spawn64::spawn_local: "best: 0 µs, worst: 8000 µs, avg: 11.28 µs, median: 0 µs, total_time: 2.256 s, test_count: 200000"
Waiting 10 seconds then trying again.
spawn64::spawn_local: "best: 0 µs, worst: 8000 µs, avg: 11.92 µs, median: 0 µs, total_time: 2.384 s, test_count: 200000"
Waiting 10 seconds before final test.
wasm_bindgen_futures: "best: 0 µs, worst: 8000 µs, avg: 13.19 µs, median: 0 µs, total_time: 2.638 s, test_count: 200000"
```
OK, so this definitely shows that spawn64's tagged-pointer raw-waker approach outperforms the alternative of polling Rc-wrapped tasks. I have a solid algorithm change in mind for supporting a large number of futures, which should beat wasm_bindgen_futures' time to start.
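To make the comparison concrete, here is a minimal sketch of an allocation-free waker built from a tagged pointer, along the lines described above (the `tagged_waker`/`decode` helpers and the 64-byte-slab-alignment assumption are mine for illustration, not spawn64's actual code; an Rc-based waker would instead clone and drop a reference count on every wake):

```rust
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical scheme: slabs are 64-byte aligned, so the low 6 bits of a
// slab pointer are free to carry a slot index. The waker's data pointer is
// then just (slab_addr | slot), with no heap allocation to manage.

unsafe fn clone_raw(data: *const ()) -> RawWaker {
    // Cloning only copies the tagged pointer.
    RawWaker::new(data, &VTABLE)
}

unsafe fn noop(_: *const ()) {
    // wake / wake_by_ref would set the slot's "ready" bit in a real
    // executor; drop has nothing to free because nothing is owned.
}

static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, noop, noop, noop);

fn tagged_waker(slab_addr: usize, slot: usize) -> Waker {
    debug_assert!(slab_addr % 64 == 0 && slot < 64);
    let data = (slab_addr | slot) as *const ();
    unsafe { Waker::from_raw(RawWaker::new(data, &VTABLE)) }
}

/// Split a tagged pointer back into (slab base, slot index).
fn decode(data: usize) -> (usize, usize) {
    (data & !63, data & 63)
}

fn main() {
    assert_eq!(decode(0x1000 | 5), (0x1000, 5));
    let w = tagged_waker(0x1000, 5);
    w.wake_by_ref(); // no-op in this sketch
}
```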
`0.2.2-beta.0` has an initial free-list implementation here: https://github.com/Bajix/spawn64/tree/free-list
Test #1
Time to start test

```
wasm_bindgen_futures: 0.8s, spawn64: 0.10s
```
----------------------------------------------------
Test #2
Time to complete test

```
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 8.77 µs, median: 0 µs, total_time: 1.755 s, test_count: 200000"
spawn64::spawn_local: "best: 0 µs, worst: 766000 µs, avg: 12.3 µs, median: 0 µs, total_time: 2.46 s, test_count: 200000"
Waiting 10 seconds then trying again.
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 8.44 µs, median: 0 µs, total_time: 1.689 s, test_count: 200000"
Waiting 10 seconds before final test.
wasm_bindgen_futures: "best: 0 µs, worst: 6000 µs, avg: 8.71 µs, median: 0 µs, total_time: 1.743 s, test_count: 200000"
```
----------------------------------------------------
Test #3
Workload unjoined

```
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 14.52 µs, median: 0 µs, total_time: 2.904 s, test_count: 200000"
spawn64::spawn_local: "best: 0 µs, worst: 15000 µs, avg: 11.02 µs, median: 0 µs, total_time: 2.205 s, test_count: 200000"
Waiting 10 seconds then trying again.
spawn64::spawn_local: "best: 0 µs, worst: 7000 µs, avg: 11.08 µs, median: 0 µs, total_time: 2.216 s, test_count: 200000"
Waiting 10 seconds before final test.
wasm_bindgen_futures: "best: 0 µs, worst: 8000 µs, avg: 14.81 µs, median: 0 µs, total_time: 2.963 s, test_count: 200000"
```
----------------------------------------------------
Test #4
Workload joined

```
spawn_local: "best: 0 µs, worst: 15000 µs, avg: 12.66 µs, median: 0 µs, total_time: 2.532 s, test_count: 200000"
wasm_bindgen_futures: "best: 0 µs, worst: 1000 µs, avg: 18.26 µs, median: 0 µs, total_time: 3.652 s, test_count: 200000"
```
----------------------------------------------------
Test #5
Large stack allocation
Waiting 5 seconds to start the test (garbage collector issues)

```
wasm_bindgen_futures: "best: 7000 µs, worst: 13000 µs, avg: 7850 µs, median: 8000 µs, total_time: 0.785 s, test_count: 100"
panicked at src\lib.rs:506:79:
called `Result::unwrap()` on an `Err` value: RecvError
```
The last test fails for spawn64. If I remove the earlier tests, it hangs for spawn64 instead of failing. I don't really know what's wrong; it looks like the future is being dropped. I think wasm_bindgen_futures will also drop futures if browser memory gets too high. The difference is that with a lowered workload, wasm_bindgen_futures can still complete Test 5, whereas spawn64 just hangs.
I'll have to think this over some more. The free-list implementation uses type recursion and wouldn't compile unless I limited the nesting depth, so right now it's limited to 16,777,216 (2^24) futures, though I can raise this to 1,073,741,824 (2^30) with one extra level of nesting. The bigger issue is that tasks only deallocate when polled to completion, so if the last raw waker is dropped, the task never deallocates. Adding reference counting would prevent memory from leaking when a task can no longer be polled, and this can be done without `Rc`/`Arc` by using tagged pointers. If I iterate on this idea some more, I think there's a possibility of a spawn64 release that's consistently 30% faster than wasm_bindgen_futures and capable of handling large numbers of futures without any memory leaks.
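One way to do refcounting without `Rc`/`Arc` is to keep a small waker count in the slot's own metadata, so the waker stays a bare tagged pointer while the slab decides when a slot is reclaimable. A minimal sketch under that assumption (the `Slot` type and method names are hypothetical):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical per-slot metadata: counts live raw wakers for this task
/// so the slot can be reclaimed once the last waker is dropped, even if
/// the future is never polled to completion.
struct Slot {
    waker_refs: AtomicU32,
}

impl Slot {
    /// Called from the raw waker's `clone` vtable entry.
    fn clone_waker(&self) {
        self.waker_refs.fetch_add(1, Ordering::Relaxed);
    }

    /// Called from the raw waker's `drop` vtable entry.
    /// Returns true when the last waker is gone and the slot can be freed.
    fn drop_waker(&self) -> bool {
        self.waker_refs.fetch_sub(1, Ordering::Release) == 1
    }
}

fn main() {
    // Slot starts with one waker outstanding.
    let slot = Slot { waker_refs: AtomicU32::new(1) };
    slot.clone_waker();
    assert!(!slot.drop_waker()); // one waker still alive
    assert!(slot.drop_waker()); // last waker dropped: reclaim the slot
}
```

Because the count lives in the slab rather than in a heap cell, the waker itself remains a plain tagged pointer and cloning it stays allocation-free.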
I authored spawn64 as an optimized alternative to `wasm_bindgen_futures::spawn_local`. If benchmarks are ever added for `wasm_bindgen_futures::spawn_local`, it would be nice if they could be compared against `spawn64`.