The difference between the async-trait crate and the native async fn in traits implementation raises the question of what it costs to keep large futures unboxed. This benchmark creates 100K u64's on the stack and runs the actor, passing that state around, in both scenarios in order to assess the performance impact of not boxing the futures.
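Here is a minimal sketch that makes the boxed versus unboxed distinction concrete. It is not ractor's benchmark code. `Actor`, `NativeHandler`, `BoxedHandler`, `N`, and `returned_size` are hypothetical names. To avoid depending on the async-trait crate, `BoxedHandler` hand-writes roughly the signature that the `#[async_trait]` macro expands to: a method returning `Pin<Box<dyn Future + Send>>`. Native async fn in traits needs Rust 1.75 or newer. The sketch only compares the sizes of the returned futures, computed from their types without constructing them.

```rust
use std::future::Future;
use std::pin::Pin;

/// Number of u64's each handler keeps alive across an `.await` (hypothetical; mirrors the 50k trials).
const N: usize = 50_000;

/// Hypothetical trait using native `async fn` in traits: the returned future is an
/// unboxed, compiler-generated type whose size includes everything held across awaits.
trait NativeHandler {
    async fn handle(&self, seed: u64) -> u64;
}

/// Hypothetical hand-written approximation of what `#[async_trait]` generates:
/// the future is boxed and returned as a trait object.
trait BoxedHandler {
    fn handle<'a>(&'a self, seed: u64) -> Pin<Box<dyn Future<Output = u64> + Send + 'a>>;
}

struct Actor;

impl NativeHandler for Actor {
    async fn handle(&self, seed: u64) -> u64 {
        let state = [seed; N]; // large stack data
        std::future::ready(()).await; // `state` is live across this await, so it lives in the future
        state.iter().sum::<u64>()
    }
}

impl BoxedHandler for Actor {
    fn handle<'a>(&'a self, seed: u64) -> Pin<Box<dyn Future<Output = u64> + Send + 'a>> {
        // Same body, but the whole future (including `state`) is allocated on the heap.
        Box::pin(async move {
            let state = [seed; N];
            std::future::ready(()).await;
            state.iter().sum::<u64>()
        })
    }
}

/// Size of the value `make` would return, taken from its type alone. `make` is never
/// called, so no future (and no large array) is actually built.
fn returned_size<R>(_make: impl FnOnce() -> R) -> usize {
    std::mem::size_of::<R>()
}

fn main() {
    let actor = Actor;
    let native = returned_size(|| NativeHandler::handle(&actor, 1));
    let boxed = returned_size(|| BoxedHandler::handle(&actor, 1));
    println!("native future: {native} bytes, boxed future handle: {boxed} bytes");
}
```

Under these assumptions, the native future is at least 400,000 bytes (50k u64's held across the `.await`). It is stored inline and moved by value wherever the caller keeps it. The boxed version instead returns a two-word pointer and pays one heap allocation per call.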
Results
Without async-trait, using native async fn in the trait
3 Trials, with 50k u64's on the stack
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime
Finished bench [optimized] target(s) in 1.57s
Running benches/async_traits.rs (target/release/deps/async_traits-c230612eb7565929)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [134.00 µs 135.02 µs 136.16 µs]
change: [+186.13% +189.02% +191.85%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime
Finished bench [optimized] target(s) in 0.23s
Running benches/async_traits.rs (target/release/deps/async_traits-c230612eb7565929)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [127.87 µs 128.85 µs 129.98 µs]
change: [-4.6896% -3.6128% -2.5389%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
4 (4.00%) high mild
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime
Finished bench [optimized] target(s) in 0.27s
Running benches/async_traits.rs (target/release/deps/async_traits-c230612eb7565929)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [127.87 µs 128.90 µs 129.95 µs]
change: [-1.3544% -0.2167% +0.9568%] (p = 0.72 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Using async-trait
3 Trials, with 50k u64's on the stack
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime,async-trait
Finished bench [optimized] target(s) in 1.57s
Running benches/async_traits.rs (target/release/deps/async_traits-9971906e70dc3ec1)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [134.10 µs 134.85 µs 135.60 µs]
change: [+3.7305% +4.8950% +5.9779%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime,async-trait
Finished bench [optimized] target(s) in 0.33s
Running benches/async_traits.rs (target/release/deps/async_traits-9971906e70dc3ec1)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [135.92 µs 136.63 µs 137.35 µs]
change: [+0.9529% +2.2175% +3.9541%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
$ cargo bench --bench async_traits -p ractor --no-default-features -F tokio_runtime,async-trait
Finished bench [optimized] target(s) in 0.25s
Running benches/async_traits.rs (target/release/deps/async_traits-9971906e70dc3ec1)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
time: [132.12 µs 132.77 µs 133.41 µs]
change: [-5.5130% -3.9929% -2.7036%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
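Here we can see very similar timings for the two implementations. The criterion point estimates range from 128.85 to 135.02 µs with native async fn and from 132.77 to 136.63 µs with async-trait. If we reduce the size of the stack data, the native async fn implementation starts outperforming the async-trait version.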
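One way to probe the claim about smaller stack data is a rough micro-harness along these lines. It is a sketch under assumptions, not the criterion benchmark used above. `Actor`, `NativeHandler`, `BoxedHandler`, `block_on`, `time_per_call`, and `compare` are hypothetical. The boxed trait hand-writes approximately what async-trait generates. The tiny busy-polling executor only works because these futures never return `Pending`. The const parameter `N` sets how many u64's each call holds across its await.

```rust
use std::future::{ready, Future};
use std::hint::black_box;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
use std::time::{Duration, Instant};

// Waker vtable whose functions ignore the data pointer and do nothing.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

/// Minimal executor: busy-polls `fut` until ready. Only suitable for futures that
/// complete without waiting on external events, like the ones below.
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the (null) data pointer.
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut); // native futures are pinned on the stack, with no allocation
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

trait NativeHandler {
    async fn handle(&self, seed: u64) -> u64;
}

// Hypothetical approximation of the async-trait expansion (boxed trait object).
trait BoxedHandler {
    fn handle(&self, seed: u64) -> Pin<Box<dyn Future<Output = u64> + Send + '_>>;
}

/// Hypothetical actor whose handlers hold `N` u64's across an await.
struct Actor<const N: usize>;

impl<const N: usize> NativeHandler for Actor<N> {
    async fn handle(&self, seed: u64) -> u64 {
        let state = [seed; N];
        ready(()).await;
        state.iter().sum::<u64>()
    }
}

impl<const N: usize> BoxedHandler for Actor<N> {
    fn handle(&self, seed: u64) -> Pin<Box<dyn Future<Output = u64> + Send + '_>> {
        Box::pin(async move {
            let state = [seed; N];
            ready(()).await;
            state.iter().sum::<u64>()
        })
    }
}

/// Average wall-clock time per call of `run` over `iters` iterations.
fn time_per_call(iters: u32, mut run: impl FnMut(u64) -> u64) -> Duration {
    let start = Instant::now();
    for i in 0..iters {
        black_box(run(black_box(i as u64)));
    }
    start.elapsed() / iters
}

fn compare<const N: usize>(iters: u32) {
    let actor = Actor::<N>;
    let native = time_per_call(iters, |s| block_on(NativeHandler::handle(&actor, s)));
    let boxed = time_per_call(iters, |s| block_on(BoxedHandler::handle(&actor, s)));
    println!("N = {N:>6}: native {native:?}/call, boxed {boxed:?}/call");
}

fn main() {
    compare::<16>(100_000); // small state
    compare::<50_000>(1_000); // large state, as in the trials above
}
```

Timings from a single `Instant` loop are noisy, so treat this as a quick sanity check; results worth quoting belong in a criterion benchmark like the one above. The design idea is that with a small `N` the native path needs no heap allocation at all, while the boxed path always pays one `Box::pin` per call.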