Open osiewicz opened 10 months ago
Does this affect release builds too? I would be okay with doing something about this if it affects compile times. The crate itself compiles in 0.22s on my machine, but if the issue is monomorphisation I can see why it would be an issue.
Ideally any compile time improvements would be tested on a larger-scale crate that depends on async-task
, like tide
's examples. Before we commit to any new patterns we should benchmark against that.
One possible way would be to disable the Box
ing branch on debug_assertions
or with a specific feature flag. Again, I would like to see benchmarks before I commit to this.
Yeah, this does affect release builds as well. I wrote up a trivial benchmark: | Debug | Release | |
---|---|---|---|
Baseline | 0.9s | 2s | |
Patched | 0.65s | 1.35s | |
% of Baseline | 72% | 67% | |
LLVM Lines Baseline | 170310 | 144200 | |
LLVM Lines Patched | 85023 | 71760 | |
LLVM Lines % of Baseline | 50% | 50% |
You can increase the # of lines and the compile time will scale linearly, as on each line we essentially instantiate a new scheduler and future. If the user were to Box the scheduler, erasing the type, we would scale by the # of distinct return types instead.
I'm not sure if tide
is a great benchmark there, as # of instantiations of async-task
seems rather scarce in the examples, so changes are not likely to show up in timings. I think we could start with an isolated benchmark and move from there.
I would err on the side of disabling the boxing branch with a specific feature flag (or rather, enabling the boxing with a feature flag). That feature could be enabled by default.
Even if the user always boxes their futures, they have to pay the compile-time cost (as then the "boxing branch" tries to instantiate Pin<Box<Pin<Box<F>>>>
). In that case being able to disable the implicit boxing altogether would be great for them.
I'm pretty positive that there's other avenues we could explore to improve the compile time further for downstream users. I'm glad that you're open to that. :)
async-task
contains an optimisation for handling large futures in the definition of spawn_unchecked. This leads to excessive IR size, as one branch instantiatesRawTask
with Fut and the other does so withPin<Box<Fut>>
. This probably gets eliminated within LLVM (as the branch itself is trivial), but it's still a bummer that this cannot be truly determined at compile time. I took several stabs recently at getting rid of the unnecessary instantiation, without luck. I do understand why we need the boxing, but it'd be nice to not spend time on generating code we're gonna throw away in LLVM anyways. Getting rid of large-future-boxing reduces the LLVM IR size for my medium-size (~1.5M LLVM IR lines in debug) crate by 7%. This is also replicated in examples from this crate:Related: https://github.com/rust-lang/rust/issues/85836