dtolnay opened 6 years ago
Here is part of it: https://github.com/rust-lang/rust/issues/47628.
Building the same commit (59017aa19b48a2fff795f4e1915dbb596ac427db) with some newer compilers:
| compiler | time in seconds |
| --- | --- |
| 1.24.0 | 8.6 |
| 1.25.0 | 7.7 |
| 1.26.0 | 8.5 |
| 1.27.0 | 8.5 |
| 1.28.0 | 8.5 |
| 1.29.0 | 8.1 |
| 1.30.0 | 8.0 |
| 1.31.0 | 8.2 |
| 1.32.0 | 7.6 |
| 1.33.0-beta | 7.6 |
| 1.32.0-nightly | 7.8 |
One thought I had today about the compile time of the serde crate is that nowadays it very frequently has the `derive` feature enabled, which means that the compile time of `serde` is actually that of `serde_derive` plus `serde` itself. That dependency edge can add up to ~20s of compile time, magnifying the compile time well beyond that of the code in `serde` itself.
There's not really a solution to this today per se. A workaround is to meticulously depend on `serde_derive` only in the crates that actually need it, so that crates which depend only on `serde` can hopefully finish compiling before `serde_derive` does. Practically though this is unlikely to bring much benefit. Actually solving this would require extensive compiler changes, and I'm not even sure what those would look like!
Put another way, I don't see the compile time of `serde` itself as too important; `serde_derive`, which almost everyone using `serde` needs, matters more because it takes longer to compile than `serde` itself.
One "quick" way to improve the compile time of the `serde_derive` crate is to make the crate itself an empty shell which immediately calls into another crate. Cargo can't pipeline compilation of procedural macros, so a procedural macro fully waits for its dependencies to finish (in this case `syn`). Codegen for `syn` often takes a while though (even 40% of the time in a debug build), and that's an opportunity for parallelism which isn't currently being taken advantage of. If `serde_derive` were a tiny crate that simply called `serde_derive_impl::expand(the_inputs...)`, that would enable pipelining, and the compile time of `serde_derive` itself would be near instantaneous (it's just a link step). I think that would improve compile times modestly, but it's no slam dunk by any means.
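As a sketch, the shell crate could look roughly like this (`serde_derive_impl` and `expand_serialize` are hypothetical names for the split-out implementation crate, not an existing API):

```rust
// Hypothetical serde_derive/src/lib.rs: an empty-shell proc-macro crate.
// The real expansion logic lives in a separate ordinary crate
// (`serde_derive_impl` here) that Cargo can pipeline against syn's
// codegen; compiling this crate is then essentially just a link step.
use proc_macro::TokenStream;

#[proc_macro_derive(Serialize, attributes(serde))]
pub fn derive_serialize(input: TokenStream) -> TokenStream {
    // Forward straight into the implementation crate.
    serde_derive_impl::expand_serialize(input.into()).into()
}
```

The proc-macro crate keeps the `proc-macro = true` Cargo flag, while the implementation crate is a plain library crate, which is exactly what makes it eligible for pipelined compilation.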
Edit: For a more visceral view of what I mean about `serde_derive`, here are some compile charts for a project which just depends on `serde` but enables the `derive` feature: debug mode, release mode. You can see how `serde_derive` can't be pipelined at all, but its compile time is quite large; being able to pipeline it with `syn` would likely translate directly into build-time wins equal to the codegen time of `syn` (~2s).
Thanks, I can see how that would help. That is a neat visualization.
It is unfortunate that most people would probably perceive it the opposite way if we make this change -- more crates = more bloat = perceived as slow, even if it builds end-to-end faster than the current setup.
Here's another data point of a different graph I've seen:
That's got a lot of dependencies pruned, but the entire compilation of `serde_derive` could fit within the codegen time of `syn`. That may happen when `syn` has all of its features enabled, perhaps? In any case I'm getting similar 1-2s wins locally in my graphs, but I think the wins get magnified on smaller machines and in larger compilation graphs. For example, the compilation here could probably save at least 10-20s with a pipelining opportunity.
Anyway, as I already wrote on the forums, there's a feature request on rustc to enable this pipelining, and I agree that in the long term users (even serde) shouldn't have to use a crate organization like this just to get better compile times.
I have a bunch of observations/questions here!
To me, it seems there is another issue: the code that *uses* `serde_derive` is itself slow to compile. Specifically, JSON serialization turns out to be a non-trivial part of rust-analyzer's overall compile time.
I can't measure that directly, but the following two high-level benches nod in that direction:
First, if I run `RUSTC_BOOTSTRAP=1 cargo build -Z timings --release -p rust-analyzer --bin rust-analyzer`, I get the following output:
`cargo_metadata` and especially `lsp-types` are among the heaviest dependencies (rust-analyzer's own crates are noticeably bigger, but they contain a lot of domain logic). Both `cargo_metadata` and `lsp-types` are mostly struct and enum definitions with `derive(serde)`.
Second, if I run `cargo llvm-lines --release -p rust-analyzer --bin rust-analyzer`, I get the following:
```
326920 (100%)  9611 (100%)  (TOTAL)
 12281 (3.8%)   196 (2.0%)  core::result::Result<T,E>::map
 10576 (3.2%)    49 (0.5%)  <serde_json::value::de::MapDeserializer as serde::de::MapAccess>::next_key_seed
 10080 (3.1%)    63 (0.7%)  serde_json::value::de::visit_array
  9487 (2.9%)    88 (0.9%)  <serde_json::value::de::SeqDeserializer as serde::de::SeqAccess>::next_element_seed
  7987 (2.4%)    50 (0.5%)  serde_json::value::de::visit_object
  6981 (2.1%)    48 (0.5%)  serde_json::value::de::<impl serde::de::Deserializer for serde_json::value::Value>::deserialize_struct
  6921 (2.1%)    77 (0.8%)  <serde_json::value::de::MapDeserializer as serde::de::MapAccess>::next_value_seed
  6534 (2.0%)    54 (0.6%)  <serde_json::value::ser::SerializeMap as serde::ser::SerializeMap>::serialize_value
  4273 (1.3%)     1 (0.0%)  <lsp_types::_::<impl serde::de::Deserialize for lsp_types::TextDocumentClientCapabilities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
  3808 (1.2%)    28 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
  3692 (1.1%)    26 (0.3%)  alloc::raw_vec::RawVec<T,A>::allocate_in
  3395 (1.0%)    56 (0.6%)  serde_json::value::de::<impl serde::de::Deserializer for serde_json::value::Value>::deserialize_option
  3192 (1.0%)    56 (0.6%)  alloc::raw_vec::RawVec<T,A>::current_memory
  2877 (0.9%)    49 (0.5%)  <serde_json::value::de::BorrowedCowStrDeserializer as serde::de::Deserializer>::deserialize_any
  2757 (0.8%)    49 (0.5%)  <serde_json::value::de::MapKeyDeserializer as serde::de::Deserializer>::deserialize_any
  2624 (0.8%)   148 (1.5%)  <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
  2538 (0.8%)    51 (0.5%)  serde::de::Visitor::visit_string
  2520 (0.8%)    28 (0.3%)  core::alloc::layout::Layout::array
  2388 (0.7%)     1 (0.0%)  <lsp_types::_::<impl serde::de::Deserialize for lsp_types::TextDocumentClientCapabilities>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
  2078 (0.6%)     1 (0.0%)  lsp_types::_::<impl serde::ser::Serialize for lsp_types::ServerCapabilities>::serialize
  2020 (0.6%)     1 (0.0%)  rust_analyzer::run_server
```
It seems that serde-using crates are both heavy to compile themselves and also add downstream monomorphizations to their consumers.
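The downstream-monomorphization cost is a general property of generic Rust code, not specific to serde. A minimal illustration of the mechanism, plus the classic "non-generic inner function" mitigation (the function names here are made up for the example):

```rust
// Each concrete type a generic function is called with produces its
// own copy of the function body in LLVM IR, which is what
// cargo-llvm-lines measures. Moving the real work into a non-generic
// inner function limits the per-type copies to a thin shim.
fn byte_len<S: AsRef<str>>(s: S) -> usize {
    fn inner(s: &str) -> usize {
        // Shared body: compiled exactly once, no matter how many
        // concrete types `byte_len` is instantiated with.
        s.as_bytes().len()
    }
    // Only this cheap conversion is monomorphized per type.
    inner(s.as_ref())
}

fn main() {
    assert_eq!(byte_len("hi"), 2);                  // S = &str
    assert_eq!(byte_len(String::from("abc")), 3);   // S = String
}
```

Serde's deserializer visitors are far more generic than this toy, which is why `serde_json` glue dominates the `llvm-lines` table above even though each individual instantiation is small.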
With this in mind, I have the following questions (extremely vague, mind you!):
We were doing so well for a few rustc releases. Building the current master branch of serde:
But since then it has been going up and up:
Will need to look into what makes the current code take 50% longer to build with a modern compiler compared to an old compiler.