Open Rufflewind opened 7 years ago
Thanks for reporting this! That is definitely substantially longer than expected. Looks like practically all of that time is spent in LLVM:
$ cargo rustc --release -- -Z time-passes
time: 1.637; rss: 275MB translation
time: 0.000; rss: 275MB assert dep graph
time: 0.000; rss: 275MB serialize dep graph
time: 1.474; rss: 276MB llvm function passes [0]
time: 52.386; rss: 317MB llvm module passes [0]
time: 8.498; rss: 318MB codegen passes [0]
time: 0.001; rss: 318MB codegen passes [0]
time: 62.364; rss: 318MB LLVM passes
time: 0.000; rss: 318MB serialize work products
time: 0.247; rss: 170MB running linker
time: 0.248; rss: 170MB linking
Finished release [optimized] target(s) in 64.62 secs
I will try to narrow down whether a specific pattern on our end is triggering slow behavior.
On my computer the compile time behaves extremely predictably. It is uniformly 2.6 seconds per Deserialize impl that is instantiated. I was able to predict compile time to +/-0.1 seconds as I added and removed types in the AST. Nothing else matters—number of variants, recursive ADTs, size of types, etc.
I think this is the same root cause as https://github.com/serde-rs/serde/issues/286. There is no inherent reason it needs to be slow, we just need to work through and eliminate the unnecessary instantiations listed in https://github.com/serde-rs/serde/issues/286#issuecomment-279325284.
Here are the size (first column) and count (second column) of instantiations resulting from the following line.
let _ = serde_json::from_str::<Pandoc>("");
356849 39 <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_any
160228 27 <serde_json::de::Deserializer<R>>::parse_value
56303 12 <core::marker::PhantomData<T> as serde::de::DeserializeSeed<'de>>::deserialize
44933 2 <testing::_IMPL_DESERIALIZE_FOR_Inline::<impl serde::de::Deserialize<'de> for testing::Inline>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
42779 7 <&'a mut A as serde::de::SeqAccess<'de>>::next_element
34129 2 <serde::de::impls::TupleVisitor3<T0, T1, T2> as serde::de::Visitor<'de>>::visit_seq
34097 5 <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_tuple
27098 65 <serde_json::de::Deserializer<R>>::parse_integer
25895 2 <testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
20787 2 <testing::_IMPL_DESERIALIZE_FOR_Citation::<impl serde::de::Deserialize<'de> for testing::Citation>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
20175 1 <<testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
19642 65 <serde_json::de::Deserializer<R>>::parse_exponent
18695 59 <serde_json::de::Deserializer<R>>::parse_number
17630 65 <serde_json::de::Deserializer<R>>::parse_decimal
17287 65 <serde_json::de::Deserializer<R>>::parse_exponent_overflow
16642 61 <serde_json::de::Deserializer<R>>::visit_f64_from_parts
365748 65 <serde_json::de::Deserializer<R>>::parse_value
340968 75 <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_any
132646 65 <serde_json::de::Deserializer<R>>::parse_integer
129274 65 <serde_json::de::Deserializer<R>>::parse_exponent
125120 65 <serde_json::de::Deserializer<R>>::parse_decimal
54869 65 <serde_json::de::Deserializer<R>>::visit_f64_from_parts
53625 65 <serde_json::de::Deserializer<R>>::parse_number
51207 38 <serde_json::de::SeqAccess<'a, R> as serde::de::SeqAccess<'de>>::next_element_seed
50035 79 <core::option::Option<T>>::map
49701 65 <serde_json::de::Deserializer<R>>::parse_long_integer
48402 65 <serde_json::de::Deserializer<R>>::parse_exponent_overflow
40370 82 <core::result::Result<T, E>>::map
36836 16 <<testing::_IMPL_DESERIALIZE_FOR_Inline::<impl serde::de::Deserialize<'de> for testing::Inline>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
35970 12 <serde::private::de::content::TaggedContentVisitor<T> as serde::de::Visitor<'de>>::visit_map
35250 75 <serde::de::value::MapDeserializer<'de, I, E>>::end
31802 12 <<testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
@Rufflewind would you be interested in helping us eliminate these? The number parsing code (parse_integer, parse_decimal, etc) are probably the easiest place to get started and would be super impactful. Those functions are all generic on the Visitor but do nothing with the Visitor until the number is fully parsed - so all of that should be factored out into a helper that can be instantiated once.
Here’s an attempt: #314
I started off with the easiest cases where the visitor method being called is unconditionally known.
Here’s the call graph of the original code for reference. (Hopefully I didn’t miss anything?)
The next avenue I would look at would be reducing instantiation of Visitor methods. Currently in serde_json if you have a Deserialize impl that calls deserialize_struct, it gets forwarded to deserialize_any which means it instantiates every possible method on the Visitor. We know that structs can only be deserialized successfully from map and seq, so only those should need to be instantiated. And similar deal for almost all of the other Deserializer methods. We need to implement them explicitly in serde_json rather than forwarding, and have them instantiate only the relevant Visitor methods.
It doesn't show up in the symbol profile above because each individual method of each individual visitor is instantiated only once, but there are so many of them that I expect it adds up.
Assuming I understood correctly, I tried to add a specialization called parse_aggregate
which only calls visit_{seq,map}
(see patch below). However, I didn’t notice any performance changes with this change (neither time nor memory usage).
Any update on what’s needed to solve this? Willing to help however needed, as this currently effects my workflow.
Practically all the time is spent in LLVM passes, so we just need to find a way to make less work for LLVM. I made a little tool cargo llvm-lines
to add up the number of lines of LLVM IR by function. I don't know whether this is the best way to prioritize what needs to be fixed, but it should get us started. Here is what it has to say about the json-benchmark
repo.
The first column is the total number of lines of LLVM IR, the second column is the number of copies of the function that were instantiated with different generic parameters. @Coding-Doctors maybe try this on your codebase and tackle whatever is resulting in the most IR.
$ cargo llvm-lines --bin json-benchmark --no-default-features --features parse-struct,stringify-struct,all-files,lib-serde | head -30
102456 84 <serde_json::de::Deserializer<R>>::parse_value
24834 63 <serde_json::de::SeqAccess<'a, R> as serde::de::SeqAccess<'de>>::next_element_seed
10044 54 <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeMap>::serialize_value
8718 26 <serde_json::de::MapAccess<'a, R> as serde::de::MapAccess<'de>>::next_key_seed
8029 1 <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::User>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
7728 84 serde_json::error::Error::fix_position
7462 395 core::ptr::drop_in_place
6597 59 <core::result::Result<T, E>>::map
5205 1 <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::User>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
4839 1 <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Status>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
4574 52 <serde_json::de::MapAccess<'a, R> as serde::de::MapAccess<'de>>::next_value_seed
4131 51 <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeStruct>::serialize_field
3807 13 serde::ser::Serializer::collect_seq
3399 83 serde::de::Visitor::visit_bool
3389 1 <json_benchmark::copy::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize<'de> for json_benchmark::copy::citm_catalog::CitmCatalog>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
3286 84 serde_json::de::Number::visit
3185 1 <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Status>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
3150 83 serde::de::Visitor::visit_f64
3092 17 <alloc::raw_vec::RawVec<T, A>>::double
2960 78 serde::de::Visitor::visit_i64
2910 15 <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeSeq>::serialize_element
2895 1 <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Media::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Media>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
2654 83 serde::de::Visitor::visit_unit
2636 13 <serde::de::impls::<impl serde::de::Deserialize<'de> for alloc::vec::Vec<T>>::deserialize::SeqVisitor<T> as serde::de::Visitor<'de>>::visit_seq
2627 29 <core::option::Option<T>>::map
2440 52 serde::de::Visitor::visit_str
2348 56 serde::de::Visitor::visit_map
2072 1 <json_benchmark::copy::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize<'de> for json_benchmark::copy::citm_catalog::CitmCatalog>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
2048 54 serde::de::Visitor::visit_u64
1969 1 json_benchmark::copy::twitter::_IMPL_SERIALIZE_FOR_User::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
Have you checked whether building with mir optimizations makes a difference? They run on generic code, instead of on the monomorphized versions
Good idea! I tried on rustc 1.24.0-nightly (0a2e9ade8 2017-11-29):
cargo rustc ... -- -Zmir-opt-level=3
with MIR opt level 3 and 4, with and without --release
, and it did not seem to help.
@oli-obk suggested in IRC looking at the generated MIR and filing suggestions for MIR optimizations that could have an impact.
cargo rustc ... -- --emit=mir
(then look in target/release/deps/my_crate-111111hash111111.mir
)
I found a 20% improvement in #388.
I tried factoring out the non-visitor parts of https://github.com/serde-rs/json/blob/92ddbdf493f338eead7d2632f72deeec2f94776d/src/de.rs#L1480-L1511 and https://github.com/serde-rs/json/blob/92ddbdf493f338eead7d2632f72deeec2f94776d/src/de.rs#L1430-L1459 but did not see any improvements.
Thanks @Marwes. I tried those as well and saw the same thing. It would be worth looking into why that does not result in an improvement, as it is a fairly large chunk of logic that is instantiated quite a lot. Understanding why the current implementations compile fast or the refactored implementations compile slow would help prioritize what to look at next.
I’m currently trying out the
pandoc_ast
crate, which contains a handful of recursive algebraic data types with theSerialize
andDeserialize
traits derived. When I tried the minimal example below with optimizations on (--release
), it took a whole minute to compile!I cloned the
pandoc_ast
repo and messed around a bit: it seems that the library itself takes practically no time to compile, but if I add a concrete function that instantiatesserde_json::from_str<Pandoc>
, the compile time spikes to 60 seconds or so. I'm not sure if this is caused byserde
,serde_derive
, orserde_json
.For the smaller nonrecursive enums (e.g.
pandoc_ast::MathType
) with only a few variants, it takes about 2-3 seconds to instantiate. The larger ones (e.g.pandoc_ast::Block
, which contains many variants as well aspandoc_ast::Inline
, which itself contains even more variants) take about 50 sec.Environment:
cargo (build|test) --release