serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0
4.88k stars 556 forks source link

Serialize/Deserialize implementations are very slow to compile #313

Open Rufflewind opened 7 years ago

Rufflewind commented 7 years ago

I’m currently trying out the pandoc_ast crate, which contains a handful of recursive algebraic data types with the Serialize and Deserialize traits derived. When I tried the minimal example below with optimizations on (--release), it took a whole minute to compile!

extern crate pandoc_ast;
fn main() {
    pandoc_ast::filter(String::new(), |p| p);
}

I cloned the pandoc_ast repo and messed around a bit: it seems that the library itself takes practically no time to compile, but if I add a concrete function that instantiates serde_json::from_str<Pandoc>, the compile time spikes to 60 seconds or so. I'm not sure if this is caused by serde, serde_derive, or serde_json.

For the smaller nonrecursive enums (e.g. pandoc_ast::MathType) with only a few variants, it takes about 2-3 seconds to instantiate. The larger ones (e.g. pandoc_ast::Block, which contains many variants as well as pandoc_ast::Inline, which itself contains even more variants) take about 50 sec.

Environment:

dtolnay commented 7 years ago

Thanks for reporting this! That is definitely substantially longer than expected. Looks like practically all of that time is spent in LLVM:

$ cargo rustc --release -- -Z time-passes

  time: 1.637; rss: 275MB   translation
  time: 0.000; rss: 275MB   assert dep graph
  time: 0.000; rss: 275MB   serialize dep graph
    time: 1.474; rss: 276MB llvm function passes [0]
    time: 52.386; rss: 317MB    llvm module passes [0]
    time: 8.498; rss: 318MB codegen passes [0]
    time: 0.001; rss: 318MB codegen passes [0]
  time: 62.364; rss: 318MB  LLVM passes
  time: 0.000; rss: 318MB   serialize work products
    time: 0.247; rss: 170MB running linker
  time: 0.248; rss: 170MB   linking
      Finished release [optimized] target(s) in 64.62 secs

I will try to narrow down whether a specific pattern on our end is triggering slow behavior.

dtolnay commented 7 years ago

On my computer the compile time behaves extremely predictably. It is uniformly 2.6 seconds per Deserialize impl that is instantiated. I was able to predict compile time to +/-0.1 seconds as I added and removed types in the AST. Nothing else matters—number of variants, recursive ADTs, size of types, etc.

I think this is the same root cause as https://github.com/serde-rs/serde/issues/286. There is no inherent reason it needs to be slow, we just need to work through and eliminate the unnecessary instantiations listed in https://github.com/serde-rs/serde/issues/286#issuecomment-279325284.

dtolnay commented 7 years ago

Here are the size (first column) and count (second column) of instantiations resulting from the following line.

let _ = serde_json::from_str::<Pandoc>("");

Release mode

  356849   39  <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_any
  160228   27  <serde_json::de::Deserializer<R>>::parse_value
   56303   12  <core::marker::PhantomData<T> as serde::de::DeserializeSeed<'de>>::deserialize
   44933    2  <testing::_IMPL_DESERIALIZE_FOR_Inline::<impl serde::de::Deserialize<'de> for testing::Inline>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
   42779    7  <&'a mut A as serde::de::SeqAccess<'de>>::next_element
   34129    2  <serde::de::impls::TupleVisitor3<T0, T1, T2> as serde::de::Visitor<'de>>::visit_seq
   34097    5  <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_tuple
   27098   65  <serde_json::de::Deserializer<R>>::parse_integer
   25895    2  <testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
   20787    2  <testing::_IMPL_DESERIALIZE_FOR_Citation::<impl serde::de::Deserialize<'de> for testing::Citation>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   20175    1  <<testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
   19642   65  <serde_json::de::Deserializer<R>>::parse_exponent
   18695   59  <serde_json::de::Deserializer<R>>::parse_number
   17630   65  <serde_json::de::Deserializer<R>>::parse_decimal
   17287   65  <serde_json::de::Deserializer<R>>::parse_exponent_overflow
   16642   61  <serde_json::de::Deserializer<R>>::visit_f64_from_parts

Debug mode

  365748   65  <serde_json::de::Deserializer<R>>::parse_value
  340968   75  <serde::private::de::content::ContentDeserializer<E> as serde::de::Deserializer<'de>>::deserialize_any
  132646   65  <serde_json::de::Deserializer<R>>::parse_integer
  129274   65  <serde_json::de::Deserializer<R>>::parse_exponent
  125120   65  <serde_json::de::Deserializer<R>>::parse_decimal
   54869   65  <serde_json::de::Deserializer<R>>::visit_f64_from_parts
   53625   65  <serde_json::de::Deserializer<R>>::parse_number
   51207   38  <serde_json::de::SeqAccess<'a, R> as serde::de::SeqAccess<'de>>::next_element_seed
   50035   79  <core::option::Option<T>>::map
   49701   65  <serde_json::de::Deserializer<R>>::parse_long_integer
   48402   65  <serde_json::de::Deserializer<R>>::parse_exponent_overflow
   40370   82  <core::result::Result<T, E>>::map
   36836   16  <<testing::_IMPL_DESERIALIZE_FOR_Inline::<impl serde::de::Deserialize<'de> for testing::Inline>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
   35970   12  <serde::private::de::content::TaggedContentVisitor<T> as serde::de::Visitor<'de>>::visit_map
   35250   75  <serde::de::value::MapDeserializer<'de, I, E>>::end
   31802   12  <<testing::_IMPL_DESERIALIZE_FOR_Block::<impl serde::de::Deserialize<'de> for testing::Block>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq

@Rufflewind would you be interested in helping us eliminate these? The number parsing code (parse_integer, parse_decimal, etc) are probably the easiest place to get started and would be super impactful. Those functions are all generic on the Visitor but do nothing with the Visitor until the number is fully parsed - so all of that should be factored out into a helper that can be instantiated once.

Rufflewind commented 7 years ago

Here’s an attempt: #314

I started off with the easiest cases where the visitor method being called is unconditionally known.

Here’s the call graph of the original code for reference. (Hopefully I didn’t miss anything?)

graph

dtolnay commented 7 years ago

The next avenue I would look at would be reducing instantiation of Visitor methods. Currently in serde_json if you have a Deserialize impl that calls deserialize_struct, it gets forwarded to deserialize_any which means it instantiates every possible method on the Visitor. We know that structs can only be deserialized successfully from map and seq, so only those should need to be instantiated. And similar deal for almost all of the other Deserializer methods. We need to implement them explicitly in serde_json rather than forwarding, and have them instantiate only the relevant Visitor methods.

It doesn't show up in the symbol profile above because each individual method of each individual visitor is instantiated only once, but there are so many of them that I expect it adds up.

Rufflewind commented 7 years ago

Assuming I understood correctly, I tried to add a specialization called parse_aggregate which only calls visit_{seq,map} (see patch below). However, I didn’t notice any performance changes with this change (neither time nor memory usage).

```diff --- src/de.rs +++ src/de.rs @@ -174,6 +174,32 @@ impl<'de, R: Read<'de>> Deserializer { } } + fn parse_aggregate(&mut self, visitor: V) -> Result + where + V: de::Visitor<'de>, + { + let value = self.parse_aggregate_inner(visitor); + match value { + Ok(value) => Ok(value), + Err(err) => Err(self.fix_error(err)), + } + } + + fn parse_aggregate_inner(&mut self, visitor: V) -> Result + where + V: de::Visitor<'de>, + { + let agg = match try!(self.parse_value_begin()) { + Value::Aggregate(agg) => agg, + _ => return Err(self.error(ErrorCode::ExpectedSomeValue)), + }; + let ret = match agg { + Aggregate::Seq => visitor.visit_seq(SeqAccess::new(self)), + Aggregate::Map => visitor.visit_map(MapAccess::new(self)), + }; + self.parse_value_end(agg, ret) + } + fn parse_value(&mut self, visitor: V) -> Result where V: de::Visitor<'de>, @@ -780,9 +806,22 @@ impl<'de, 'a, R: Read<'de>> de::Deserializer<'de> for &'a mut Deserializer { self.deserialize_bytes(visitor) } + #[inline] + fn deserialize_struct( + self, + name: &'static str, + fields: &'static [&'static str], + visitor: V + ) -> Result + where + V: de::Visitor<'de>, + { + self.parse_aggregate(visitor) + } + forward_to_deserialize_any! { bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string unit - unit_struct seq tuple tuple_struct map struct identifier ignored_any + unit_struct seq tuple tuple_struct map identifier ignored_any } } ```
Arzte commented 6 years ago

Any update on what’s needed to solve this? Willing to help however needed, as this currently effects my workflow.

dtolnay commented 6 years ago

Practically all the time is spent in LLVM passes, so we just need to find a way to make less work for LLVM. I made a little tool cargo llvm-lines to add up the number of lines of LLVM IR by function. I don't know whether this is the best way to prioritize what needs to be fixed, but it should get us started. Here is what it has to say about the json-benchmark repo.

The first column is the total number of lines of LLVM IR, the second column is the number of copies of the function that were instantiated with different generic parameters. @Coding-Doctors maybe try this on your codebase and tackle whatever is resulting in the most IR.

$ cargo llvm-lines --bin json-benchmark --no-default-features --features parse-struct,stringify-struct,all-files,lib-serde | head -30

 102456   84  <serde_json::de::Deserializer<R>>::parse_value
  24834   63  <serde_json::de::SeqAccess<'a, R> as serde::de::SeqAccess<'de>>::next_element_seed
  10044   54  <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeMap>::serialize_value
   8718   26  <serde_json::de::MapAccess<'a, R> as serde::de::MapAccess<'de>>::next_key_seed
   8029    1  <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::User>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   7728   84  serde_json::error::Error::fix_position
   7462  395  core::ptr::drop_in_place
   6597   59  <core::result::Result<T, E>>::map
   5205    1  <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::User>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
   4839    1  <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Status>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   4574   52  <serde_json::de::MapAccess<'a, R> as serde::de::MapAccess<'de>>::next_value_seed
   4131   51  <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeStruct>::serialize_field
   3807   13  serde::ser::Serializer::collect_seq
   3399   83  serde::de::Visitor::visit_bool
   3389    1  <json_benchmark::copy::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize<'de> for json_benchmark::copy::citm_catalog::CitmCatalog>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   3286   84  serde_json::de::Number::visit
   3185    1  <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Status>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
   3150   83  serde::de::Visitor::visit_f64
   3092   17  <alloc::raw_vec::RawVec<T, A>>::double
   2960   78  serde::de::Visitor::visit_i64
   2910   15  <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeSeq>::serialize_element
   2895    1  <json_benchmark::copy::twitter::_IMPL_DESERIALIZE_FOR_Media::<impl serde::de::Deserialize<'de> for json_benchmark::copy::twitter::Media>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   2654   83  serde::de::Visitor::visit_unit
   2636   13  <serde::de::impls::<impl serde::de::Deserialize<'de> for alloc::vec::Vec<T>>::deserialize::SeqVisitor<T> as serde::de::Visitor<'de>>::visit_seq
   2627   29  <core::option::Option<T>>::map
   2440   52  serde::de::Visitor::visit_str
   2348   56  serde::de::Visitor::visit_map
   2072    1  <json_benchmark::copy::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize<'de> for json_benchmark::copy::citm_catalog::CitmCatalog>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_seq
   2048   54  serde::de::Visitor::visit_u64
   1969    1  json_benchmark::copy::twitter::_IMPL_SERIALIZE_FOR_User::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
oli-obk commented 6 years ago

Have you checked whether building with mir optimizations makes a difference? They run on generic code, instead of on the monomorphized versions

dtolnay commented 6 years ago

Good idea! I tried on rustc 1.24.0-nightly (0a2e9ade8 2017-11-29):

cargo rustc ... -- -Zmir-opt-level=3

with MIR opt level 3 and 4, with and without --release, and it did not seem to help.

dtolnay commented 6 years ago

@oli-obk suggested in IRC looking at the generated MIR and filing suggestions for MIR optimizations that could have an impact.

cargo rustc ... -- --emit=mir

(then look in target/release/deps/my_crate-111111hash111111.mir)

dtolnay commented 6 years ago

I found a 20% improvement in #388.

dtolnay commented 6 years ago

389 was another 30% improvement.

Marwes commented 6 years ago

I tried factoring out the non-visitor parts of https://github.com/serde-rs/json/blob/92ddbdf493f338eead7d2632f72deeec2f94776d/src/de.rs#L1480-L1511 and https://github.com/serde-rs/json/blob/92ddbdf493f338eead7d2632f72deeec2f94776d/src/de.rs#L1430-L1459 but did not see any improvements.

dtolnay commented 6 years ago

Thanks @Marwes. I tried those as well and saw the same thing. It would be worth looking into why that does not result in an improvement, as it is a fairly large chunk of logic that is instantiated quite a lot. Understanding why the current implementations compile fast or the refactored implementations compile slow would help prioritize what to look at next.