llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.77k stars 11.9k forks source link

RDF Live Variables Analysis may cause ballooning compile times #46152

Open scottconstable opened 4 years ago

scottconstable commented 4 years ago
Bugzilla Link 46808
Version trunk
OS All

Extended Description

The RDF live variables analysis probably has high asymptotic complexity than what is actually necessary. In particular:

https://github.com/llvm/llvm-project/blob/b5be1c5419e2a38eb60fc7e785567b54b6d9e0e0/llvm/lib/CodeGen/RDFLiveness.cpp#L208

This may be causing substantial compilation overhead for some users who enable passes that depend on RDF live variable analysis, e.g., https://github.com/rust-lang/rust/issues/74632.

Related discussion on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141332.html.

kparzysz-quic commented 4 years ago

The patches

https://reviews.llvm.org/rG09897b146a8a https://reviews.llvm.org/rG47fe1b63f449 https://reviews.llvm.org/rGf0f467aeecfc https://reviews.llvm.org/rG4b25f672998f https://reviews.llvm.org/rG9521704553e8

Not all of these are strictly performance-related, but there may be dependencies between them.

kparzysz-quic commented 4 years ago

I committed a handful of patches that get the ~50% reduction in compile-time. Any further improvements would likely require changing the algorithm, but I don't have that much time at the moment (such a change would need to be thoroughly tested).

Now I get

$ time llc bigfunc-lvi.ll real 0m12.970s user 0m12.884s sys 0m0.077s

kparzysz-quic commented 4 years ago

That turned out to be the case... Revised times:

--- master

$ time llc bigfunc.ll real 0m1.832s user 0m1.784s sys 0m0.049s

$ time llc bigfunc-lvi.ll real 0m41.508s user 0m41.406s sys 0m0.101s

--- master + D84471

$ time llc bigfunc.ll real 0m1.811s user 0m1.759s sys 0m0.053s

$ time llc bigfunc-lvi.ll real 0m27.714s user 0m27.645s sys 0m0.069s

--- master + D84471 + RDF wip fixes

$ time llc bigfunc.ll real 0m1.802s user 0m1.765s sys 0m0.037s

$ time llc bigfunc-lvi.ll real 0m15.475s user 0m15.398s sys 0m0.077s

kparzysz-quic commented 4 years ago

I may have expensive checks enabled in my clang build. Let me rebuild everything and try again.

scottconstable commented 4 years ago

I'm surprised by the execution times that you're getting. There is indeed a compile-time increase related to RDF, but in my experiments it's not 16-fold (1.2s -> 20s), but in the order of 15%.

On my machine:

--- master

$ time llc bigfunc.ll real 1m28.141s user 1m21.665s sys 0m6.464s

$ time llc bigfunc-lvi.ll real 2m9.032s user 2m2.475s sys 0m6.549s

--- master + D84471

$ time llc bigfunc.ll real 1m27.341s user 1m20.780s sys 0m6.556s

$ time llc bigfunc-lvi.ll real 1m51.290s user 1m45.120s sys 0m6.168s

--- master + D84471 + RDF wip fixes

$ time llc bigfunc.ll real 1m27.275s user 1m20.714s sys 0m6.548s

$ time llc bigfunc-lvi.ll real 1m40.499s user 1m34.199s sys 0m6.292s

I am even more surprised, since the original bug reporter (https://github.com/rust-lang/rust/issues/74632) recently re-ran his own tests and observed a ~24-fold increase with D84471, following a 180-fold increase without D84471. I am at a total loss to explain how you are observing such a small slowdown. What is the exact LLVM commit that you're building on, so we can compare apples-to-apples?

kparzysz-quic commented 4 years ago

I'm surprised by the execution times that you're getting. There is indeed a compile-time increase related to RDF, but in my experiments it's not 16-fold (1.2s -> 20s), but in the order of 15%.

On my machine:

--- master

$ time llc bigfunc.ll real 1m28.141s user 1m21.665s sys 0m6.464s

$ time llc bigfunc-lvi.ll real 2m9.032s user 2m2.475s sys 0m6.549s

--- master + D84471

$ time llc bigfunc.ll real 1m27.341s user 1m20.780s sys 0m6.556s

$ time llc bigfunc-lvi.ll real 1m51.290s user 1m45.120s sys 0m6.168s

--- master + D84471 + RDF wip fixes

$ time llc bigfunc.ll real 1m27.275s user 1m20.714s sys 0m6.548s

$ time llc bigfunc-lvi.ll real 1m40.499s user 1m34.199s sys 0m6.292s

scottconstable commented 4 years ago

Note that I am also using the patch to fix the performance issue in the LVI mitigations: https://reviews.llvm.org/D84471. So the remaining overhead appears to be caused by RDF.

scottconstable commented 4 years ago

Perf report when using llc to compile bigfunc-lvi.ll:

  22.16%  llc      llc                  [.] std::_Rb_tree<std::pair<unsigned int, llvm::LaneBitmask>, std::pair<unsigned int, llvm::LaneBitmask>, std::_Identity<std::pair<unsigned int, llvm::LaneBitmask> >, std::less<std::pair<unsigned int,
  20.15%  llc      llc                  [.] llvm::rdf::Liveness::computePhiInfo
   9.43%  llc      llc                  [.] llvm::rdf::RegisterAggr::makeRegRef
   7.18%  llc      llc                  [.] llvm::rdf::RegisterAggr::makeRegRef() const::$_0::operator()
   5.47%  llc      libc-2.31.so         [.] malloc
   5.35%  llc      llc                  [.] llvm::BitVector::find_first_in
   5.35%  llc      libc-2.31.so         [.] _int_free
   3.97%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_increment
   3.84%  llc      llc                  [.] llvm::rdf::RegisterAggr::insert
   3.77%  llc      llc                  [.] llvm::rdf::RegisterAggr::clearIn
   1.66%  llc      libc-2.31.so         [.] cfree@GLIBC_2.2.5
   1.21%  llc      libc-2.31.so         [.] __memset_avx2_unaligned_erms
   0.74%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_decrement
   0.39%  llc      llc                  [.] memset@plt
   0.34%  llc      llc                  [.] free@plt
   0.34%  llc      llc                  [.] malloc@plt
   0.33%  llc      llc                  [.] llvm::rdf::PhysicalRegisterInfo::getAliasSet
   0.31%  llc      llc                  [.] llvm::LiveRange::verify
   0.27%  llc      libc-2.31.so         [.] _int_malloc
   0.17%  llc      libstdc++.so.6.0.28  [.] std::_Rb_tree_insert_and_rebalance
   0.13%  llc      llc                  [.] std::_Rb_tree_increment@plt
   0.13%  llc      llc                  [.] llvm::rdf::DataFlowGraph::releaseBlock
   0.13%  llc      llc                  [.] llvm::SelectionDAG::Combine
   0.12%  llc      llc                  [.] std::__detail::_Map_base<unsigned int, std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack>, std::allocator<std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack> >, std::_
   0.10%  llc      llc                  [.] llvm::sys::unicode::columnWidthUTF8
scottconstable commented 4 years ago

Big function compiled from Rust source with LVI mitigations time ~/llvm-project/build/bin/llc bigfunc-lvi.ll

real 0m20.643s user 0m20.612s sys 0m0.028s

scottconstable commented 4 years ago

Big function compiled from Rust source time ~/llvm-project/build/bin/llc bigfunc.ll

real 0m1.241s user 0m1.216s sys 0m0.024s

kparzysz-quic commented 4 years ago

I'm running llc on ScalarEvolution.bc (a leftover file I had from some earlier investigation). It takes about 30 seconds to process this file, and the oprofile results are:

Using /w/test/oprofile_data/samples/ for samples directory.
CPU: Intel Skylake microarchitecture, speed 3000 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
54378     5.9208  llc                      unsigned int llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::runDFS<false, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::verifyParentPro
perty(llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> const&)::{lambda(llvm::MachineBasicBlock*, llvm::MachineBasicBlock*)#1}>(llvm::MachineBasicBlock*, unsigned int, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> const&)::{lambda(llvm::MachineBasicBlock*, llvm::MachineBasicBlock*)#1}, unsigned int)
47477     5.1694  llc                      (anonymous namespace)::MachineVerifier::visitMachineFunctionAfter()
39995     4.3548  llc                      unsigned int llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runDFS<false, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false> const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}>(llvm::BasicBlock*, unsigned int, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false> const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}, unsigned int)
29406     3.2018  llc                      llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >, llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >::FindAndConstruct(llvm::MachineBasicBlock* const&)
27112     2.9520  libc-2.27.so             _int_malloc
26622     2.8987  llc                      llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::ChildrenGetter<false>::Get(llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::BatchUpdateInfo*)
24294     2.6452  llc                      llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >, llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >::FindAndConstruct(llvm::BasicBlock* const&)
24169     2.6316  llc                      bool (anonymous namespace)::VRegFilter::filterAndAdd<llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> > >(llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> > const&, llvm::SmallVectorImpl<unsigned int>&)
23714     2.5821  llc                      checkForCyclesHelper(llvm::SDNode const*, llvm::SmallPtrSetImpl<llvm::SDNode const*>&, llvm::SmallPtrSetImpl<llvm::SDNode const*>&, llvm::SelectionDAG const*)
23583     2.5678  llc                      (anonymous namespace)::MachineVerifier::verify(llvm::MachineFunction&)
23307     2.5377  libc-2.27.so             free
22229     2.4204  llc                      llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::ChildrenGetter<false>::Get(llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::BatchUpdateInfo*)
22059     2.4019  llc                      llvm::SmallPtrSetImpl<llvm::SDNode const*>::insert(llvm::SDNode const*)
21991     2.3944  llc                      llvm::Instruction::getNumSuccessors() const
19755     2.1510  llc                      (anonymous namespace)::MachineVerifier::visitMachineOperand(llvm::MachineOperand const*, unsigned int)
18117     1.9726  llc                      llvm::Instruction::getSuccessor(unsigned int) const
15364     1.6729  llc                      (anonymous namespace)::MachineVerifier::visitMachineBundleAfter(llvm::MachineInstr const*)
15150     1.6496  llc                      (anonymous namespace)::details::StructuralHash::update(llvm::Function&)
14762     1.6073  libc-2.27.so             malloc
13687     1.4903  llc                      void llvm::SmallVectorImpl<llvm::BasicBlock*>::append<std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >, void>(std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >, std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >)
11663     1.2699  llc                      llvm::detail::DenseSetPair<unsigned int>* llvm::DenseMapBase<llvm::DenseMap<unsigned int, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseSetPair<unsigned int> >, unsigned int, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseSetPair<unsigned int> >::InsertIntoBucketImpl<unsigned int>(unsigned int const&, unsigned int const&, llvm::detail::DenseSetPair<unsigned int>*)
11226     1.2223  llc                      llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >, llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >::clear()
10043     1.0935  llc                      llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >, llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >::clear()
8552      0.9312  llc                      llvm::BasicBlock::getTerminator() const
7664      0.8345  llc                      llvm::rdf::PhysicalRegisterInfo::getAliasSet(unsigned int) const
[...]

The first RDF-related symbol is at the bottom with 0.8% of the entire execution time (this is not the entire list). Do you have a file that you could share where the RDF liveness is taking a significant time? I have some improvements that should reduce the complexity of the ordering stage, but I'm not seeing any difference with the files I tried.

scottconstable commented 4 years ago

For Rust, I'm not 100% certain, as I have not personally compiled any rust code with the LVI hardening.

For clang/clang++, it suffices to compile with the -mlvi-hardening flag.

kparzysz-quic commented 4 years ago

Is it enough to set "--x86-experimental-lvi-inline-asm-hardening", and "target-feature=+lvi-cfi,+lvi-load-hardening" to get the LVI pass to run?

scottconstable commented 4 years ago

assigned to @kparzysz-quic