Open scottconstable opened 4 years ago
The patches
https://reviews.llvm.org/rG09897b146a8a https://reviews.llvm.org/rG47fe1b63f449 https://reviews.llvm.org/rGf0f467aeecfc https://reviews.llvm.org/rG4b25f672998f https://reviews.llvm.org/rG9521704553e8
Not all of these are strictly performance-related, but there may be dependencies between them.
I committed a handful of patches that get the ~50% reduction in compile-time. Any further improvements would likely require changing the algorithm, but I don't have that much time at the moment (such a change would need to be thoroughly tested).
Now I get
$ time llc bigfunc-lvi.ll real 0m12.970s user 0m12.884s sys 0m0.077s
That turned out to be the case... Revised times:
--- master
$ time llc bigfunc.ll real 0m1.832s user 0m1.784s sys 0m0.049s
$ time llc bigfunc-lvi.ll real 0m41.508s user 0m41.406s sys 0m0.101s
--- master + D84471
$ time llc bigfunc.ll real 0m1.811s user 0m1.759s sys 0m0.053s
$ time llc bigfunc-lvi.ll real 0m27.714s user 0m27.645s sys 0m0.069s
--- master + D84471 + RDF wip fixes
$ time llc bigfunc.ll real 0m1.802s user 0m1.765s sys 0m0.037s
$ time llc bigfunc-lvi.ll real 0m15.475s user 0m15.398s sys 0m0.077s
I may have expensive checks enabled in my clang build. Let me rebuild everything and try again.
I'm surprised by the execution times that you're getting. There is indeed a compile-time increase related to RDF, but in my experiments it's not 16-fold (1.2s -> 20s), but in the order of 15%.
On my machine:
--- master
$ time llc bigfunc.ll real 1m28.141s user 1m21.665s sys 0m6.464s
$ time llc bigfunc-lvi.ll real 2m9.032s user 2m2.475s sys 0m6.549s
--- master + D84471
$ time llc bigfunc.ll real 1m27.341s user 1m20.780s sys 0m6.556s
$ time llc bigfunc-lvi.ll real 1m51.290s user 1m45.120s sys 0m6.168s
--- master + D84471 + RDF wip fixes
$ time llc bigfunc.ll real 1m27.275s user 1m20.714s sys 0m6.548s
$ time llc bigfunc-lvi.ll real 1m40.499s user 1m34.199s sys 0m6.292s
I am even more surprised, since the original bug reporter (https://github.com/rust-lang/rust/issues/74632) recently re-ran his own tests and observed a ~24-fold increase with D84471, following a 180-fold increase without D84471. I am at a total loss to explain how you are observing such a small slowdown. What is the exact LLVM commit that you're building on, so we can compare apples-to-apples?
I'm surprised by the execution times that you're getting. There is indeed a compile-time increase related to RDF, but in my experiments it's not 16-fold (1.2s -> 20s), but in the order of 15%.
On my machine:
--- master
$ time llc bigfunc.ll real 1m28.141s user 1m21.665s sys 0m6.464s
$ time llc bigfunc-lvi.ll real 2m9.032s user 2m2.475s sys 0m6.549s
--- master + D84471
$ time llc bigfunc.ll real 1m27.341s user 1m20.780s sys 0m6.556s
$ time llc bigfunc-lvi.ll real 1m51.290s user 1m45.120s sys 0m6.168s
--- master + D84471 + RDF wip fixes
$ time llc bigfunc.ll real 1m27.275s user 1m20.714s sys 0m6.548s
$ time llc bigfunc-lvi.ll real 1m40.499s user 1m34.199s sys 0m6.292s
Note that I am also using the patch to fix the performance issue in the LVI mitigations: https://reviews.llvm.org/D84471. So the remaining overhead appears to be caused by RDF.
Perf report when using llc to compile bigfunc-lvi.ll:
22.16% llc llc [.] std::_Rb_tree<std::pair<unsigned int, llvm::LaneBitmask>, std::pair<unsigned int, llvm::LaneBitmask>, std::_Identity<std::pair<unsigned int, llvm::LaneBitmask> >, std::less<std::pair<unsigned int,
20.15% llc llc [.] llvm::rdf::Liveness::computePhiInfo
9.43% llc llc [.] llvm::rdf::RegisterAggr::makeRegRef
7.18% llc llc [.] llvm::rdf::RegisterAggr::makeRegRef() const::$_0::operator()
5.47% llc libc-2.31.so [.] malloc
5.35% llc llc [.] llvm::BitVector::find_first_in
5.35% llc libc-2.31.so [.] _int_free
3.97% llc libstdc++.so.6.0.28 [.] std::_Rb_tree_increment
3.84% llc llc [.] llvm::rdf::RegisterAggr::insert
3.77% llc llc [.] llvm::rdf::RegisterAggr::clearIn
1.66% llc libc-2.31.so [.] cfree@GLIBC_2.2.5
1.21% llc libc-2.31.so [.] __memset_avx2_unaligned_erms
0.74% llc libstdc++.so.6.0.28 [.] std::_Rb_tree_decrement
0.39% llc llc [.] memset@plt
0.34% llc llc [.] free@plt
0.34% llc llc [.] malloc@plt
0.33% llc llc [.] llvm::rdf::PhysicalRegisterInfo::getAliasSet
0.31% llc llc [.] llvm::LiveRange::verify
0.27% llc libc-2.31.so [.] _int_malloc
0.17% llc libstdc++.so.6.0.28 [.] std::_Rb_tree_insert_and_rebalance
0.13% llc llc [.] std::_Rb_tree_increment@plt
0.13% llc llc [.] llvm::rdf::DataFlowGraph::releaseBlock
0.13% llc llc [.] llvm::SelectionDAG::Combine
0.12% llc llc [.] std::__detail::_Map_base<unsigned int, std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack>, std::allocator<std::pair<unsigned int const, llvm::rdf::DataFlowGraph::DefStack> >, std::_
0.10% llc llc [.] llvm::sys::unicode::columnWidthUTF8
Big function compiled from Rust source with LVI mitigations time ~/llvm-project/build/bin/llc bigfunc-lvi.ll
real 0m20.643s user 0m20.612s sys 0m0.028s
Big function compiled from Rust source time ~/llvm-project/build/bin/llc bigfunc.ll
real 0m1.241s user 0m1.216s sys 0m0.024s
I'm running llc on ScalarEvolution.bc (a leftover file I had from some earlier investigation). It takes about 30 seconds to process this file, and the oprofile results are:
Using /w/test/oprofile_data/samples/ for samples directory.
CPU: Intel Skylake microarchitecture, speed 3000 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples % image name symbol name
54378 5.9208 llc unsigned int llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::runDFS<false, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::verifyParentPro
perty(llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> const&)::{lambda(llvm::MachineBasicBlock*, llvm::MachineBasicBlock*)#1}>(llvm::MachineBasicBlock*, unsigned int, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> const&)::{lambda(llvm::MachineBasicBlock*, llvm::MachineBasicBlock*)#1}, unsigned int)
47477 5.1694 llc (anonymous namespace)::MachineVerifier::visitMachineFunctionAfter()
39995 4.3548 llc unsigned int llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::runDFS<false, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false> const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}>(llvm::BasicBlock*, unsigned int, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::verifyParentProperty(llvm::DominatorTreeBase<llvm::BasicBlock, false> const&)::{lambda(llvm::BasicBlock*, llvm::BasicBlock*)#1}, unsigned int)
29406 3.2018 llc llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >, llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >::FindAndConstruct(llvm::MachineBasicBlock* const&)
27112 2.9520 libc-2.27.so _int_malloc
26622 2.8987 llc llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::ChildrenGetter<false>::Get(llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::BatchUpdateInfo*)
24294 2.6452 llc llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >, llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >::FindAndConstruct(llvm::BasicBlock* const&)
24169 2.6316 llc bool (anonymous namespace)::VRegFilter::filterAndAdd<llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> > >(llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> > const&, llvm::SmallVectorImpl<unsigned int>&)
23714 2.5821 llc checkForCyclesHelper(llvm::SDNode const*, llvm::SmallPtrSetImpl<llvm::SDNode const*>&, llvm::SmallPtrSetImpl<llvm::SDNode const*>&, llvm::SelectionDAG const*)
23583 2.5678 llc (anonymous namespace)::MachineVerifier::verify(llvm::MachineFunction&)
23307 2.5377 libc-2.27.so free
22229 2.4204 llc llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::ChildrenGetter<false>::Get(llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::BatchUpdateInfo*)
22059 2.4019 llc llvm::SmallPtrSetImpl<llvm::SDNode const*>::insert(llvm::SDNode const*)
21991 2.3944 llc llvm::Instruction::getNumSuccessors() const
19755 2.1510 llc (anonymous namespace)::MachineVerifier::visitMachineOperand(llvm::MachineOperand const*, unsigned int)
18117 1.9726 llc llvm::Instruction::getSuccessor(unsigned int) const
15364 1.6729 llc (anonymous namespace)::MachineVerifier::visitMachineBundleAfter(llvm::MachineInstr const*)
15150 1.6496 llc (anonymous namespace)::details::StructuralHash::update(llvm::Function&)
14762 1.6073 libc-2.27.so malloc
13687 1.4903 llc void llvm::SmallVectorImpl<llvm::BasicBlock*>::append<std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >, void>(std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >, std::__1::reverse_iterator<llvm::SuccIterator<llvm::Instruction, llvm::BasicBlock> >)
11663 1.2699 llc llvm::detail::DenseSetPair<unsigned int>* llvm::DenseMapBase<llvm::DenseMap<unsigned int, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseSetPair<unsigned int> >, unsigned int, llvm::detail::DenseSetEmpty, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseSetPair<unsigned int> >::InsertIntoBucketImpl<unsigned int>(unsigned int const&, unsigned int const&, llvm::detail::DenseSetPair<unsigned int>*)
11226 1.2223 llc llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >, llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::InfoRec> >::clear()
10043 1.0935 llc llvm::DenseMapBase<llvm::DenseMap<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >, llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec, llvm::DenseMapInfo<llvm::MachineBasicBlock*>, llvm::detail::DenseMapPair<llvm::MachineBasicBlock*, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::MachineBasicBlock, false> >::InfoRec> >::clear()
8552 0.9312 llc llvm::BasicBlock::getTerminator() const
7664 0.8345 llc llvm::rdf::PhysicalRegisterInfo::getAliasSet(unsigned int) const
[...]
The first RDF-related symbol is at the bottom with 0.8% of the entire execution time (this is not the entire list). Do you have a file that you could share where the RDF liveness is taking a significant time? I have some improvements that should reduce the complexity of the ordering stage, but I'm not seeing any difference with the files I tried.
For Rust, I'm not 100% certain, as I have not personally compiled any rust code with the LVI hardening.
For clang/clang++, it suffices to compile with the -mlvi-hardening
flag.
Is it enough to set "--x86-experimental-lvi-inline-asm-hardening", and "target-feature=+lvi-cfi,+lvi-load-hardening" to get the LVI pass to run?
assigned to @kparzysz-quic
Extended Description
The RDF live variables analysis probably has high asymptotic complexity than what is actually necessary. In particular:
https://github.com/llvm/llvm-project/blob/b5be1c5419e2a38eb60fc7e785567b54b6d9e0e0/llvm/lib/CodeGen/RDFLiveness.cpp#L208
This may be causing substantial compilation overhead for some users who enable passes that depend on RDF live variable analysis, e.g., https://github.com/rust-lang/rust/issues/74632.
Related discussion on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141332.html.