These are LLVM passes. Is there any reason to believe this is a rustc bug and not an LLVM bug?
I wouldn't say so, but a Rust reproducer is the only one I have, and I couldn't find a way to open an issue on the rust-lang/llvm-project fork. Apologies if I missed it.
cc @jethrogb
cc @scottconstable @raoulstrackx
Are these crates being compiled with the MIP optimization plugin?
Probably not
Then I think the culprit is probably this issue in CodeGen/RDF that I reported back in April: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141332.html. Looks like it hasn't been fixed yet. I'll open up a bug report and ping Krzysztof.
Bug report created: https://bugs.llvm.org/show_bug.cgi?id=46808. Hopefully this will address the issue reported by @jberci
I've reduced the example from the initial post to one file without external dependencies: main.rs
$ time rustc main.rs
real 0m2.223s
user 0m2.115s
sys 0m0.107s
$ time rustc -C llvm-args=--x86-experimental-lvi-inline-asm-hardening -C target-feature=+lvi-cfi,+lvi-load-hardening main.rs
real 1m47.199s
user 1m47.015s
sys 0m0.158s
@jethrogb Would it be possible to run that example with Linux perf to see which function is responsible for adding all those cycles?
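(The exact perf invocation isn't recorded in this thread, but assuming a stock Linux perf, something along these lines should produce a profile like the one below:)

$ perf record rustc -C llvm-args=--x86-experimental-lvi-inline-asm-hardening -C target-feature=+lvi-cfi,+lvi-load-hardening main.rs
$ perf report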
63.13% rustc libLLVM-10-rust-1.46.0-nightly.so [.] std::_Function_handler<void (llvm::ImmutableGraph<llvm::MachineInstr*, int>::Node const*, bool), (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes((anonymous namespace)::MachineGadgetGraph&, llvm::ImmutableGraph<llvm::MachineInstr*, int>::EdgeSet&, llvm::ImmutableGraph<llvm::MachineInstr*, int>::NodeSet&) const::$_3>::_M_invoke
7.65% rustc libLLVM-10-rust-1.46.0-nightly.so [.] std::_Rb_tree<std::pair<unsigned int, llvm::LaneBitmask>, std::pair<unsigned int, llvm::LaneBitmask>, std::_Identity<std::pair<unsigned int, llvm::LaneBitmask> >, std::less<std::pair<unsigned int, llvm::LaneBitmask> >, std::allocator<std::pair<unsigned int, llvm::LaneBitmask> > >::_M_insert_unique<std::pair<unsigned int, llvm::LaneBitmask> >
6.35% rustc libLLVM-10-rust-1.46.0-nightly.so [.] llvm::rdf::RegisterAggr::makeRegRef
5.97% rustc libLLVM-10-rust-1.46.0-nightly.so [.] llvm::rdf::Liveness::computePhiInfo
2.84% rustc rustc [.] malloc
2.55% rustc rustc [.] free
1.46% rustc libLLVM-10-rust-1.46.0-nightly.so [.] std::_Rb_tree_increment
1.42% rustc libLLVM-10-rust-1.46.0-nightly.so [.] llvm::rdf::RegisterAggr::insert
1.32% rustc libc-2.23.so [.] __memset_avx2
1.19% rustc libLLVM-10-rust-1.46.0-nightly.so [.] llvm::rdf::RegisterAggr::clearIn
0.96% rustc libLLVM-10-rust-1.46.0-nightly.so [.] (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::runOnMachineFunction
0.85% rustc librustc_driver-582725d49f41b219.so [.] rustc_data_structures::obligation_forest::ObligationForest<O>::process_obligations
0.61% rustc libLLVM-10-rust-1.46.0-nightly.so [.] (anonymous namespace)::X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes
So it looks like ~63% is due to the LVI load hardening pass (specifically, the greedy heuristic for inserting LFENCEs), and another ~24% comes from RDF. I take it that Rust probably does a lot of aggressive inlining and LTO, and the function being mitigated is just absolutely huge?
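To make the scaling problem concrete, here is a hypothetical cost sketch in Rust (emphatically not the actual LLVM implementation, which works on an ImmutableGraph of MachineInstrs): if every inserted LFENCE triggers a rescan of all remaining gadget edges, the total work grows roughly quadratically with the size of the function's gadget graph, which would explain why one enormous inlined function is so painful.

// Hypothetical cost sketch only -- not the actual LLVM pass. Models greedy
// LFENCE insertion that rescans the whole gadget graph after every cut.
use std::collections::HashSet;

type Node = u32;
type Edge = (Node, Node); // (gadget source, gadget sink)

// Stand-in for the real "best fence location" heuristic: pick any edge.
fn pick_cut(alive: &HashSet<Edge>) -> Option<Edge> {
    alive.iter().next().copied()
}

fn greedy_mitigate(mut alive: HashSet<Edge>) -> usize {
    let mut fences = 0;
    while let Some((src, _)) = pick_cut(&alive) {
        fences += 1;
        // The expensive step: each fence triggers a full pass over the
        // remaining edges to eliminate the ones it already mitigates, so a
        // function with E gadget edges costs O(E) per fence -- roughly
        // O(E^2) overall for a single huge function.
        alive.retain(|&(s, _)| s != src);
    }
    fences
}

fn main() {
    // One enormous inlined function corresponds to one very large edge set.
    let edges: HashSet<Edge> = (0..10_000u32).map(|i| (i / 4, i)).collect();
    println!("inserted {} fences", greedy_mitigate(edges));
}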
> I take it that Rust probably does a lot of aggressive inlining and LTO, and the function being mitigated is just absolutely huge?
Yes to all of the above. Most of the 5000-line file is one function. Is it unreasonable to expect such a case to ever be fast with LVI mitigations?
I think it should still be possible to have fast compilation for very large functions. I'm curious about @jberci's observation that a newer uint crate seems to solve the issue. Does the updated crate also contain a 5,000 LoC function? There might be something more subtle going on.
I created a patch that should fix the overhead caused by X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes(): https://reviews.llvm.org/D84471
A complete solution will also require a patch to address the complexity issues in RDF.
@jberci would you be able to test your build with Scott's patch?
Definitely. I built a plain rustc from commit 8ad7bc3f428 (mostly to check whether I'm even doing it right) and another one with the patch from D84471 applied on top of that.

For uint 0.2.1, the plain build took around half an hour to compile my sample, as before, and the one with D84471 applied took just under 4 minutes (disabling the LVI passes brings that down to 10s). With the newer 0.4 version of uint, both rustc builds behave about the same (~45s with LVI passes, ~25s without).

I'm still working on building the original project that surfaced this with the patched rustc, but based on what I've been able to test locally so far, the effect should be the same as for the sample here.
To summarize where we stand: we have committed a patch to address the slowdown in the LVI LFENCE insertion algorithm, and Krzysztof has committed several patches on the RDF side that reduce, but may not completely eliminate, the overhead of the live variables analysis (see https://bugs.llvm.org/show_bug.cgi?id=46808). It would be good to re-measure the compile time overhead at this point. If it remains too high, we may need a more involved reworking of some of the RDF algorithms to make them more efficient.
OK, we now have a Rust nightly that includes the patches from @scottconstable and Krzysztof. @jberci, are you happy with the compile times now?
As far as I can tell, this has been resolved, so closing the issue. Please let me know if this is not the case.
There seems to be a significant compile time regression when using the LVI mitigation passes on some crates. The smallest case I could come up with is:
TOML:
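(The TOML snippet itself did not survive in this copy of the issue. Judging from the versions discussed later in the thread, it was presumably a minimal manifest along these lines, with the package name here being a placeholder:)

[package]
name = "lvi-repro"   # hypothetical placeholder name
version = "0.1.0"

[dependencies]
uint = "=0.2.1"      # the old uint version that triggers the slowdown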
This happens with the x86_64-fortanix-unknown-sgx target, but also without it with the following codegen options (.cargo/config.toml):
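(The contents of the config were likewise lost here; a plausible reconstruction from the rustc flags quoted earlier in the thread would be:)

[build]
rustflags = [
    "-C", "target-feature=+lvi-cfi,+lvi-load-hardening",
    "-C", "llvm-args=--x86-experimental-lvi-inline-asm-hardening",
]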
Expected: Without the config.toml, on a vanilla rustup-installed nightly, a cargo clean && cargo build takes a second or two.

Instead: With LVI mitigations enabled, compile time explodes to about 35 minutes on my machine.
If I disable linking dead code, it drops to 2 minutes, although that's mostly because the sample above is simple. Using shorter uints (e.g. the ones commented out above) makes it drop further still, but compile time still noticeably depends on the uint length.
Using a newer version of the uint crate (e.g. 0.4.1) also seems to solve the issue. The main difference that I could find is that the older version uses inline assembly. Not sure if that's a red herring or not.

Meta
rustc --version --verbose: