Closed zonyitoo closed 6 years ago
Is that a git bisect result for the commit id or something else, it's a little unclear from the comments here. If not then I would recommend running git bisect on the nightly build that's failing and report it back. After that me and others can figure out if reverted is possible or works cleanly. If not then a fix should be applied based on what git bisect returns.
I found this problem after rustup update
. So I don't remember which exact commit I was on before update.
It would be the last version before Nov. 1. I guess.
Actually you can run git bisect on the git repo here: https://github.com/rust-lang/rust/commits/master. As it's up to date. Just clone it and build it. Actually you don't need commit ids, that makes it run faster a few less bisections. If you run git bisect bad or good without commit for both, it runs it against the complete history rather than a range of commits.
Last few lines of SODIUM_BUILD_STATIC=yes cargo rustc --lib --release -- --emit=llvm-ir -Ztime-passes -Ztime-llvm-passes -Zprint-llvm-passes
are below. It might be due to ThinLTO being enabled by default perhaps (purely based on -write-thinlto-bitcode
? cc @alexcrichton
@xerofoify Generally running git bisect
isn't too useful in most cases due to how slow that'll be. It's probably easier to bisect using the CI artifacts, which I'll attempt to do myself soon. We try to diagnose the problem ourselves before asking reporters to do so since it's likely our bug.
time: 0.055; rss: 193MB write metadata
time: 3.273; rss: 224MB translation item collection
time: 0.017; rss: 232MB codegen unit partitioning
time: 4.273; rss: 326MB translate to LLVM IR
time: 0.000; rss: 326MB assert dep graph
time: 0.000; rss: 326MB serialize dep graph
time: 7.792; rss: 326MB translation
time: 0.772; rss: 284MB llvm function passes [shadowsocks0]
time: 90.584; rss: 6616MB llvm module passes [shadowsocks0]
Pass Arguments: -profile-summary-info -module-summary-analysis -write-thinlto-bitcode
Profile summary info
ModulePass Manager
Module Summary Analysis
Unnamed pass: implement Pass::getPassName()
ThinLTO Bitcode Writer
Pass Arguments: -domtree -loops -branch-prob -block-freq
FunctionPass Manager
Dominator Tree Construction
Natural Loop Information
Branch Probability Analysis
Block Frequency Analysis
Sorry about that, wasn't sure what the protocol for git bisect was here :).
Looks like this was caused by #45225 (cc @eddyb). I suspect that LLVM is struggling with the IR we're handing it...
@Mark-Simulacrum I don't understand the log, is LLVM just slower, or does it end up in some loop? I feel like we've seen one such bug before but I can't remember which one it was.
It's possible it's just slower, I didn't try looking at what was happening with gdb or anything. The memory usage did seem to keep growing so it presumably as doing something....
It will hang forever, but still consums CPU.
ps shows a rustc process:
zonyitoo 14287 98.3 0.0 25306648 1012 s001 R+ 10:36下午 17:20.50 /Users/zonyitoo/.rustup/toolchains/nightly-x86_64-apple-darwin/bin/rustc --crate-name shadowsocks
Is what the log states. We should check if the build for rust supports it this occurs under gcc as it if does then we have a issue with us and the compiler.
I am curious what is meant by this through: avoiding LLVM's "first-class aggregates"
in this commit, https://github.com/rust-lang/rust/pull/45225?
The error only happens when optimization level ≥2. Number of CGUs (ThinLTO) is irrelevant. Most time is spent on SROALegacyPass
I've enabled some debug log for the SROA pass and looks like there's difficulty working with futures::future::AndThen::poll
. SROA-ing every such call takes nearly 2 minutes, and there are at least 28 futures-related and_then
s in the shadowsocks-rust package…
[00:00:00] Compiling shadowsocks-rust v1.6.10 (file://$DIR/shadowsocks-rust)
[00:00:00] Running `rustc --crate-name shadowsocks src/lib.rs --crate-type lib --emit=dep-info,link -C opt-level=3 --cfg 'feature="default"' --cfg 'feature="libsodium-ffi"' --cfg 'feature="sodium"' -C metadata=875e43cd17c0b5a0 -C extra-filename=-875e43cd17c0b5a0 --out-dir $DIR/shadowsocks-rust/target/release/deps -L dependency=$DIR/shadowsocks-rust/target/release/deps --extern tokio_signal=$DIR/shadowsocks-rust/target/release/deps/libtokio_signal-d8d02ef07068b1ca.rlib --extern subprocess=$DIR/shadowsocks-rust/target/release/deps/libsubprocess-3c1ecbd141d33dff.rlib --extern openssl=$DIR/shadowsocks-rust/target/release/deps/libopenssl-1b60528d713e083d.rlib --extern bytes=$DIR/shadowsocks-rust/target/release/deps/libbytes-7b105104e2fe57f0.rlib --extern md_5=$DIR/shadowsocks-rust/target/release/deps/libmd_5-f4490bb4d711237a.rlib --extern lazy_static=$DIR/shadowsocks-rust/target/release/deps/liblazy_static-dd36e56c4b5b6e36.rlib --extern tokio_core=$DIR/shadowsocks-rust/target/release/deps/libtokio_core-cd674c1cee8f4b35.rlib --extern scoped_tls=$DIR/shadowsocks-rust/target/release/deps/libscoped_tls-2741372f6d09b8f2.rlib --extern ring=$DIR/shadowsocks-rust/target/release/deps/libring-28456fadef0887d9.rlib --extern libsodium_ffi=$DIR/shadowsocks-rust/target/release/deps/liblibsodium_ffi-3c35aa1af33659fd.rlib --extern libc=$DIR/shadowsocks-rust/target/release/deps/liblibc-187fc6c13f99925e.rlib --extern byte_string=$DIR/shadowsocks-rust/target/release/deps/libbyte_string-f2e9086bfb1da425.rlib --extern qrcode=$DIR/shadowsocks-rust/target/release/deps/libqrcode-8986450b39a075af.rlib --extern url=$DIR/shadowsocks-rust/target/release/deps/liburl-4b2c5f911212381c.rlib --extern serde_urlencoded=$DIR/shadowsocks-rust/target/release/deps/libserde_urlencoded-25effef210302280.rlib --extern serde_json=$DIR/shadowsocks-rust/target/release/deps/libserde_json-09584eb04dca168e.rlib --extern digest=$DIR/shadowsocks-rust/target/release/deps/libdigest-faeede511d72a534.rlib --extern typenum=$DIR/shadowsocks-rust/target/release/deps/libtypenum-0ea64efc715818d9.rlib --extern time=$DIR/shadowsocks-rust/target/release/deps/libtime-9a37c4925e6970b3.rlib --extern clap=$DIR/shadowsocks-rust/target/release/deps/libclap-a8b942b4941c7197.rlib --extern env_logger=$DIR/shadowsocks-rust/target/release/deps/libenv_logger-3c47b9db34b73bb5.rlib --extern byteorder=$DIR/shadowsocks-rust/target/release/deps/libbyteorder-390f9f29da931bec.rlib --extern tokio_io=$DIR/shadowsocks-rust/target/release/deps/libtokio_io-14bba7a8919ac847.rlib --extern futures=$DIR/shadowsocks-rust/target/release/deps/libfutures-5b90b645077d9497.rlib --extern log=$DIR/shadowsocks-rust/target/release/deps/liblog-820009db7805861d.rlib --extern base64=$DIR/shadowsocks-rust/target/release/deps/libbase64-e45e06e9dc086d37.rlib --extern futures_cpupool=$DIR/shadowsocks-rust/target/release/deps/libfutures_cpupool-bb8344cc22845bf3.rlib --extern rand=$DIR/shadowsocks-rust/target/release/deps/librand-0b5438783367eeba.rlib -L native=/usr/local/Cellar/libsodium/1.0.15/lib -L native=/usr/local/opt/openssl/lib -L native=$DIR/shadowsocks-rust/target/release/build/ring-b4c40f80f2d65dd1/out`
[00:00:50] SROA function: _ZN100_$LT$$RF$$u27$a$u20$mut$u20$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..iter..traits..IntoIterator$GT$9into_iter17h29e23af86a7d2c85E
...
<skipped 17832 lines>
...
[00:01:10] SROA function: _ZN65_$LT$futures..future..chain..Chain$LT$A$C$$u20$B$C$$u20$C$GT$$GT$4poll17h05ca77dd1cc1739aE
[00:01:10] Promoting allocas with mem2reg...
[00:01:10] SROA function: _ZN105_$LT$futures..future..and_then..AndThen$LT$A$C$$u20$B$C$$u20$F$GT$$u20$as$u20$futures..future..Future$GT$4poll17hca25b69dd59146a1E
[00:02:30] Promoting allocas with mem2reg...
[00:03:02] SROA function: _ZN4core3mem7size_of17h9931e69b61525598E
[00:03:02] SROA function: _ZN4core3ptr19swap_nonoverlapping17h48ec9f255a445c1bE
[00:03:02] SROA function: _ZN4core3mem4swap17h58bbc82a017d5836E
...
<skipped 86 lines>
...
[00:03:03] SROA function: _ZN65_$LT$futures..future..chain..Chain$LT$A$C$$u20$B$C$$u20$C$GT$$GT$4poll17h6788e14405a2ffadE
[00:03:03] Promoting allocas with mem2reg...
[00:03:03] SROA function: _ZN105_$LT$futures..future..and_then..AndThen$LT$A$C$$u20$B$C$$u20$F$GT$$u20$as$u20$futures..future..Future$GT$4poll17h466afb4be6fc7f15E
[00:04:16] Promoting allocas with mem2reg...
[00:04:59] SROA function: _ZN4core3mem7size_of17he136b9e66cd6b59bE
[00:04:59] SROA function: _ZN4core3ptr19swap_nonoverlapping17hdeab54ab4295f4f6E
[00:04:59] SROA function: _ZN4core3mem4swap17h82cdf9db11f129e6E
...
Ok at least from my knowledge this is possibly a issue with LLVM in optimizations if it's failing like that. What version of LLVM is being used here. Maybe a newer version fixes it from upstream already?
I don't know if OSX is different, but I get this instead:
rustc: /home/eddy/Projects/rust-2/src/llvm/lib/Support/SmallVector.cpp:36:
void llvm::SmallVectorBase::grow_pod(void *, size_t, size_t):
Assertion `NewElts && "Out of memory"' failed.
LLVM assertions in nightlies are disabled, so normal users won't see this...
We should do bisections with the nightlies that have the assertions enabled (cc @rust-lang/infra). Also, if it does run out of memory, is the hang some kind of UB result of that?
Backtrace for that OOM (maybe 16GB just aren't enough?):
EDIT: htop
shows the process going to 16.3G VIRT 15.5G RES
with about 16G
free.
If I change the crate type to
EDIT: without changing the crate type, just disabling LTO and adding staticlib
then it compiles pretty quickly for me.--emit=obj
works for me.
I'm marking it O-macos for now, as just disabling LTO and adding --emit=obj
does not work for me.
Edit: Not an -alt build, just regular nightly.
So there are multiple bugs here? Have you been trying the -alt
build?
@eddyb I found one part which is highly relevant to #45225. In src/relay/tcprelay/mod.rs
there is a constant
const BUFFER_SIZE: usize = 8 * 1024; // 8K buffer
which is used in several structures, e.g.
pub struct CopyTimeout<R, W>
where
R: AsyncRead,
W: AsyncWrite,
{
r: Option<R>,
w: Option<W>,
timeout: Duration,
amt: u64,
timer: Option<Timeout>,
buf: [u8; BUFFER_SIZE], // <----
pos: usize,
cap: usize,
}
Reducing the BUFFER_SIZE
from 8 KiB allows the compilation to finish (quoted is the timing for the "llvm module passes" pass):
Size | Stable | Beta | Nightly |
---|---|---|---|
1 | 22s | 20s | 20s |
1024 | 20s | 21s | 20s |
2048 | 20s | 24s | 24s |
3072 | 20s | 30s | 30s |
4096 | 20s | 42s | 40s |
5120 | 20s | 61s | 58s |
6144 | 20s | 84s | 82s |
Note that the time spent in stable is constant, but that for beta and nightly is superlinear.
I'm still reducing the program but this structure (together with some interactions with Futures) should be one contribution of the cause that hangs LLVM and eats up all the memory.
Is it used with enums or directly in argument/return types? LLVM shouldn't be SROA-ing arrays, but I don't know the actual mechanisms involved :/.
@eddyb Both.
As return type: https://github.com/shadowsocks/shadowsocks-rust/blob/5663623f59fb14f11d995b52f7a3fc1fc25dbd35/src/relay/tcprelay/crypto_io.rs#L40
As enum variant: https://github.com/shadowsocks/shadowsocks-rust/blob/5663623f59fb14f11d995b52f7a3fc1fc25dbd35/src/relay/tcprelay/utils.rs#L167-L171
It's possible CopyTimeoutOpt
(which would be represented like an union) triggers some edge case in LLVM - we used to use different filler types for unions but now it's all bytes.
Are you able to give us the output of the assembly itself for what you stated may be causing the problem. It seems that SROAing may happen for arrays or objects. The well-known scalar replacement of aggregates transformation. This transform breaks up alloca instructions of aggregate type (structure or array) into individual alloca instructions for each member if possible. Then, if possible, it transforms the individual alloca instructions into nice clean scalar SSA form.
Half reduced:
EDIT: Disregard, I haven't compared this with stable branch. It maybe just inherent slowness irrelevant to the regression.
Same problem in Arch Linux 64bit.
rustc 1.24.0-nightly (ad3543db3 2017-12-08) LLVM 5.0.0-1
Someone should try under gcc if the build system supports it to use if the regression is caused by current versions of LLVM and not gcc.
o_O rustc doesn’t support a gcc backend yet
That makes more sense than why that wasn't already tested. Is it possible to disassemble the running code to use what LLVM is doing internally as if so then it may be possible to prove that the issue is instead with SROA optimizations.
Truly reduced (except the futures
dependency), extremely fast on stable and very slow on beta/nightly. I've also verified this on playground, so it is indeed not platform-specific.
The timing of stable is constant while that of beta is clearly exponential.
Note that the timing is sensitive on the Error
type as well. The following shows the timing for items with different Error
types, sometimes beta is much better than stable, sometimes both of them are equally bad, probably due to how the fields are rearranged between the two versions.
We can't draw much conclusion before further eliminating the futures
dependency.
Edit: Free of futures
dependency.
probably due to how the fields are rearranged between the two versions.
There's nothing that changed field order btw, if you're seeing a significant boost it's probably from getting some enums optimized in a way they couldn't before.
@kennytm Thanks a lot for doing this, I'll try reproducing and if I can, I can start experimenting with bringing what we give LLVM more in line with what was before and see what helps.
EDIT: hunch seems to have been right, #46623 restores the alignment filling and SROA speed.
~~While playing with reductions I ended up at https://play.rust-lang.org/?gist=f6991a7742eca275395ed34c7b29929f&version=stable - which... crashes(?), somehow, on stable? I really wish cargo would just print exit codes, especially when they correspond to a signal.~~
EDIT: looks like compilation timeout isn't nicely messaged in the playground. Got to single-crate: https://play.rust-lang.org/?gist=35f2ee1ea315576c80b8988538a94078&version=stable
I have a testcase that's problematic on stable as well (I believe it amounts to an enum optimization happening on stable, as it's a simpler case), how should we proceed?
AFAICT, this is a long-standing problem and the regression I caused only triggers it in more cases.
EDIT: still reducing, but I've updated with a version that is slow on stable, beta, and #46623.
EDIT2: I believe I have reached minimal reduction, with both safe enums and unsafe unions:
EDIT3: I've put the union
testcase through bugpoint and that reproduces with opt -sroa
.
cc @rust-lang/compiler Should we untag as a regression from stable to beta given that this is a longstanding issue in LLVM? Either way we should separate the LLVM bug and me triggering it.
@serge-sans-paille in LLVM IRC came up with this patch https://reviews.llvm.org/D41296, which solves at least the quadratic slowdown caused by SROA trying to generate fresh value names.
EDIT: it makes one testcase go from 128s
and 9.8GB
to 4.6s
and 248MB
.
EDIT2: the original crate finishes a full rebuild in 363.77s
(for comparison, 1.21
does it in 209.66s
).
https://reviews.llvm.org/D41296 is still unreviewed. Does someone know who to poke?
cc @alexcrichton Should we apply this patch to our LLVM on beta?
EDIT: The layout PR has been taken out of the current beta, for now, and the LLVM patch is now getting reviewed (and generalized beyond SROA).
If you don't need to debug the value names at the IR level, you can set a flag on the LLVMContext to discard them automatically, it would much improve the performance. For instance clang sets this flag by default in Release mode.
I believe this should be fixed by #46925 .
Should we trigger a beta .3
release (presumably in a manner similar to that of #46580), so that people can actually test out all these various fixes via rustup
?
triage: P-high
I'm bumping this up to P-high -- it affects the upcoming beta. @eddyb can you give us a status update of the LLVM bug?
We will want to land the LLVM patch into our fork either way (we haven’t yet upgraded to our own LLVM to 5.0 yet, IIRC and this would land against 6.0 upstream).
That being said, disabling name generation in LLVM when --emit=llvm-ir
is not requested, sounds like a great idea nevertheless.
Using LLVM context configured to not retain any names brings the compilation time back to a fairly reasonable
Finished release [optimized] target(s) in 98.65 secs
compared to
Finished release [optimized] target(s) in 178.61 secs
on stable. Not sure if compilation process is doing equivalent amount of work between stable and nightly here (I’m sure the majority of the improvement is from codegen-units), but at least the compilation does not take forever on my branch now! Will cleanup and submit a PR tomorrow.
For the record, https://reviews.llvm.org/D41296 got merged recently and should limit the memory consumption on debug build too.
@nagisa can you confirm this is fixed now that https://github.com/rust-lang/rust/pull/47220 is merged?
Yeah, I can compile current shadowsocks master just fine.
Thank you all !!!
When I tried to use nightly rust to compile my project shadowsocks-rust, it will hang forever.
Stable version works fine.
Reproduce steps:
It will hang forever, but still consumes CPU.
ps
shows arustc
process:Meta
cargo 0.24.0-nightly (6529d418d 2017-11-29)