rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.5k stars 12.74k forks source link

Build hang on Linux #76980

Open danieldeankon opened 4 years ago

danieldeankon commented 4 years ago

Problem I am seeing the build process hang at the last few steps when attempting to build this particular repository on Linux using either the nightly or the beta toolchain. I've seen this happen for the repository both when building locally (Linux Mint) and in Github's CI (Ubuntu). This does not happen with the stable toolchain, and I have not seen it happen with any toolchain on Windows.

I expect to see the build process finish in approximately 20 minutes like it does with the stable toolchain, but I observe a hanging build. If there is a problem with the build process, I expect to see an indication of what it is, especially if I pass a verbose flag, but I observe no further output once it gets into the hanging state---the only output I see is from steps that have already finished, which provides no insight into why it hangs.

Steps

Switch to nightly toolchain:

$ rustup default nightly
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.48.0-nightly (bbc677480 2020-09-18)

Fetch the following repository:

$ git clone https://github.com/danieldeankon/hypermine.git
$ cd hypermine/
$ git lfs pull

Attempt to build the binaries:

$ cargo build --release -v

The last thing I see before it gets into a hanging state:

   Compiling client v0.1.0 (/home/daniel/git/testinghypermine/hypermine/client)
     Running `rustc --crate-name client --edition=2018 client/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="use-repo-assets"' -C metadata=c38d85b37e3e2e40 -C extra-filename=-c38d85b37e3e2e40 --out-dir /home/daniel/git/testinghypermine/hypermine/target/release/deps -L dependency=/home/daniel/git/testinghypermine/hypermine/target/release/deps --extern anyhow=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libanyhow-43ee26a5a4ae988b.rmeta --extern ash=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libash-1d91e46c2845798f.rmeta --extern ash_window=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libash_window-e581873b9287fab4.rmeta --extern common=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libcommon-bdb19128efefc960.rmeta --extern directories=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libdirectories-4a64043fdaf679b9.rmeta --extern downcast_rs=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libdowncast_rs-be493f8657cf1f42.rmeta --extern futures_util=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libfutures_util-79782c1686738829.rmeta --extern fxhash=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libfxhash-aa9a2aa3814ab312.rmeta --extern gltf=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libgltf-ae78e76833b667c0.rmeta --extern hdrhistogram=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libhdrhistogram-1c5607d77dc61a11.rmeta --extern hecs=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libhecs-7ceed138b109be7b.rmeta --extern lahar=/home/daniel/git/testinghypermine/hypermine/target/release/deps/liblahar-c3a4855d049ab353.rmeta --extern memoffset=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmemoffset-33ada65033678cfe.rmeta --extern metrics=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmetrics-79a79d3342c8701e.rmeta --extern metrics_core=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmetrics_core-7e657b2c60f0cf49.rmeta --extern na=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libnalgebra-615d0479649eed30.rmeta --extern png=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libpng-089627b494fcd05d.rmeta --extern quinn=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libquinn-a3a9da2f9e31ef73.rmeta --extern rcgen=/home/daniel/git/testinghypermine/hypermine/target/release/deps/librcgen-be435a028b2dfce2.rmeta --extern rustls=/home/daniel/git/testinghypermine/hypermine/target/release/deps/librustls-825cbe6b34245b7a.rmeta --extern serde=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libserde-fbaa982a65688d4a.rmeta --extern server=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libserver-8fd1f97c8fb84524.rmeta --extern tokio=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtokio-aa180fba706594fa.rmeta --extern toml=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtoml-f7f6190f25a07c8e.rmeta --extern tracing=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtracing-e0bf1ed6aae74c7d.rmeta --extern vk_shader_macros=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libvk_shader_macros-d2fb7777a36a4f79.rmeta --extern webpki=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwebpki-66dacb92629212d5.rmeta --extern whoami=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwhoami-05f81a39d2c15f6f.rmeta --extern winit=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwinit-0172bcc7807380da.rmeta -L native=/home/daniel/git/testinghypermine/hypermine/target/release/build/ring-86d47ac3f01cd916/out`
    Building [=====================================================> ] 255/257: client

Notes

I was able to reproduce this behaviour in Github CI here: https://github.com/danieldeankon/hypermine/pull/2/checks?check_run_id=1138809646

CI / test (ubuntu-latest, stable) finished in 17m 39s. CI / test (ubuntu-latest, beta) has been going for two hours and counting at time of writing, and similarly for CI / test (ubuntu-latest, nightly).

Output of cargo version: cargo 1.48.0-nightly (8777a6b1e 2020-09-15)

My operating system is Linux Mint 19.2 Cinnamon.

ehuss commented 4 years ago

Transferred the issue to rust-lang/rust. This is a rustc issue, and I don't think there is a lot that Cargo can do about it. rustc is doing work, it just never finishes.

I have bisected this to #73526. cc @cuviper, are there any known issues with the LLVM 11 upgrade causing rustc to go into any kind of infinite loops? When I attached with gdb, it looked like it was deep inside a proc macro (vk-shader-macros).

cuviper commented 4 years ago

I'm not aware of any LLVM 11 issues like that.

LeSeulArtichaut commented 4 years ago

@cuviper Did you remove I-hang and T-compiler labels intentionally or was it conflicting with the fact that I just added them?

cuviper commented 4 years ago

@LeSeulArtichaut I only meant to change the regression label. I used the Android client, so I guess it batched that update, with clobbering. 🤕

LeSeulArtichaut commented 4 years ago

Adding them back now, and I’ll mark this conversation as resolved to limit the noise

LeSeulArtichaut commented 4 years ago

@rustbot ping llvm

rustbot commented 4 years ago

Hey LLVM ICE-breakers! This bug has been identified as a good "LLVM ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

apiraino commented 4 years ago

Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

LeSeulArtichaut commented 4 years ago

I couldn't reproduce this issue with my MacOS laptop (x86_64-apple-darwin). I tried compiling with stable (1.46.0), nightly-2020-09-18 and latest nightly (nightly-2020-09-23), resulting in a successful compilation in less than 15 minutes. Unless I missed something, I guess this probably means this is a Linux-only issue? @ehuss Did you also use a Linux machine for your previous bisection?

It would be great if other people could take a look and eventually find an MCVE which would be very helpful to resolve this bug. @rustbot ping cleanup

rustbot commented 4 years ago

Hey Cleanup Crew ICE-breakers! This bug has been identified as a good "Cleanup ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3

cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @JamesPatrickGill @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @smmalis37 @spastorino @Stupremee @turboladen @woshilapin @yerke

Stupremee commented 4 years ago

I can reproduce it on x86_64-unknown-linux-gnu with rustc 1.48.0-nightly (fb1dc34a8 2020-09-21).

Stupremee commented 4 years ago

I think I found a mvce. (The build is currently running for about ~10 minutes).

Cargo.toml

[dependencies]
vk-shader-macros = "0.2.6"

shader.vert

#version 450
void main()  {}

main.rs

const _VERT: &[u32] = vk_shader_macros::include_glsl!("shader.vert");

fn main() {}
cuviper commented 4 years ago

I couldn't reproduce this issue with my MacOS laptop [...] and latest nightly (nightly-2020-09-23)

I can reproduce it on x86_64-unknown-linux-gnu with rustc 1.48.0-nightly (fb1dc34a8 2020-09-21).

2020-09-21 to 2020-09-23 includes the LLVM 11-rc3 update. @Stupremee can you try with the latest nightly?

Stupremee commented 4 years ago

Same result

spastorino commented 4 years ago

This was discussed in today's compiler weekly meeting.

ehuss commented 4 years ago

I can confirm, this only happens on Linux. Windows and macOS do not seem to be affected. It doesn't seem to matter if it is --release or not. Tested with rustc 1.48.0-nightly (8b4085359 2020-09-23).

Thanks for making the smaller repro @Stupremee! I imagine it will be difficult to narrow it down more. shaderc-sys is a huge dependency.

ehuss commented 4 years ago

Attaching with a debugger, the function to_string is returning an empty string, causing the code here to go into an infinite loop because it is unable to generate a unique string.

My C++ is fairly rusty, and I don't have any guesses what could cause that apparently straightforward code to misbehave.

Stupremee commented 4 years ago

Yep, thats what I found out as well. But it's still super weird that it works on stable, which I haven't figured out right now.

pnkfelix commented 4 years ago

My current hypothesis is that LLVM 11 is performing some incorrect optimization of the SPIRV-Tools library, which would explain why it works on stable.

At least, I cannot find any reason that the code linked by @ehuss above would be allowed to have to_string return an empty string.

(Of course I still need to confirm this hypothesis.)

ehuss commented 4 years ago

Another report of shaderc-sys corruption when used as a proc-macro: https://github.com/rust-lang/rust/issues/77532#issuecomment-703302256. Similarly, a seemingly simple ostringstream construction is returning a corrupt string.

cuviper commented 4 years ago

I suspect this may be a C++ symbol clash with rustc's static libstdc++, since proc-macro libraries are loaded in the same process.

jonas-schievink commented 4 years ago

Maybe setting RTLD_DEEPBIND during dlopen might fix that, rust-analyzer had problems with proc macros until it set that flag

jonas-schievink commented 4 years ago

Yeah that fixes it

cuviper commented 4 years ago

Long term, we might be better off loading proc macros in a helper process. I think the token stream is already marshalled enough that it could be sent inter-process.

jonas-schievink commented 4 years ago

Looks like that wasn't the fix, I was using the wrong LLVM

cuviper commented 4 years ago

LD_DEBUG=bindings may help confirm whether the proc-macro is binding to symbols from our static libstdc++, and which symbols exactly. We can also compare that with a pre-LLVM11 build to see if this issue already existed, perhaps just with less problematic symbols. See man ld.so for more options -- might want to use a LD_DEBUG_OUTPUT file too.

Mark-Simulacrum commented 4 years ago

I have confirmed this regressed in 7ce71c362be9a89e7897ac066aba6e3e6f747800, at least, which I think we had not done before. That matches our prior assumption of being triggered by an LLVM upgrade.

Currently experimenting with the debug options @cuviper suggested.

Mark-Simulacrum commented 4 years ago

https://github.com/Mark-Simulacrum/rust-issue-76980 contains a minimal reproduction with just a cc crate dependency to compile the C++ file. On stable, the proc macro sees 2, on beta it sees 0. I'm still investigating the cause though.

cuviper commented 4 years ago

I tried LD_DEBUG myself on your minimal reproducer, and there are a lot of bindings from /lib64/libstdc++.so.6 to .../lib/libLLVM-1x-rust-1.47.0-nightly.so, which seems bad. I don't see any new bindings that way -- this was also happening before LLVM 11 -- but there's a sequence of internal bindings that seem to show that we're on a different track:

With e482c86b9 (just before LLVM 11 landed): debug.e482c86b9.477027.gz

binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo9_M_insertImEERSoT_' [GLIBCXX_3.4.9]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryC1ERSo' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNKSt5ctypeIcE13_M_widen_initEv' [GLIBCXX_3.4.11]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `memcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE13_M_insert_intImEES3_S3_RSt8ios_basecT_' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEE8_M_pbumpEPcS5_l' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `memcpy' [GLIBC_2.14]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryD1Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_baseD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base17_M_call_callbacksENS_5eventE' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base20_M_dispose_callbacksEv' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt3_V214error_categoryD2Ev' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt14error_categoryD2Ev' [GLIBCXX_3.4.15]

With 7ce71c362 (with LLVM 11): debug.7ce71c362.477169.gz

binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo9_M_insertImEERSoT_' [GLIBCXX_3.4.9]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryC1ERSo' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt16__throw_bad_castv' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_allocate_exception' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_throw' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_get_globals' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `__tls_get_addr' [GLIBC_2.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_init_primary_exception' [CXXABI_1.3.11]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt14get_unexpectedv' [GLIBCXX_3.4.20]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt13get_terminatev' [GLIBCXX_3.4.20]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_RaiseException' [GCC_3.0]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_Find_FDE' [GCC_3.0]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libc.so.6 [0]: normal symbol `dl_iterate_phdr' [GLIBC_2.2.5]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libc.so.6 [0]: normal symbol `strlen' [GLIBC_2.2.5]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libpthread.so.0 [0]: normal symbol `pthread_once'
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetLanguageSpecificData' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetRegionStart' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetIPInfo' [GCC_4.2.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNK10__cxxabiv117__class_type_info11__do_upcastEPKS0_PKvRNS0_15__upcast_resultE' [CXXABI_1.3]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetCFA' [GCC_3.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_SetGR' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_SetIP' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_begin_catch' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt9basic_iosIcSt11char_traitsIcEE11_M_setstateESt12_Ios_Iostate' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_end_catch' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_get_globals_fast' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_DeleteException' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt9exceptionD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_free_exception' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /home/jistone/.rustup/toolchains/COMMIT/bin/rustc [0]: normal symbol `free' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryD1Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_baseD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base17_M_call_callbacksENS_5eventE' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base20_M_dispose_callbacksEv' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt3_V214error_categoryD2Ev' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt14error_categoryD2Ev' [GLIBCXX_3.4.15]

I don't know that we should read anything into that difference -- seems like it could be happenstance. In any case, I think we're on very shaky ground loading dynamic C++ proc macros into the rustc process with our static libstdc++.


This also looks a bit related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42679

Keruspe commented 4 years ago

Thanks a lot for all these details in the debugging process, it's really enjoyable to read.

My small contribution, which confirms that things can be right if the symbols are actually the same instead of clashing: With rust 1.47.0 (preview release) compiled with the system llvm (11 rc2), itself built with the system libc++ (11 rc2) which is also used as the default system c++ stdlib, the proc macro ends up using the same libc++ as rustc itself, and the example project outputs 2 as expected.

cuviper commented 4 years ago

Jonathan Wakely pointed me to one of his Stack Overflow answers: https://stackoverflow.com/questions/46746878/is-it-safe-to-link-c17-c14-and-c11-objects/49119902#49119902

Where you have problems is if you link together objects compiled with different versions of GCC and you have used unstable features from a new C++ standard before GCC's support for that standard is complete.

But we're using GCC 5.5 and building for C++14, which he tells me should be fine in that regard.

There's another caveat about std::string vs. std::__cxx11::string, but our build of LLVM does appear to be using the new ABI there, and my Fedora 32 system is also using the new ABI when building the reproducer

Mark-Simulacrum commented 4 years ago

I note that the answer (at least as I interpret it) seems to be about linking together a bunch of code (compiled against and potentially inlining header definitions etc), but still linking to a single copy off libstdc++ -- maybe I'm misinterpreting though?

cuviper commented 4 years ago

Yeah, that SO situation is not a perfect match.

cuviper commented 4 years ago

When I step through the reproducer's ostream<< in libstdc++-10.2.1-1.fc32.x86_64, here's where it goes wrong:

src/foo.cpp

  1 #include <string>
  2 #include <sstream>
  3  
  4 extern "C" int custom_to_string(unsigned int id) {
  5         std::stringstream os;
  6         os << id;
  7         return os.str().length();
  8 }

libstdc++-v3/include/bits/ostream.tcc

 60            template<typename _CharT, typename _Traits>
 61              template<typename _ValueT>
 62                basic_ostream<_CharT, _Traits>&
 63                basic_ostream<_CharT, _Traits>::
 64                _M_insert(_ValueT __v)
 65                {
 66                  sentry __cerb(*this);
 67                  if (__cerb)
 68                    {
 69                      ios_base::iostate __err = ios_base::goodbit;
 70                      __try
 71                        {
>72                          const __num_put_type& __np = __check_facet(this->_M_num_put);
 73                          if (__np.put(*this, *this, this->fill(), __v).failed())
 74                            __err |= ios_base::badbit;
 75                        }
 76                      __catch(__cxxabiv1::__forced_unwind&)
 77                        {
 78                          this->_M_setstate(ios_base::badbit);
 79                          __throw_exception_again;
 80                        }
 81                      __catch(...)
 82                        { this->_M_setstate(ios_base::badbit); }
 83                      if (__err)
 84                        this->setstate(__err);
 85                    }
 86                  return *this;
 87                }

At line 72, this->_M_num_put is NULL. Then _check_facet calls __throw_bad_cast(), which is caught and line 78 sets badbit. Back in custom_to_string, I can indeed see that stream state:

(gdb) p os.rdstate()
$15 = std::_S_badbit

That's why the stringstream is empty.

I still don't know why we get that NULL, but there was a tell in what I posted earlier -- just the facet lines...

Before:

binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]

After:

binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]

I guess that may need stepping through the stream constructor, but that's a mess of locale code...

jethrogb commented 4 years ago

If I change all dynamic symbols in libLLVM-11-rust-1.48.0-nightly.so that are of binding STB_GNU_UNIQUE and I change them to binding GLOBAL, the problem goes away.

fn main() {
    let mut file = std::fs::File::open("libLLVM-11-rust-1.48.0-nightly.so").unwrap();
    let mut elf = elfkit::elf::Elf::from_reader(&mut file).unwrap();
    elf.load_all(&mut file).unwrap();

    let dynsym = elf.sections.iter_mut().find(|sec| sec.name == b".dynsym").unwrap();
    let syms = dynsym.content.as_symbols_mut().unwrap();
    for sym in syms {
        if sym.bind == elfkit::types::SymbolBind::STB_GNU_UNIQUE {
            sym.bind = elfkit::types::SymbolBind::GLOBAL
        }
    }

    let mut f = std::fs::File::create("test.so").unwrap();
    elf.to_writer(&mut f).unwrap();
}
jethrogb commented 4 years ago

I found this by observing (with LD_DEBUG) that symbols with version LLVM_11 are being bound to symbols with version GLIBCXX_3.4. This is not normal behavior. I'm guessing there are some special rules for STB_GNU_UNIQUE.

     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
...
jethrogb commented 4 years ago

Should we be building the LLVM shared object with some flags to only export LLVM symbols, not other unrelated symbols?

cuviper commented 4 years ago

I'm not sure, but we may need libstdc++ interop symbols between librustc_driver.so (with crate rustc_llvm's C++ wrappers) and libLLVM.so.

jethrogb commented 4 years ago

I do see some STL use in rustc_llvm, but that looks to be all inlined/linked statically based on objdump -TC librustc_driver-cc82839033d44d0f.so|grep LLVM_11

jethrogb commented 4 years ago

I tried reading the LLVM CMake files but I can't make much out of it. According to https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-shlib/CMakeLists.txt#L1-L3 by default the shared object should only contain the C functions, but that's clearly not the case for us and I don't see how we're enabling any deviation from the default.

jethrogb commented 4 years ago

Alternatively, we might compile LLVM with -fno-gnu-unique.

cuviper commented 4 years ago

FWIW, I can also reproduce this with my own local rust/llvm build configured with [llvm] static-libstdcpp = true -- so the versions of the static and dynamic libstdc++ are identical.

cuviper commented 4 years ago

Hmm, but it also requires [llvm] link-shared = true. When I set that false, the reproducer behaves properly.

tmandry commented 4 years ago

For this post I consulted with my friendly neighborhood LLVM expert, @petrhosek.

The libstdc++ symbols get exported from libLLVM.so because we don't compile with -fvisibility=hidden. If they didn't get exported the dynamic linker wouldn't override them. However, turning on this flag would likely break the rustllvm bridge, which calls into LLVM's private C++ ABI.

So it looks like our options are one of:

  1. Document that proc macros must not dynamically link the C++ standard library (or more generally, any dynamic libraries other than the proc macro runtime)
  2. Run proc macros in a separate process
  3. Link LLVM statically
  4. Build rustllvm into libLLVM.so somehow (@petrhosek said we could build a libLLVM.a and then link that into a rustllvm shared library, but this would likely require adding support for building a static library to LLVM's build system)
Mark-Simulacrum commented 4 years ago

I've opened up https://github.com/rust-lang/rust/pull/77634 to switch off ThinLTO in our x86_64-unknown-linux-gnu CI, which I believe would mean that we end up with a statically linked LLVM. It's not something I want to merge into master, since it currently would break downloading LLVM from CI (I think), but it is something I would feel comfortable backporting to stable, presuming the performance hit is not too big. (It's also performance every other platform has, which is not entirely unreasonable).

This is essentially option 3 above.

I suspect that in the long run option 4 is the right thing for us -- it seems feasible to do, at least, and we could also replace rustllvm with a commit directly to LLVM since we maintain a fork anyway which expands the C ABI for us (I think this is true)?

cuviper commented 4 years ago

we could also replace rustllvm with a commit directly to LLVM since we maintain a fork anyway which expands the C ABI for us (I think this is true)?

Sort of, but not really. I guess you can say that rustc_llvm (formerly rustllvm) is that expanded C ABI, because it wraps C++ calls in an extern "C" API for Rust FFI. But this stands alone so it can be used with external LLVM too. The only added APIs within our forked LLVM are MCSubtargetInfo::getCPUTable and getFeatureTable, which are still C++, and neither are essential. It would be leaving external LLVM users (distros) behind if we made rust-forked APIs mandatory within LLVM.

Mark-Simulacrum commented 4 years ago

Okay, looking at results from #77634, it looks like that's a very viable fix for stable (and likely 1.48 beta).

Wall times https://perf.rust-lang.org/compare.html?start=9fdaeb393a16951f6fdef087193fef576e36aba6&end=6f967be1d4ff00423ed16df7cb77f4074a9d6b80&stat=wall-time are pretty good -- definitely not bad enough that we can't ship it.

jethrogb commented 4 years ago

While the options @tmandry lists would work, they're not necessary. As I already pointed out, the C++ symbols in libLLVM.so have different symbol versions (LLVM_xxx), so shared objects that were linked against the system libstdc++ (GLIBCXX_xxx) should never link to them. There's just an oddity with some of the symbols where the dynamic loader is trying to make them globally unique in the process despite the symbol versioning that's applied. Additionally, rustc_llvm doesn't actually link to the STL symbols exported by libLLVM.so.

I don't know why this changed between LLVM 10 and LLVM 11, but here you can clearly see the difference:

$ objdump -T stable-x86_64-unknown-linux-gnu/lib/libLLVM-10-rust-1.46.0-stable.so |grep _ZNSs4nposE
000000000077cfd8  w   DO .rodata    0000000000000008  LLVM_10     _ZNSs4nposE
                  ^
                weak symbol
$ objdump -T nightly-x86_64-unknown-linux-gnu/lib/libLLVM-11-rust-1.48.0-nightly.so | grep _ZNSs4nposE
00000000008214b8 u    DO .rodata    0000000000000008  LLVM_11     _ZNSs4nposE
                 ^
                STB_GNU_UNIQUE symbol

So, again, here are the options I propose that are much less “invasive” in the sense that they don't affect linking flows or code generation, just what symbols are exported from the libraries we already build and how:

  1. Avoid generating STB_GNU_UNIQUE-bound symbols, in one of three ways: a. Find out what changed between LLVM 10 and LLVM 11 to see if there's a flag we can set to prevent this behavior. b. Compile LLVM with the -fno-gnu-unique compiler flag c. Run the tool from https://github.com/rust-lang/rust/issues/76980#issuecomment-704480880 on libLLVM.so after LLVM is built.
  2. Change what symbols are exported from libLLVM.so. We only need the LLVM C/C++ API, nothing else.
    • Presumably there are some CMake flags we can use for this
pietroalbini commented 4 years ago

Started building a pre-release with ThinLTO disabled in https://github.com/rust-lang/rust/pull/77642, so if that works I think we're covered for 1.47.0. We still need to fix it for 1.48.0 onwards, but for them we have the time to think about and test a proper solution (@jethrogb's proposals are interesting).

pietroalbini commented 4 years ago

The pre-release is out! You can download it following these instructions: https://blog.rust-lang.org/inside-rust/2020/10/07/1.47.0-prerelease-2.html