rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Missing object files when linking with LTO #66285

Open kvark opened 4 years ago

kvark commented 4 years ago

Situation

I'm trying to add a dependency on SPIRV-Cross to Gecko, and I'm seeing linking errors (undefined symbols) only in the "OS X Cross Compiled asan" job (cross-compiled macOS with AddressSanitizer and fuzzing). Non-fuzzing (non-ASan) cross-compiled builds are fine.

This has been blocking a very large and important change to Gecko for about a week. I would appreciate any hints on how to approach this!

Problem

The objects of SPIRV-Cross are supposed to reach the linker by the following path:

  1. The C++ project is built into libspirv-cross-rust-wrapper.a by "cc-rs", which is invoked from the build.rs of the spirv_cross crate
  2. That crate produces a regular Rust library that is statically linked with this .a file
  3. It is a dependency of other crates, eventually making its way to the gkrust crate, which is the root.
  4. libgkrust.a is a static C library built by Gecko
  5. Gecko static libraries are linked into XUL dynamic library
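Steps 1 and 2 are glued together by Cargo build-script directives. The sketch below only formats the two lines that `cc::Build::compile("spirv-cross-rust-wrapper")` is assumed to print (the exact cc-rs output, including any extra C++ standard-library link line, may differ between versions). `link_directives` is a hypothetical helper, and `/tmp/out` stands in for the real `OUT_DIR`. The `static=` kind is what makes rustc bundle the archive's object files into the crate's rlib, so they must later be copied out of that rlib again when libgkrust.a is produced.

```rust
use std::path::Path;

/// Hypothetical helper: formats the Cargo build-script directives that
/// `cc::Build::compile(lib_name)` is assumed to print after building
/// `lib<lib_name>.a` into `out_dir`. No compiler is invoked here.
fn link_directives(lib_name: &str, out_dir: &Path) -> Vec<String> {
    vec![
        // `static=` makes rustc bundle the archive's members into the rlib
        // of the crate whose build.rs printed this line (step 2 above).
        format!("cargo:rustc-link-lib=static={}", lib_name),
        // Directory where rustc should look for lib<lib_name>.a.
        format!("cargo:rustc-link-search=native={}", out_dir.display()),
    ]
}

fn main() {
    // Placeholder path; a real build script would read the OUT_DIR env var.
    let out_dir = Path::new("/tmp/out");
    for line in link_directives("spirv-cross-rust-wrapper", out_dir) {
        println!("{}", line);
    }
}
```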

I can inspect libspirv-cross-rust-wrapper.a with llvm-objdump and see that everything is in place as expected: objdump-local.txt. I don't know how to properly inspect the rlib or rmeta outputs of the Rust crates. I can, however, inspect libgkrust.a and see that some of the SPIRV-Cross objects didn't make it: libgkrust-objdump.txt (e.g. it has spirv_cfg.o but no spirv_cross.o). The selection appears to be arbitrary but consistent between compilations. From that, I conclude that the metadata about statically linked objects is not propagated through steps 2-3, which is what Cargo/rustc are responsible for (!).
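An rlib is an ordinary `ar` archive, just like a `.a` file, so its member list can be compared directly with the one from libgkrust.a (for example with `ar t` or `llvm-ar t`). The std-only sketch below does the same in Rust, assuming the common GNU and BSD (macOS) member-name conventions; it skips symbol tables and does not parse the object files themselves. `ar_members` is a hypothetical name.

```rust
use std::fs;
use std::io;

/// Lists member names of a Unix `ar` archive such as a `.a` file or an rlib.
/// Handles GNU short/long names and BSD (macOS) `#1/N` names; symbol tables
/// and the GNU long-name table are not reported.
fn ar_members(data: &[u8]) -> io::Result<Vec<String>> {
    let bad = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());
    if !data.starts_with(b"!<arch>\n") {
        return Err(bad("missing ar magic"));
    }
    let mut pos = 8; // past the global header
    let mut long_names: &[u8] = &[];
    let mut names = Vec::new();
    while pos + 60 <= data.len() {
        // Fixed 60-byte member header: name at 0..16, size at 48..58, "`\n" at 58..60.
        let header = &data[pos..pos + 60];
        if &header[58..60] != b"`\n" {
            return Err(bad("bad member header"));
        }
        let raw_name = String::from_utf8_lossy(&header[..16]).trim_end().to_string();
        let size: usize = String::from_utf8_lossy(&header[48..58])
            .trim()
            .parse()
            .map_err(|_| bad("bad member size"))?;
        let body_start = pos + 60;
        let body_end = body_start + size;
        let body = data.get(body_start..body_end).ok_or_else(|| bad("truncated member"))?;
        let name = if let Some(len) = raw_name.strip_prefix("#1/") {
            // BSD: the real name is stored at the start of the member data.
            let len: usize = len.parse().map_err(|_| bad("bad BSD name length"))?;
            let raw = body.get(..len).ok_or_else(|| bad("bad BSD name"))?;
            String::from_utf8_lossy(raw).trim_end_matches('\0').to_string()
        } else if raw_name == "//" {
            // GNU long-name table: remember it, do not report it.
            long_names = body;
            String::new()
        } else if let Some(off) = raw_name.strip_prefix('/').and_then(|s| s.parse::<usize>().ok()) {
            // GNU long name: "/<offset>" into the table; entries end with '/'.
            let rest = long_names.get(off..).ok_or_else(|| bad("bad long-name offset"))?;
            let end = rest.iter().position(|&b| b == b'/').unwrap_or(rest.len());
            String::from_utf8_lossy(&rest[..end]).to_string()
        } else {
            // GNU short names end with '/'; BSD short names are space-padded.
            raw_name.trim_end_matches('/').to_string()
        };
        // "/" and "/SYM64/" are GNU symbol tables, "__.SYMDEF*" is the BSD one.
        if !name.is_empty() && !name.starts_with('/') && !name.starts_with("__.SYMDEF") {
            names.push(name);
        }
        pos = body_end + (size % 2); // member data is padded to an even length
    }
    Ok(names)
}

fn main() -> io::Result<()> {
    // Usage: pass the path of a .a or .rlib file as the first argument.
    match std::env::args().nth(1) {
        Some(path) => {
            for name in ar_members(&fs::read(path)?)? {
                println!("{}", name);
            }
        }
        None => eprintln!("usage: ar-members <archive>"),
    }
    Ok(())
}
```

Running it on the wrapper archive, on each intermediate rlib and on libgkrust.a, then diffing the lists, shows at which step a given `.o` disappears.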

I'm able to reproduce the linking errors locally on Linux by downloading the artifacts and enabling both the ASan and fuzzing options in mozconfig-cross.txt. I confirmed that removing the "--enable-fuzzing" option makes it link fine.

I understand that this is a very big project, and it's not trivial to replicate the setup. However, my attempts so far to reproduce this in a reduced test case have not been fruitful.

Further observations

It appears that Gecko has only a handful of dependencies on C/C++ code from the vendored Rust crates, and all of them are either C or Obj-C, compiled with clang (via the "cc-rs" crate). The SPIRV-Cross dependency is the only one in C++ using clang++, which makes it rather unique.

I'm also attaching the reduced verbose build log that only rebuilds spirv-cross related pieces: detailed-build.log.

cc @jrmuizel @alexcrichton

jrmuizel commented 4 years ago

cc @michaelwoerister

kvark commented 4 years ago

I tried to narrow down the issue a bit more by successively building each crate between libspirv-cross-rust-wrapper.a and libgkrust.a with a forced "--crate-type staticlib", so that I could inspect their symbols. Here is the chain I tested:

spirv-cross-rust-wrapper -> gfx-backend-metal -> wgpu-native -> wgpu-remote -> gkrust-shared -> gkrust

Interestingly, the spirv-cross objects were correct up to the last step, the command for which was other-command.txt (unmodified, since this is already a static lib target).

Technically, I could put all the toolchain binaries somewhere together with the rlib dependencies and a Makefile, so that the linking issue is (hopefully) reproduced from those alone. Would such a test case be useful for the investigation?

alexcrichton commented 4 years ago

I unfortunately don't know a lot about sanitizers on OSX or sanitizers in general. This may or may not be related to https://github.com/rust-lang/rust/issues/66140, but unfortunately I don't think I'll be much help here.

kvark commented 4 years ago

Removing "-Clto" makes it work 🎉. As a side note, the compile time (for the ".a" module specifically) is 2 seconds versus something like 5 minutes... Do we have anybody familiar with LTO linking issues?

jrmuizel commented 4 years ago

It looks like this is caused by https://github.com/rust-lang/rust/blob/abf2e00e38ad404d563f03acbcf06b08813fd086/src/librustc_codegen_llvm/back/archive.rs#L144, because the C++ files in the spirv_cross crate begin with the spirv_cross prefix, e.g. https://searchfox.org/mozilla-central/source/third_party/rust/spirv_cross/src/vendor/SPIRV-Cross/spirv_cross.cpp

This was confirmed by renaming the crate to spirv-cross-internal and successfully linking.

@alexcrichton it looks like you originally wrote this condition. Can we use a more precise test that won't accidentally discard the needed object files?
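To make the failure mode concrete, here is a simplified model of that check. It assumes, based on the linked line, that under LTO rustc skips every `.o` member of an upstream rlib whose name starts with the crate name, because it expects those to be Rust objects already merged by LTO. This is not rustc's actual code; `skipped`, `kept` and the codegen-unit file name in the test are hypothetical, and the member list is illustrative. With the crate named `spirv_cross`, `spirv_cross.o` is dropped while `spirv_cfg.o` survives, matching the libgkrust.a observation above; renaming the crate or disabling LTO keeps everything.

```rust
/// Simplified model (an assumption, not rustc's code) of the LTO filter:
/// returns true if `member` of the rlib for `crate_name` would be skipped
/// when rustc copies that rlib into a staticlib.
fn skipped(crate_name: &str, member: &str, lto: bool) -> bool {
    // Under LTO the crate's own Rust objects are replaced by the LTO output,
    // and they are recognized purely by name: "starts with the crate name and
    // ends with .o". A bundled C++ object can match this by accident.
    lto && member.starts_with(crate_name) && member.ends_with(".o")
}

/// Members that survive the filter, in archive order.
fn kept<'a>(crate_name: &str, members: &[&'a str], lto: bool) -> Vec<&'a str> {
    members
        .iter()
        .copied()
        .filter(|m| !skipped(crate_name, m, lto))
        .collect()
}

fn main() {
    // Illustrative C++ objects bundled from libspirv-cross-rust-wrapper.a.
    let cpp = ["spirv_cfg.o", "spirv_cross.o", "spirv_cross_util.o"];
    println!("{:?}", kept("spirv_cross", &cpp, true)); // prints ["spirv_cfg.o"]
    println!("{:?}", kept("spirv_cross_internal", &cpp, true)); // all three kept
    println!("{:?}", kept("spirv_cross", &cpp, false)); // all three kept
}
```

Note that the crate name seen by rustc uses underscores, so the renamed package spirv-cross-internal becomes `spirv_cross_internal`, which no longer prefixes `spirv_cross.o`.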

alexcrichton commented 4 years ago

Oh dear, that's a nasty bug!

I don't know of an easy way we can tweak that condition off-hand, but I've long thought that we should be listing object files in the metadata of a crate so we don't have to guess but rather we have precise names.
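One way to read that suggestion: if an rlib's metadata recorded the exact names of its Rust object files, the LTO filter could become a set-membership test instead of a prefix match. The sketch below is hypothetical throughout (`RustObjectList` and `members_to_copy` are not rustc types or metadata fields, and the codegen-unit name is made up) and only illustrates why an exact list cannot misclassify a C++ object that happens to share the crate's name.

```rust
use std::collections::HashSet;

/// Hypothetical record of which rlib members are Rust object files,
/// as it might be written into crate metadata at compile time.
struct RustObjectList {
    rust_objects: HashSet<String>,
}

impl RustObjectList {
    /// Members to copy into the final archive: under LTO only the recorded
    /// Rust objects are skipped; everything else (e.g. bundled C++ objects)
    /// is kept regardless of its name.
    fn members_to_copy<'a>(&self, members: &[&'a str], lto: bool) -> Vec<&'a str> {
        members
            .iter()
            .copied()
            .filter(|m| !(lto && self.rust_objects.contains(*m)))
            .collect()
    }
}

fn main() {
    let list = RustObjectList {
        // Hypothetical codegen-unit name.
        rust_objects: HashSet::from(["spirv_cross-0123abcd.cgu.0.rcgu.o".to_string()]),
    };
    let members = ["spirv_cross-0123abcd.cgu.0.rcgu.o", "spirv_cfg.o", "spirv_cross.o"];
    // prints ["spirv_cfg.o", "spirv_cross.o"]
    println!("{:?}", list.members_to_copy(&members, true));
}
```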