llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Clang or lld generates invalid short relocation for Google Chrome with debuginfo #43287

Closed llvmbot closed 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 43942
Resolution INVALID
Resolved on Nov 14, 2019 11:18
Version 9.0
OS FreeBSD
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@jmorse,@zygoloid,@rnk

Extended Description

Possibly related to bug 15356 or bug 21423.

When linking a debug build of recent Chrome (78.x), with recent Clang+LLD (9.0.0), ld.lld fails due to 32-bit relocations on >4GB offsets:

ld.lld: error: /usr/lib/crtn.o:(.debug_aranges+0x6): relocation R_X86_64_32 out of range: 4357891405 is not in [0, 4294967295]; consider recompiling with -fdebug-types-section to reduce size of debug sections
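To put a number on the failure: R_X86_64_32 stores a zero-extended 32-bit value, so the resolved value has to fit in [0, 2^32 - 1]. The following is a minimal sketch, not anything lld itself ships, that pulls the numbers out of a diagnostic shaped like the one above and reports how far past the limit the value landed. The function name `parse_range_error` and the regular expression are assumptions made for illustration; only the message text comes from this report.

```python
import re

# R_X86_64_32 holds an unsigned 32-bit value, so this is the largest
# address it can encode.
U32_MAX = 2**32 - 1

# Assumed shape of the numeric part of the ld.lld diagnostic quoted above.
RANGE_ERROR = re.compile(
    r"relocation (?P<type>R_\w+) out of range: "
    r"(?P<value>-?\d+) is not in \[(?P<lo>-?\d+), (?P<hi>-?\d+)\]"
)


def parse_range_error(message):
    """Return the relocation type, value, range and overshoot, or None."""
    m = RANGE_ERROR.search(message)
    if m is None:
        return None
    value, lo, hi = int(m["value"]), int(m["lo"]), int(m["hi"])
    # Distance outside the permitted range, on whichever side was missed.
    overshoot = value - hi if value > hi else lo - value
    return {"type": m["type"], "value": value, "lo": lo, "hi": hi,
            "overshoot": overshoot}


message = (
    "ld.lld: error: /usr/lib/crtn.o:(.debug_aranges+0x6): relocation "
    "R_X86_64_32 out of range: 4357891405 is not in [0, 4294967295]"
)
info = parse_range_error(message)
print(info["type"], info["overshoot"])
```

For the message in this report the overshoot is 62924110 bytes, i.e. the referenced offset sits roughly 60 MiB beyond the 4 GiB boundary.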

I'm not sure what I'm supposed to do about this as an end-user. Chrome is just a gigantic program:

$ c++ -Wl,--version-script=../../build/linux/chrome.map -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -fuse-ld=lld -Wl,--color-diagnostics -m64 -rdynamic -pie -Wl,--disable-new-dtags -L/usr/local/lib -L/usr/local/lib/nss -fstack-protector-strong -L/usr/local/lib -o "./chrome" -Wl,--start-group @"./chrome.rsp" -Wl,--end-group ... (-lfoo -lbar from here)

$ cat chrome.rsp | tr ' ' '\0' | xargs -0 du -csh ... 3.7G
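For anyone who wants the same figure without `du`, here is a rough sketch that sums the sizes of the files named in a linker response file. It assumes the response file is a whitespace-separated list of paths (with optional shell-style quoting) and that relative paths resolve from the current directory, as in the command above. `total_input_bytes` is a made-up helper name. Note that it adds up apparent file sizes, whereas `du` reports disk usage, so the two totals can differ somewhat.

```python
import os
import shlex


def total_input_bytes(rsp_path):
    """Sum the apparent sizes of all existing files listed in rsp_path."""
    with open(rsp_path) as f:
        # shlex.split handles plain whitespace separation and quoted paths.
        paths = shlex.split(f.read())
    # Entries that are not regular files (e.g. stray flags) are skipped.
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))
```

Running it from the build directory as `total_input_bytes("chrome.rsp")` would give the total in bytes; dividing by 2**30 gives GiB for comparison with the 4 GiB limit of a 32-bit relocation.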

Should Clang (or lld?) emit 64-bit relocations automatically? Do I need to ask the Chrome folks to use some large data mode flag? They already use -fPIC rather than -fpic.

   -fPIC
       If supported for the target machine, emit position-independent
       code, suitable for dynamic linking and avoiding any limit on the
       size of the global offset table.

(From the gcc manual page.) Also gcc:

   -mcmodel=small
       Generate code for the small code model: the program and its symbols
       must be linked in the lower 2 GB of the address space.  Pointers
       are 64 bits.  Programs can be statically or dynamically linked.
       This is the default code model.

   -mcmodel=medium
       Generate code for the medium model: the program is linked in the
       lower 2 GB of the address space.  Small symbols are also placed
       there.  Symbols with sizes larger than -mlarge-data-threshold are
       put into large data or BSS sections and can be located above 2GB.
       Programs can be statically or dynamically linked.

   -mcmodel=large
       Generate code for the large model.  This model makes no assumptions
       about addresses and sizes of sections.

Here's Clang's full documentation on -mcmodel:

https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-mcmodel

(Yeah, that's not helpful.)

Maybe -mcmodel is passed through to cc1 as -mcode-model= here: https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Clang.cpp#L4320

The scenario seems pretty similar to this test case: https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/x86-64-reloc-debug-overflow.s

llvmbot commented 4 years ago

> Chrome's gn build files should already pass various flags to try to mitigate this problem.

It doesn't seem to be the case.

The GN build parameters that failed were something like the following:

====================================8<====================================
GN_BOOTSTRAP_FLAGS= --no-clean --no-rebuild --skip-generate-buildfiles
GN_BOOTSTRAP_FLAGS+=--debug

./tools/gn/bootstrap/bootstrap.py ${GN_BOOTSTRAP_FLAGS}

GN_ARGS+= clang_use_chrome_plugins=false \
          enable_hangout_services_extension=true \
          enable_nacl=false \
          enable_one_click_signin=true \
          enable_remoting=false \
          fieldtrial_testing_like_official_build=true \
          is_clang=true \
          jumbo_file_merge_limit=8 \
          toolkit_views=true \
          treat_warnings_as_errors=false \
          use_allocator="none" \
          use_allocator_shim=false \
          use_aura=true \
          use_bundled_fontconfig=false \
          use_custom_libcxx=false \
          use_gnome_keyring=false \
          use_jumbo_build=true \
          use_lld=true \
          use_sysroot=false \
          use_system_freetype=true \
          use_system_harfbuzz=true \
          use_system_libjpeg=true \
          extra_cxxflags="${CXXFLAGS}" \
          extra_ldflags="${LDFLAGS}"
GN_ARGS+=use_alsa=true
GN_ARGS+=ffmpeg_branding="Chrome"
GN_ARGS+=proprietary_codecs=true
GN_ARGS+=enable_hevc_demuxing=true
GN_ARGS+=use_cups=true

BUILDTYPE=Debug
GN_ARGS+=is_debug=true
GN_ARGS+=is_component_build=false
GN_ARGS+=symbol_level=1

gen --args='${GN_ARGS}' out/${BUILDTYPE}
====================================8<====================================

Once I manually added '-Og -gsplit-dwarf -gz' to extra_cxxflags, it was able to compile successfully.

rnk commented 4 years ago

I just wanted to confirm that I think everything is working (or broken...) as expected. Chrome is a large C++ project. DWARF for C++ tends to be very large and contain a lot of duplicate data. Eventually you wind up with 4GB ELFs, which is where the default small x86 code model tends to break down. Switching to the medium code model would help, but it makes more sense to use various flags to reduce debug info size. Chrome's gn build files should already pass various flags to try to mitigate this problem.
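To see how much of a binary is DWARF, one option is to total the `.debug*` sections from a section listing. The sketch below assumes SysV-style output as printed by `size -A` (one `name size addr` row per section); the helper name `debug_section_bytes` and the sample listing are invented for illustration, and its numbers are toy values, not measurements of Chrome.

```python
def debug_section_bytes(size_a_output):
    """Sum the sizes of sections whose names start with '.debug'."""
    total = 0
    for line in size_a_output.splitlines():
        parts = line.split()
        # Section rows look like: <name> <size> <addr>; skip headers/totals.
        if len(parts) >= 2 and parts[0].startswith(".debug") and parts[1].isdigit():
            total += int(parts[1])
    return total


# Hypothetical listing in the assumed `size -A` layout (toy numbers).
sample = """\
example  :
section          size   addr
.text             100   4096
.debug_info      2500      0
.debug_str       1200      0
.debug_aranges     40      0
Total            3840
"""
print(debug_section_bytes(sample))
```

Comparing that total before and after adding flags such as -gsplit-dwarf (which moves most debug info into separate .dwo files) or -gz (which compresses debug sections) shows how much each one actually saves.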

llvmbot commented 4 years ago

Some combination of -Og (default Chrome Debug build is -O0, which will always generate ridiculous code), -gsplit-dwarf, and -gz eliminated the problem for me. Sorry for the noise.

llvmbot commented 4 years ago

> Mostly sounds like something to take up with Chrome folks

Sure, that is reasonable.

> - though I assume there's something esoteric/unique about your setup (& that not all Chrome developers are unable to link chrome with debug info under lld) that might be addressed?

Nothing super exotic. libc isn't glibc and it's not Linux, so that's always a less supported Chrome platform, but otherwise fairly vanilla.

> Probably one thing to do would be to use -gsplit-dwarf which'll keep the binary size down (it does mean the debug info isn't bundled inside the binary itself - so you can't delete your build directory (or copy the file to some other computer) & still be able to debug)

I tried some combination of dwarf flags earlier, including -gsplit-dwarf, and got a tripped assertion in LLVM. I could try -gsplit-dwarf alone and see what happens.

dwblaikie commented 4 years ago

Mostly sounds like something to take up with Chrome folks - though I assume there's something esoteric/unique about your setup (& that not all Chrome developers are unable to link chrome with debug info under lld) that might be addressed?

Probably one thing to do would be to use -gsplit-dwarf which'll keep the binary size down (it does mean the debug info isn't bundled inside the binary itself - so you can't delete your build directory (or copy the file to some other computer) & still be able to debug)