Assertion 'ObjFilename.second' failed when using LLVMgold LTO

benjamasu commented 1 month ago

ld.mold: .../llvm-19.1.1/work/llvm/tools/gold/gold-plugin.cpp:1069: std::vector<std::pair<SmallString<128>, bool>> runLTO(): Assertion `ObjFilename.second' failed

Probably related to #1355. The problem appears when using Clang + LTO + mold, but only intermittently: in #1355 the failure occurred when building any sources, whereas this one does not always appear.

I'll post a Dockerfile to reproduce the problem a bit later.

benjamasu commented 1 month ago

It was my mistake to place the downloading and unpacking of the hardened_malloc sources archive in bootstrap.sh. If you still have the image from #1355, you can use it with the reproduce.sh from this comment.

### Dockerfile

# name the portage image
FROM gentoo/portage:latest AS portage

# based on stage3 image
FROM gentoo/stage3:llvm

# copy the entire portage volume in
COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo

ADD bootstrap.sh / 
ADD reproduce.sh /

RUN chmod +x /reproduce.sh

RUN chmod +x /bootstrap.sh
RUN /bootstrap.sh

CMD [ "/bin/bash" ]

### bootstrap.sh

#!/bin/bash

echo '
CC=clang
CXX=clang++
LD=ld.lld
AR=llvm-ar
NM=llvm-nm
STRIP=llvm-strip
OBJCOPY=llvm-objcopy
OBJDUMP=llvm-objdump
READELF=llvm-readelf
RANLIB=llvm-ranlib

ACCEPT_KEYWORDS="~amd64"

FEATURES="${FEATURES} splitdebug parallel-fetch parallel-install -ipc-sandbox -network-sandbox -pid-sandbox"

USE="${USE} debug binutils-plugin verify-sig"
' >> /etc/portage/make.conf

echo "MAKEOPTS=\"-j$(nproc)\"" >> /etc/portage/make.conf

# echo "EMERGE_DEFAULT_OPTS=\"-j$(nproc) --load-average=$(($(nproc) + 1)).0\"" >> /etc/portage/make.conf

# Select fastest available mirrors
emerge -v mirrorselect
mirrorselect -S -s 3 -b 10 -o >> /etc/portage/make.conf

# Rebuild LLVM with USE=binutils-plugin to make LLVMgold available
emerge -v llvm clang lld llvmgold

## GCC and glibc can also be rebuilt to get debug symbols.
## GCC 13 is needed only because the command suggested in `reproduce.sh`
## links against the GCC 13 libstdc++; this can easily be changed.
# emerge -v glibc "sys-devel/gcc:13"

# emerge -v valgrind

# Build mold from git HEAD
emerge -v dev-vcs/git
# If a specific commit is needed,
# set the corresponding environment variables:
# *   EGIT_OVERRIDE_REPO_RUI314_MOLD
# *   EGIT_OVERRIDE_BRANCH_RUI314_MOLD
# *   EGIT_OVERRIDE_COMMIT_RUI314_MOLD
# *   EGIT_OVERRIDE_COMMIT_DATE_RUI314_MOLD
env ACCEPT_KEYWORDS="**" emerge -v mold

## Build mold with latest available release in gentoo repos
# emerge -v mold

### reproduce.sh

#!/bin/bash

# Required to add clang to PATH
source /etc/profile

wget -O git-2.47.0.tar.xz https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.47.0.tar.xz

tar -xaf git-2.47.0.tar.xz

cd git-2.47.0

make CC=clang CXX=clang++ LD=ld.mold CFLAGS="-flto=thin" CXXFLAGS="-flto=thin" LDFLAGS="-flto=thin -fuse-ld=mold" V=1 -j$(nproc)

rui314 commented 1 month ago

Here are the command-line options given to mold when it failed with the assertion.

--hash-style=gnu
--eh-frame-hdr
-m elf_x86_64
-pie
-dynamic-linker
/lib64/ld-linux-x86-64.so.2
-o scalar
/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/Scrt1.o
/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crti.o
/usr/lib/gcc/x86_64-pc-linux-gnu/13/crtbeginS.o
-L/usr/lib/gcc/x86_64-pc-linux-gnu/13
-L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64
-L/lib/../lib64
-L/usr/lib/../lib64
-L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/lib
-L/lib
-L/usr/lib
-plugin /llvm-build/bin/../lib/LLVMgold.so
-plugin-opt=mcpu=x86-64
-plugin-opt=thinlto
scalar.o
common-main.o
libgit.a
xdiff/lib.a
reftable/libreftable.a
libgit.a
-lz
-lpthread
-lrt
-lgcc
--as-needed
-lgcc_s
--no-as-needed
-lc
-lgcc
--as-needed
-lgcc_s
--no-as-needed
/usr/lib/gcc/x86_64-pc-linux-gnu/13/crtendS.o
/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crtn.o

As you can see, libgit.a was given twice. This trips an assertion in the LLVMgold plugin that checks that each file is passed to it only once. I believe that assertion is invalid: if a user passes the same file twice, the most straightforward thing for the linker to do is to pass it through to the LTO backend both times. It seems that the assertion can simply be removed, as follows.

diff --git a/llvm/tools/gold/gold-plugin.cpp b/llvm/tools/gold/gold-plugin.cpp
index 0377791d85b3..d01bab8bd17a 100644
--- a/llvm/tools/gold/gold-plugin.cpp
+++ b/llvm/tools/gold/gold-plugin.cpp
@@ -1066,11 +1066,10 @@ static std::vector<std::pair<SmallString<128>, bool>> runLTO() {
     // the module paths encoded in the index reflect where the backends
     // will locate the full bitcode files for compiling/importing.
     std::string Identifier =
         getThinLTOObjectFileName(F.name, OldSuffix, NewSuffix);
     auto ObjFilename = ObjectToIndexFileState.insert({Identifier, false});
-    assert(ObjFilename.second);
     if (const void *View = getSymbolsAndView(F))
       addModule(*Lto, F, View, ObjFilename.first->first());
     else if (options::thinlto_index_only) {
       ObjFilename.first->second = true;
       writeEmptyDistributedBuildOutputs(Identifier, OldPrefix, NewPrefix,
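
To illustrate why a repeated input trips the assertion, here is a minimal sketch of the `insert` behavior the removed line relied on. It is not the plugin's code: `IndexFileState` and `recordInputs` are hypothetical names, `std::map` stands in for the container behind `ObjectToIndexFileState`, and plain input names stand in for the identifiers that `getThinLTOObjectFileName` produces.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ObjectToIndexFileState from the diff above.
// std::map is an assumption made so the sketch needs only the standard library.
using IndexFileState = std::map<std::string, bool>;

// Records every input in `state` and returns the names that were already
// present, i.e. the inputs for which insert() reports .second == false.
// That is the condition `assert(ObjFilename.second)` rejected.
std::vector<std::string> recordInputs(const std::vector<std::string> &inputs,
                                      IndexFileState &state) {
  std::vector<std::string> repeated;
  for (const std::string &name : inputs) {
    // insert() leaves an existing entry untouched and returns
    // {iterator to that entry, false} when the key is already in the map.
    auto result = state.insert({name, false});
    if (!result.second)
      repeated.push_back(name);
    // Without the assertion, processing would continue here using the
    // existing entry via result.first.
  }
  return repeated;
}
```

Calling `recordInputs` with the inputs from the command line above (`scalar.o`, `common-main.o`, `libgit.a`, `xdiff/lib.a`, `reftable/libreftable.a`, `libgit.a`) reports `libgit.a` as the only repeated name, while the map keeps a single entry for it.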

Can you report it to LLVM?

benjamasu commented 1 month ago

It appears this issue was already reported some time ago, and it has gone unheeded, at least for now: https://github.com/llvm/llvm-project/issues/104243

rui314 commented 1 month ago

Thanks for checking. I left a comment on that bug.

rui314/mold issue #1356