Closed aabacchus closed 1 year ago
I couldn't reproduce the issue on my machine, so I need your input files. Can you run the last link command with --repro (or -Wl,--repro)? With that option, mold collects all input object files and puts them into a tar file. Please upload the generated tar file here so that I can download it. Thanks.
Attached is the tarball for the program which segfaults for me. Would you rather have the tarball from linking musl itself? gcc_bad.repro.tar.gz
Statically linked executables don't have this problem.
I built your program from the given tarball, and the resulting executable worked without crashing in my Alpine/musl Docker container. It is likely that the executable itself isn't actually broken.
So you wrote that you built musl yourself. Are you sure your musl is fine?
in my Alpine/musl
If you provided a different libc.so, then yes it would have worked. Here is the tarball of the link step for musl: libc.so.repro.tar.gz
Yes, my musl is fine when linked with other linkers.
It seems your reproducer really fails only when it is loaded by your musl libc.so. I built musl 1.2.4 myself and tried to run your program under my musl (i.e. ran the program as /path/to/musl/builddir/libc.so gcc_bad) and it didn't crash.
The fact that your program didn't crash with other linkers doesn't immediately mean that your musl is fine; it might just happen to work for some programs (think C's undefined behavior).
How did you build your musl? What is your distro? How can I reproduce your binaries from scratch?
I also want to make sure you didn't apply your local patch to your musl.
To clarify, were you able to use my libc.so.repro.tar.gz to link a libc.so, which did not crash? That's bizarre. Maybe the compiler used for musl is also important.
It's not just this one-off; it's a large number of programs which crash or have bugs.
I have not patched musl; it is built normally (./configure; make in a fresh tarball reproduces the bug). My distribution is KISS, and we do patch mold to build only for amd64, but even after removing the patch I can still reproduce this. If you'd like some brief instructions to set up a KISS chroot, let me know.
I could reproduce the issue with the musl built from your object files, but that's not really debuggable because it's just .o files. It's not that different from libc.so in libc.so.repro.tar.gz from the debugging point of view.
If KISS Linux provides an official docker image, I can fire it up and try it myself.
We don't have an official docker image but I've created one. I think it should work if you run
docker run -it aabacchus/kiss sh
(the image is here). I'm not particularly familiar with Docker but I have tested it and can still reproduce the issue.
When you are in the image, you will have to do the following:
First, build mold and its dependencies. When it is finished, it will prompt you to press Enter to install the packages.
$ kiss b mold
Switch mold to provide /usr/bin/ld
$ kiss a mold /usr/bin/ld
Rebuild musl, now with mold as the linker
$ kiss b musl
Trigger the bug
$ cat >test.c <<EOF
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
int main(void) {
    puts(program_invocation_short_name);
    return 0;
}
EOF
$ cc test.c
$ ./a.out
Segmentation fault (core dumped)
Thanks for the info. How can I build musl with debug info?
Sure. You need to go into the repository for musl and edit its build script:
cd ~/repos/repo/core/musl/
vi build
Uncomment the :>nostrip line (which tells kiss not to strip the libraries) and uncomment the --enable-debug flag to configure. You should also delete the comment line above --enable-debug so that the flag is correctly passed to configure.
If you want to be able to step through the source while debugging, you'll need to add something like this to the top of the build file:
export CFLAGS="$CFLAGS -fdebug-prefix-map=$PWD=/usr/src/musl-1.2.4"
and then put the musl source in /usr/src/musl-1.2.4:
mkdir -p /usr/src
cd /usr/src
kiss d musl
tar xzf ~/.cache/kiss/sources/musl/musl-1.2.4.tar.gz
Finally, you can run kiss b musl.
I recently rebuilt musl and used mold to link it, and subsequently experienced segfaults and bugs in a lot of random programs.
Mimalloc pointers (see https://github.com/microsoft/mimalloc/issues/360#issuecomment-1797331206 and https://bugs.gentoo.org/917089) are somehow pointing to the wrong heap space after linking musl with mold, causing segfaults when compiling with Clang.
I can reproduce it with a Gentoo stage3 tarball: https://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-musl-llvm/
CMake Error at /usr/share/cmake/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"/usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC
Run Build Command(s):/usr/bin/ninja -v cmTC_391b2 && [1/2] /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -MD -MT CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -MF CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o.d -o CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -c /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC/testCCompiler.c
[2/2] : && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2 && :
FAILED: cmTC_391b2
: && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2 && :
mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7f1fbad089b0
clang-16: error: unable to execute command: Segmentation fault (core dumped)
clang-16: error: linker command failed due to signal (use -v to see invocation)
ninja: build stopped: subcommand failed.
I built mold in the gentoo:stage3-musl docker container, replaced /usr/bin/ld with mold, built musl with emerge musl and built clang with emerge clang. All of it worked fine. I didn't observe any failures. How exactly can I reproduce the issue?
@rui314 Should be reproducible in a stage3-musl-llvm chroot after recompiling llvm with binutils-plugin and recompiling musl with clang and ld.mold.
Download musl+llvm stage3 tarball https://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-musl-llvm/ (unpack to /mnt/gentoo, arch-chroot /mnt/gentoo)
emerge --sync
echo "sys-devel/llvm binutils-plugin" > /etc/portage/package.use/custom
emerge -1 =sys-devel/llvm-16.0.6 --exclude=llvm:17 && emerge sys-libs/mold
Replace /etc/portage/make.conf with:
COMMON_FLAGS="-O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto"
CC="clang"
CXX="clang++"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS} -stdlib=libc++"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
LDFLAGS="${COMMON_FLAGS} ${LDLIBS} -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind"
CHOST="x86_64-gentoo-linux-musl"
ACCEPT_KEYWORDS="amd64 ~amd64"
LD="ld.mold"
LC_MESSAGES=C
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS}"
MAKEOPTS="-j4"
emerge -1 =sys-libs/musl-1.2.3* sys-libs/libcxx --exclude=sys-devel/llvm
@LinuxUserGD isn't it mold segfaulting in your case, not a program linked to musl built with mold?
Yes, mold segfaults with -flto when musl is compiled with mold.
After rebuilding musl with lld, linking with mold completes without the mimalloc error.
Starting program: /usr/bin/ld.mold -pie --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib/ld-musl-x86_64.so.1 -o a.out /lib/Scrt1.o /lib/crti.o /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtbegin-x86_64.o -L/lib -L/usr/lib -plugin /usr/lib/llvm/16/bin/../lib/LLVMgold.so -plugin-opt=mcpu=skylake -plugin-opt=O2 -z relro -z now -O3 --as-needed --strip-debug --undefined-version --icf=safe --threads=4 --compress-debug-sections=none /tmp/check_cxx11-b34c02.o -lc++ -lm /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed -lc /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtend-x86_64.o /lib/crtn.o
[Detaching after fork from child process 232385]
mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7ffff7e36c50
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fd7de7 in setjmp () from /lib/ld-musl-x86_64.so.1
@rui314 I made a docker image with the above commands run, so that it contains the buggy musl. Just
docker run -it aabacchus/test sh
cc test.c
./a.out
to reproduce.
Thank you, everyone. I successfully reproduced the issue following your instructions. It's a challenging issue to debug, but it appears to be related to a subtle bug in weak symbol handling. I will prepare a fix.
This was a bad bug, thank you again for reporting. I believe the above commit fixed the issue. Can you try again with the git head?
da3f5dd
It seems to be fixed, thank you!
The mimalloc segfault is fixed by da3f5dd as well, thanks!
mold version: 2.0.0
musl version: 1.2.4
I recently rebuilt musl and used mold to link it, and subsequently experienced segfaults and bugs in a lot of random programs. After some digging, I found that the problems were all from globals from musl (program_invocation_short_name and optind in particular). Using a different linker to link musl fixed the problems.
Interestingly, programs built with clang didn't have these problems. Consider this C program:
This program, built with GCC against musl linked with mold, segfaults when puts tries to dereference a NULL pointer.
The difference between clang and GCC is how the global is accessed. GCC does this:
but clang does this:
I have confirmed that the use of @GOTPCREL fixes the GCC program.
The first version always gets NULL, the second gets the correct value initialised by musl. Similarly with optind, in GCC programs, optind is always 1 even after calling getopt, but clang programs can read the updated value. Now, this is quickly approaching the limits of my understanding. Please let me know if I can help with more testing.
This happened to me once before a few months ago, but since then I had forgotten how I fixed it.