rui314 / mold

Mold: A Modern Linker 🦠
MIT License

sparc64: a bunch of tests are failing #1305

Open sylvestre opened 1 month ago

sylvestre commented 1 month ago

Not sure you care about this but opening in case you do

https://buildd.debian.org/status/fetch.php?pkg=mold&arch=sparc64&ver=2.32.1%2Bdfsg-2&stamp=1721122298&raw=0

The following tests FAILED:
      1 - sparc64-abs-error (Failed)
     10 - sparc64-bno-symbolic (Failed)
     29 - sparc64-copyrel-protected (Failed)
     51 - sparc64-duplicate-error-archive (Failed)
     69 - sparc64-exclude-libs (Failed)
     96 - sparc64-hello-dynamic (Failed)
    106 - sparc64-ifunc-alias (Failed)
    131 - sparc64-linker-script-relocatable (Failed)
    154 - sparc64-nmagic (Failed)
    182 - sparc64-relocatable-c++ (Failed)
    192 - sparc64-repro (Failed)
    225 - sparc64-symbol-version2 (Failed)
    246 - sparc64-tls-gd (Failed)
    257 - sparc64-tls-small-alignment (Failed)
    292 - sparc64-version-script3 (Failed)
    293 - sparc64-version-script4 (Failed)
    312 - sparc64-weak-undef4 (Failed)
    324 - sparc64-z-origin (Failed)
    328 - sparc64-z-separate-code (Failed)

Example:

19: Test command: /usr/bin/bash "-x" "/<<PKGBUILDDIR>>/test/elf/color-diagnostics.sh"
19: Working Directory: /<<PKGBUILDDIR>>/obj-sparc64-linux-gnu
19: Environment variables: 
19:  MACHINE=sparc64
19:  CPU=
19: Test timeout computed to be: 1500
18: + grep -q 'unknown command line option: -z foo'
1: ++ on_error 23
1: ++ code=1
1: ++ echo 'command failed: 23: grep -q '\''recompile with -fPIC'\'' $t/log'
1: command failed: 23: grep -q 'recompile with -fPIC' $t/log
1: ++ trap - EXIT
1: ++ exit 1
6: + cc -B. -shared -o out/test/elf/sparc64/as-needed-dso2/libfoo.so out/test/elf/sparc64/as-needed-dso2/a.o
7: + cat
7: + cc -o out/test/elf/sparc64/as-needed-weak/libbar.so -shared -fPIC -Wl,-soname,libbar.so -xc -
  4/331 Test   #1: sparc64-abs-error ...........................***Failed    0.48 sec

rui314 commented 1 month ago

Is that a native SPARC host? These tests pass on my x86-64 machine with qemu-sparc64 and the SPARC cross compiler. Unfortunately, all Debian SPARC machines in the GCC compiler farm are down right now, so I have no access to real SPARC machines at the moment. Do you think there's any way to reproduce the issue?

sylvestre commented 1 month ago

@glaubitz is it something you could help with ? :) thanks

glaubitz commented 1 month ago

These seem to be regressions in 2.32.x; 2.31.0 still builds fine:

https://buildd.debian.org/status/logs.php?pkg=mold&arch=sparc64

CC @thesamesam @jrtc27

rui314 commented 1 month ago

Can you try again with the above change?

glaubitz commented 1 month ago

I just checked out mold from git which has the above change already applied but it still fails with:

95% tests passed, 18 tests failed out of 333

Total Test time (real) = 147.01 sec

The following tests FAILED:
         13 - sparc64-bsymbolic-non-weak (Failed)
         32 - sparc64-copyrel (Failed)
         41 - sparc64-demangle-cpp (Failed)
         48 - sparc64-dso-undef (Failed)
         64 - sparc64-empty-file (Failed)
        111 - sparc64-ifunc-export (Failed)
        123 - sparc64-issue646 (Failed)
        156 - sparc64-no-allow-shlib-undefined (Failed)
        189 - sparc64-relocatable-merge-sections (Failed)
        194 - sparc64-require-defined (Failed)
        201 - sparc64-rpath (Failed)
        208 - sparc64-separate-debug-file (Failed)
        246 - sparc64-tls-gd-noplt (Failed)
        249 - sparc64-tls-ie (Failed)
        272 - sparc64-undefined2 (Failed)
        289 - sparc64-version-script2 (Failed)
        308 - sparc64-weak-export-dso (Failed)
        319 - sparc64-z-cet-report (Failed)
Errors while running CTest

matoro commented 1 month ago

I just tested this as well: I checked out tag v2.32.1 from git and it does NOT reproduce for me on Gentoo; all tests pass. Maybe a Debian-specific problem?

100% tests passed, 0 tests failed out of 331

Total Test time (real) = 129.32 sec

The following tests did not run:
          2 - sparc64-absolute-symbols (Skipped)
         25 - sparc64-compress-debug-sections (Skipped)
         26 - sparc64-compressed-debug-info (Skipped)
         34 - sparc64-dead-debug-sections (Skipped)
         73 - sparc64-execute-only (Skipped)
         81 - sparc64-gdb-index-compress-output (Skipped)
         82 - sparc64-gdb-index-dwarf2 (Skipped)
         83 - sparc64-gdb-index-dwarf3 (Skipped)
         84 - sparc64-gdb-index-dwarf4 (Skipped)
         85 - sparc64-gdb-index-dwarf5 (Skipped)
         86 - sparc64-gdb-index-dwarf64 (Skipped)
         88 - sparc64-gdb-index-split-dwarf (Skipped)
        113 - sparc64-ifunc-static-pie (Skipped)
        142 - sparc64-lto-llvm (Skipped)
        176 - sparc64-range-extension-thunk (Skipped)
        201 - sparc64-run-clang (Skipped)
        217 - sparc64-static-pie (Skipped)
        258 - sparc64-tlsdesc-dlopen (Skipped)
        259 - sparc64-tlsdesc-import (Skipped)
        260 - sparc64-tlsdesc-initial-exec (Skipped)
        261 - sparc64-tlsdesc-local-dynamic (Skipped)
        262 - sparc64-tlsdesc-static (Skipped)
        263 - sparc64-tlsdesc (Skipped)
$ git status
HEAD detached at v2.32.1
nothing to commit, working tree clean

rui314 commented 1 month ago

@sylvestre @glaubitz Could you pack the entire test output directory and share it with me? I believe it will be a few hundred megabytes, so you might not be able to upload it here.

glaubitz commented 1 month ago

Sure, here you go: https://people.debian.org/~glaubitz/mold-build-sparc64.tgz

rui314 commented 1 month ago

@glaubitz Thanks! Some executables are missing in the directories, so maybe mold crashes for some tests on your machine? If you run one of the failing test scripts by hand (you can execute it directly because it's just a bash script), does mold crash?

glaubitz commented 1 month ago

I tried running one of the shared-object files which has executable flags set:

(sid_sparc64-dchroot)glaubitz@stadler:~/mold/build/out/test/elf/sparc64/bsymbolic-non-weak$ ./b.so 
Segmentation fault
(sid_sparc64-dchroot)glaubitz@stadler:~/mold/build/out/test/elf/sparc64/bsymbolic-non-weak$

glaubitz commented 1 month ago

I will try to bisect this now.

rui314 commented 1 month ago

I mean mold might have crashed on your machine, not the shared object file.

glaubitz commented 1 month ago

There was no script to run in this folder, just three object files.

rui314 commented 1 month ago

You may want to run bsymbolic-non-weak.sh in test/elf directory in the mold source directory.

glaubitz commented 1 month ago

Odd that it invokes the normal system linker:

(sid_sparc64-dchroot)glaubitz@stadler:~/mold$ ./test/elf/bsymbolic-non-weak.sh
Testing bsymbolic-non-weak ... + cat
+ cc -c -o out/test/elf/sparc64/bsymbolic-non-weak/a.o -fPIC -xc -
+ cc -B. -shared -o out/test/elf/sparc64/bsymbolic-non-weak/b.so out/test/elf/sparc64/bsymbolic-non-weak/a.o -Wl,-Bsymbolic-non-weak
/usr/bin/ld: unrecognized option '-Bsymbolic-non-weak'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
++ on_error 18
++ code=1
++ echo 'command failed: 18: $CC -B. -shared -o $t/b.so $t/a.o -Wl,-Bsymbolic-non-weak'
command failed: 18: $CC -B. -shared -o $t/b.so $t/a.o -Wl,-Bsymbolic-non-weak
++ trap - EXIT
++ exit 1
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$

If I remember correctly, gcc, unlike clang, doesn't allow setting an absolute path to an alternative linker, does it?

rui314 commented 1 month ago

You want to run the script with the build directory as the current directory. In other words, ./ld should exist.

But I can see the problem now. It looks like your compiler does not support -Bsymbolic-non-weak. That can explain why this particular test is failing.

How about other tests?
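
The point about the current directory can be sketched as a small wrapper. This is only an illustration: `run_mold_test` is a hypothetical helper, not part of mold, and it assumes the build directory contains an executable `ld`, which the test scripts pick up through `$CC -B.`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mold): run one test script with the
# build directory as the current directory.
# Usage: run_mold_test <build-dir> <path-to-test-script>
run_mold_test() {
  local build_dir=$1 script=$2

  # The test scripts invoke "$CC -B.", so the compiler driver looks for
  # ./ld in the current directory. If it is missing, the system linker
  # is used instead and the test no longer exercises mold.
  if [ ! -x "$build_dir/ld" ]; then
    echo "error: $build_dir/ld not found; build mold first" >&2
    return 1
  fi

  # Resolve the script to an absolute path before changing directory.
  script=$(cd "$(dirname "$script")" && pwd)/$(basename "$script")

  # Run in a subshell so the caller's working directory is untouched.
  (cd "$build_dir" && bash -x "$script")
}
```

For example, `run_mold_test build test/elf/bsymbolic-non-weak.sh` from the top of the source tree would refuse to run until `build/ld` exists.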

rui314 commented 1 month ago

Ah, sorry the error message about -Bsymbolic-non-weak was printed out by GNU ld, so it is irrelevant.

glaubitz commented 1 month ago

Here are three more tests:

Testing copyrel ... + cat
+ cc -fno-PIC -o out/test/elf/sparc64/copyrel/a.o -c -xc -
+ cat
+ cc -fno-PIC -o out/test/elf/sparc64/copyrel/b.o -c -xc -
+ cat
+ cc -fPIC -o out/test/elf/sparc64/copyrel/c.o -c -xc -
+ cc -B. -shared -o out/test/elf/sparc64/copyrel/c.so out/test/elf/sparc64/copyrel/c.o
+ cc -B. -no-pie -o out/test/elf/sparc64/copyrel/exe out/test/elf/sparc64/copyrel/a.o out/test/elf/sparc64/copyrel/b.o out/test/elf/sparc64/copyrel/c.so
+ out/test/elf/sparc64/copyrel/exe
+ grep -q '42 42 1'
++ on_error 29
++ code=1
++ echo 'command failed: 29: grep -q '\''42 42 1'\'''
command failed: 29: grep -q '42 42 1'
++ trap - EXIT
++ exit 1
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$ ./test/elf/demangle-cpp.sh
Testing demangle-cpp ... + cat
+ cc -c -o out/test/elf/sparc64/demangle-cpp/a.o -xc -
+ cc -B. -o out/test/elf/sparc64/demangle-cpp/exe1 out/test/elf/sparc64/demangle-cpp/a.o
+ grep -Fq 'ns::version()' out/test/elf/sparc64/demangle-cpp/log
+ cat
+ cc -c -o out/test/elf/sparc64/demangle-cpp/b.o -xc -
/tmp/cc7tUFj3.s: Assembler messages:
/tmp/cc7tUFj3.s:18: Warning: setting incorrect section attributes for .comment
+ cc -B. -o out/test/elf/sparc64/demangle-cpp/exe2 out/test/elf/sparc64/demangle-cpp/b.o
+ grep -Fq ns::versionv out/test/elf/sparc64/demangle-cpp/log
++ on_error 19
++ code=1
++ echo 'command failed: 19: grep -Fq '\''ns::versionv'\'' $t/log'
command failed: 19: grep -Fq 'ns::versionv' $t/log
++ trap - EXIT
++ exit 1
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$ ./test/elf/dso-undef.sh
Testing dso-undef ... + cat
+ cc -fPIC -o out/test/elf/sparc64/dso-undef/a.o -c -xc -
+ cc -B. -o out/test/elf/sparc64/dso-undef/b.so -shared out/test/elf/sparc64/dso-undef/a.o
+ cat
+ cc -o out/test/elf/sparc64/dso-undef/c.o -c -xc -
+ rm -f out/test/elf/sparc64/dso-undef/d.a
+ ar rcs out/test/elf/sparc64/dso-undef/d.a out/test/elf/sparc64/dso-undef/c.o
+ cat
+ cc -o out/test/elf/sparc64/dso-undef/e.o -c -xc -
+ cc -B. -o out/test/elf/sparc64/dso-undef/exe out/test/elf/sparc64/dso-undef/b.so out/test/elf/sparc64/dso-undef/d.a out/test/elf/sparc64/dso-undef/e.o
/usr/bin/ld: out/test/elf/sparc64/dso-undef/e.o: in function `main':
<stdin>:(.text+0x18): undefined reference to `bar'
/usr/bin/ld: <stdin>:(.text+0x1c): undefined reference to `bar'
/usr/bin/ld: <stdin>:(.text+0x20): undefined reference to `bar'
collect2: error: ld returned 1 exit status
++ on_error 26
++ code=1
++ echo 'command failed: 26: $CC -B. -o $t/exe $t/b.so $t/d.a $t/e.o'
command failed: 26: $CC -B. -o $t/exe $t/b.so $t/d.a $t/e.o
++ trap - EXIT
++ exit 1
(sid_sparc64-dchroot)glaubitz@stadler:~/mold$

glaubitz commented 1 month ago

Odd, when I run the tests from the build directory individually, they pass:

(sid_sparc64-dchroot)glaubitz@stadler:~/mold/build$ ../test/elf/bsymbolic-non-weak.sh
Testing bsymbolic-non-weak ... + cat
+ cc -c -o out/test/elf/sparc64/bsymbolic-non-weak/a.o -fPIC -xc -
+ cc -B. -shared -o out/test/elf/sparc64/bsymbolic-non-weak/b.so out/test/elf/sparc64/bsymbolic-non-weak/a.o -Wl,-Bsymbolic-non-weak
+ cat
+ cc -c -o out/test/elf/sparc64/bsymbolic-non-weak/c.o -xc -
+ cc -B. -o out/test/elf/sparc64/bsymbolic-non-weak/exe out/test/elf/sparc64/bsymbolic-non-weak/c.o out/test/elf/sparc64/bsymbolic-non-weak/b.so
+ out/test/elf/sparc64/bsymbolic-non-weak/exe
+ grep -q '^3 3 3 3 4 7$'
+ on_exit
+ echo OK
OK
+ exit 0
(sid_sparc64-dchroot)glaubitz@stadler:~/mold/build$

But not with ninja -v test.

glaubitz commented 1 month ago

Bisecting led me to this commit:

02b439af483484693de1e6a851a2f8b1a95656bb is the first bad commit
commit 02b439af483484693de1e6a851a2f8b1a95656bb (HEAD)
Author: Rui Ueyama <ruiu@cs.stanford.edu>
Date:   Tue May 21 11:51:28 2024 +0900

    Upgrade bundled mimalloc to 2.1.6

    Fixes https://github.com/rui314/mold/issues/1257

Is Debian maybe unbundling mimalloc?

glaubitz commented 1 month ago

OK, building with -DMOLD_USE_MIMALLOC=OFF helps.

rui314 commented 1 month ago

If you have the bandwidth to work on this issue, you may want to bisect mimalloc between 2.1.2 and 2.1.6 to see which change broke it. The mimalloc maintainers are open to pull requests, but I'm pretty sure they are not interested in debugging and fixing a SPARC issue themselves, unfortunately.
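
A bisect along those lines might look like the sketch below. `bisect_mimalloc` is a hypothetical wrapper around `git bisect`, and the check command is whatever the person bisecting supplies; the thread does not say how the mimalloc checkout would be wired into the mold build.

```shell
#!/usr/bin/env bash
# Sketch: bisect a mimalloc checkout between a known-good and a
# known-bad ref. bisect_mimalloc is a hypothetical helper.
# Usage: bisect_mimalloc <mimalloc-repo> <good-ref> <bad-ref> <check-cmd...>
# The check command follows the "git bisect run" convention: exit 0 for
# good, 125 for "cannot test this commit", other codes 1..127 for bad.
bisect_mimalloc() {
  local repo=$1 good=$2 bad=$3
  shift 3

  (
    cd "$repo" || exit 1
    # "git bisect start <bad> <good>" marks both endpoints in one step.
    git bisect start "$bad" "$good" >/dev/null || exit 1
    # Let git drive the search, running the check at every step.
    git bisect run "$@"
    status=$?
    # Return the work tree to where it was before bisecting.
    git bisect reset >/dev/null 2>&1
    exit "$status"
  )
}
```

Assuming the upstream tags are named `v2.1.2` and `v2.1.6`, a call could be `bisect_mimalloc ~/mimalloc v2.1.2 v2.1.6 ./check.sh`, where `check.sh` is a script you write that rebuilds mold against the checked-out mimalloc and runs one of the failing tests, such as copyrel.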

glaubitz commented 1 month ago

Does mold actually use the system mimalloc library by default over its embedded one?

I’m asking because Debian’s mimalloc is at 2.1.7 already while mold ships 2.1.6.

This would also explain why @matoro hasn’t run into this issue.

rui314 commented 1 month ago

Unless -DMOLD_USE_SYSTEM_MIMALLOC=ON is specified, mold builds the bundled mimalloc and uses it, even if the system mimalloc is available.
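
The allocator choices discussed in this thread can be summarized as a sketch. The two CMake options are the ones named above; `mold_allocator_flags` itself is a hypothetical helper, and spelling out both options explicitly is an assumption made for clarity rather than something mold requires.

```shell
#!/usr/bin/env bash
# Sketch: map an allocator choice to the CMake options mentioned in
# this thread. mold_allocator_flags is a hypothetical helper.
# Usage: mold_allocator_flags bundled|system|none
mold_allocator_flags() {
  case $1 in
    # Build and link the mimalloc copy shipped in the mold source tree.
    bundled) echo "-DMOLD_USE_MIMALLOC=ON -DMOLD_USE_SYSTEM_MIMALLOC=OFF" ;;
    # Link against the distribution's mimalloc package instead.
    system)  echo "-DMOLD_USE_MIMALLOC=ON -DMOLD_USE_SYSTEM_MIMALLOC=ON" ;;
    # Do not use mimalloc at all (the workaround reported above).
    none)    echo "-DMOLD_USE_MIMALLOC=OFF" ;;
    *)       echo "usage: mold_allocator_flags bundled|system|none" >&2
             return 2 ;;
  esac
}

# Example: configure a build that avoids mimalloc entirely.
# cmake -S . -B build $(mold_allocator_flags none)
```

Building once with `bundled` and once with `system` on the same machine would show whether the failures follow the bundled 2.1.6 copy or also occur with the distribution's 2.1.7.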

matoro commented 1 month ago

On Gentoo, we do force system mimalloc for mold and our mimalloc is at 2.1.7.