rui314 / mold

Mold: A Modern Linker 🦠

[FreeBSD] mold output sometimes doesn't work if stripped #456

Open koobs opened 2 years ago

koobs commented 2 years ago

Summary

After upgrading the FreeBSD devel/mimalloc port version from 2.0.5 to 2.0.6 on a recent FreeBSD 14-CURRENT build, mold fails to run, outputting the following on invocation:

[koobs@140-CURRENT-amd64-564d:/usr/home/koobs/repos/freebsd/ports/devel/mimalloc] mold
Mapsize overflow
Mapsize overflow
zsh: exec format error: mold

This appears to be related to a recent FreeBSD base commit, bf83941638 by @kostikbel, made at the end of last year via a review [1] that is not publicly available.

Reproduction Environment / Details

Note

Steps to Reproduce

References

[1] https://reviews.freebsd.org/D33359

rui314 commented 2 years ago

Thank you for your report. Since it's not easy to set up an environment to reproduce the issue, do you mind if I ask you to help me debug this?

  1. Can you get a stacktrace of mold when it crashes?
  2. If 1.1.1 is OK but 1.2.0 isn't, there might be a regression introduced at some point between the two. Can you find it with git bisect?

kostikbel commented 2 years ago

The 'Mapsize overflow' error comes from the image activator. Basically, for some binary, the total size of the segments to mmap is too large.

Find the binary that causes the problem and put it somewhere so that I can take a look at it.

rui314 commented 2 years ago

mold exclusively uses mmap for all file IO, and it can handle multi-gibibyte input files and output files in a few seconds. This usage pattern may be unique.
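For readers unfamiliar with the pattern, here is a minimal sketch of mmap-based file IO in Python. It is not mold's code (mold is written in C++), and `copy_via_mmap` is a hypothetical helper name used only for illustration: both the input and the output are accessed through memory maps rather than read()/write() calls.

```python
import mmap
import os


def copy_via_mmap(src_path, dst_path):
    """Copy src to dst through memory maps; returns the number of bytes copied."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "w+b") as dst:
        if size == 0:
            return 0  # an empty file cannot be mapped; the empty output already exists
        dst.truncate(size)  # the output must have its final size before it is mapped
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as src_map:
            with mmap.mmap(dst.fileno(), size, access=mmap.ACCESS_WRITE) as dst_map:
                dst_map[:] = src_map[:]  # writes go straight into the mapped output
                dst_map.flush()
    return size
```

Note that this is ordinary file IO inside the running linker and is separate from the mapping the kernel performs when it loads an executable.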

koobs commented 2 years ago

@rui314 I'll do what I can at this end. It may be worth testing a branch that updates the third-party mimalloc to 2.0.6 to see if anything interesting comes up in CI for other platforms.

Also, I'm not sure yet that it's a crash; the error appears to be FreeBSD's ELF handling simply aborting the load via:

+               uprintf("Mapsize overflow\n");
+               error = ENOEXEC;

I'll run mold under gdb/truss to see if I can identify anything interesting, but @kostikbel should be able to provide some expert insight.

koobs commented 2 years ago

> The 'Mapsize overflow' error comes from the image activator. Basically, for some binary, the total size of the segments to mmap is too large.
>
> Find the binary that causes the problem and put it somewhere so that I can take a look at it.

@kostikbel it's reproducible using devel/mold with devel/mimalloc (updated to 2.0.6 today) from ports.

kostikbel commented 2 years ago

I need a binary that causes the problem. It happens during execve(2) of the binary, due to some peculiarity in the binary format, which is rejected by the in-kernel image activator. It does not occur at runtime, because runtime simply does not happen.

koobs commented 2 years ago

> I need a binary that causes the problem. It happens during execve(2) of the binary, due to some peculiarity in the binary format, which is rejected by the in-kernel image activator. It does not occur at runtime, because runtime simply does not happen.

Did you miss my https://github.com/rui314/mold/issues/456#issuecomment-1105895420 ? Install devel/mold from latest ports tree. That is the binary that is triggering the error, or is that not appropriate for testing this case?

X547 commented 2 years ago

> Did you miss my #456 (comment)? Install devel/mold from latest ports tree. That is the binary that is triggering the error, or is that not appropriate for testing this case?

Can you provide a direct link to download the problematic mold executable, or upload it somewhere (or even attach it here if it is not too big)? Some people may not have access to a FreeBSD installation. Inspecting the executable may help to identify the issue.

The "Mapsize overflow" error seems to be caused by an overly large p_memsz field in an ELF program header, which causes an integer overflow.
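For anyone without a FreeBSD machine, a rough sketch of how to look for such a header in a little-endian ELF64 file is shown below. It is not the kernel's actual check, which may differ; `overflowing_load_segments` is a hypothetical helper name, and the offsets follow the standard Elf64_Ehdr and Elf64_Phdr layouts.

```python
import struct

PT_LOAD = 1
PHDR_FMT = "<IIQQQQQQ"  # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align


def overflowing_load_segments(image):
    """Return (p_vaddr, p_memsz) for PT_LOAD headers whose end address exceeds 64 bits."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)          # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    bad = []
    for i in range(e_phnum):
        fields = struct.unpack_from(PHDR_FMT, image, e_phoff + i * e_phentsize)
        p_type, p_vaddr, p_memsz = fields[0], fields[3], fields[6]
        # Python integers do not wrap, so the overflow shows up as a value >= 2**64.
        if p_type == PT_LOAD and p_vaddr + p_memsz >= 1 << 64:
            bad.append((p_vaddr, p_memsz))
    return bad
```

Calling it as `overflowing_load_segments(open(path, "rb").read())` on a suspect binary lists the headers whose address range would wrap around.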

rui314 commented 2 years ago

@koobs Did you link mold using mold? If so, the problem might not exist in mimalloc but in the mold executable that links the problematic mold executable.

koobs commented 2 years ago

@rui314 I'll test both cases (linked with mold, and linked with base lld without the issue) and upload the binaries here.

koobs commented 2 years ago

mold 1.2 linked with mimalloc 2.0.6 linked with mold 1.2

readelf -p .comment /usr/local/bin/mold

String dump of section '.comment':
  [     1]  FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
  [    64]  mold 1.2.0 (compatible with GNU ld)
/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: /usr/local/bin/mold

File: mold.zip

koobs commented 2 years ago

I can't reproduce with mold 1.2 linked with mimalloc 2.0.6 linked with lld

File: mold-lld.zip

kostikbel commented 2 years ago

This is the excerpt from the program headers dump:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x00000000004c4040 0x00000000004c4040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP         0x00000000000002e0 0x00000000000002e0 0x0000000000000000
                 0x0000000000000015 0x0000000000000015  R      0x1
      [Requesting program interpreter: /libexec/ld-elf.so.1]
  NOTE           0x00000000000002f8 0x00000000000002f8 0x0000000000000000
                 0x0000000000000048 0x0000000000000048  R      0x4
  LOAD           0x0000000000000000 0x00000000004c4000 0x00000000004c4000
                 0x000000000004d920 0xffffffffffb89920  R      0x1000
...

I cut the output; the last loadable segment pasted is the obvious culprit: its MemSiz is nonsensical. Something is broken in the linker.
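To make "nonsensical" concrete, the small calculation below uses only the three values from the LOAD line above. It is a reading of the numbers, not a confirmed diagnosis: interpreted as a signed 64-bit integer the MemSiz is negative, and VirtAddr + MemSiz wraps around to exactly the FileSiz value.

```python
MASK64 = (1 << 64) - 1


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the LOAD program header in the dump above.
vaddr = 0x4C4000
filesz = 0x4D920
memsz = 0xFFFFFFFFFFB89920

signed_memsz = as_signed64(memsz)      # -0x4766e0, i.e. a negative size
end = vaddr + memsz                    # exceeds 64 bits as a Python integer
wrapped_end = end & MASK64             # what 64-bit arithmetic would produce

print(hex(signed_memsz), end > MASK64, hex(wrapped_end), wrapped_end == filesz)
```

In other words, the stored MemSiz equals FileSiz minus VirtAddr modulo 2^64, which hints that an address was subtracted where a size was expected.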

rui314 commented 2 years ago

@koobs I cannot reproduce it, so it must be a subtle bug. Can you share not only the executable but object files that you use to create that executable? You can simply zip the entire mold directory.

wahjava commented 2 years ago

Hi @rui314,

I maintain the FreeBSD port devel/mold, and the problem seems to occur after the executable is stripped, e.g. on 13.1-RELEASE (amd64) with 1.3.0 (update commit diff: wahjava/freebsd-ports@ac3ce362d2e9eae66dbfc6f60e287de0e56c52bb also contains a couple of patches that make it build on FreeBSD, which I'm planning to send upstream):

❯ work/stage/usr/local/bin/mold
mold: fatal: -m option is missing
❯ strip work/stage/usr/local/bin/mold
strip: moving loadable section .interp, is this intentional?
strip: moving loadable section .note.tag, is this intentional?
strip: moving loadable section .hash, is this intentional?
strip: moving loadable section .gnu.hash, is this intentional?
strip: moving loadable section .dynsym, is this intentional?
strip: moving loadable section .dynstr, is this intentional?
strip: moving loadable section .gnu.version, is this intentional?
strip: moving loadable section .gnu.version_r, is this intentional?
strip: moving loadable section .rela.dyn, is this intentional?
strip: moving loadable section .rela.plt, is this intentional?
❯ work/stage/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: work/stage/usr/local/bin/mold

The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:

c++ out/compress.o out/demangle.o out/filepath.o out/glob.o out/hyperloglog.o out/main.o out/multi-glob.o out/perf.o out/strerror.o out/tar.o out/uuid.o out/elf/arch-arm32.o out/elf/arch-arm64.o out/elf/arch-i386.o out/elf/arch-riscv64.o out/elf/arch-x86-64.o out/elf/cmdline.o out/elf/dwarf.o out/elf/gc-sections.o out/elf/icf.o out/elf/input-files.o out/elf/input-sections.o out/elf/linker-script.o out/elf/lto.o out/elf/main.o out/elf/mapfile.o out/elf/output-chunks.o out/elf/passes.o out/elf/relocatable.o out/elf/subprocess.o out/macho/arch-arm64.o out/macho/arch-x86-64.o out/macho/cmdline.o out/macho/dead-strip.o out/macho/input-files.o out/macho/input-sections.o out/macho/lto.o out/macho/main.o out/macho/mapfile.o out/macho/output-chunks.o out/macho/tapi.o out/macho/yaml.o -o mold -pthread -lz -lm -ldl -lmimalloc out/tbb/libs/libtbb.a -L/usr/local/lib -lcrypto -fuse-ld=/usr/local/bin/mold -L/usr/local/lib -Wl,-rpath,/usr/local/lib -fstack-protector-strong

Please let me know if you need more information to get to the bottom of this.

Thanks!

rui314 commented 2 years ago

@wahjava

> The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:

It looks like the word after "linked with" is missing. Was that linked with mold?

wahjava commented 2 years ago

Sorry for the lack of clarity on my part. The command line I posted is the one from the linking stage, and it contains -fuse-ld=/usr/local/bin/mold.

rui314 commented 2 years ago

Do you mind if I ask you to build some other program with -fuse-ld=/usr/local/bin/mold, strip the resulting binary and run it to see if the same error occurs?

wahjava commented 2 years ago

@rui314, of course not. I tried a simple hello world program, though, and wasn't able to reproduce it with that. Which program would you like me to try?

rui314 commented 2 years ago

I set up a FreeBSD 13 machine on AWS, built mold using mold on it, and stripped the resulting binary. The issue did indeed occur. Here is a comparison of the unstripped and stripped binaries.

--- /tmp//sh-np.ADiUGu  2022-06-19 11:33:40.654452000 +0000
+++ /tmp//sh-np.KoLSrN  2022-06-19 11:33:40.659622000 +0000
@@ -3,116 +3,106 @@
   Class:                             ELF64
   Data:                              2's complement, little endian
   Version:                           1 (current)
   OS/ABI:                            NONE
   ABI Version:                       0
   Type:                              EXEC (Executable file)
   Machine:                           Advanced Micro Devices x86-64
   Version:                           0x1
   Entry point address:               0x209fd0
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          10585472 (bytes into file)
+  Start of section headers:          5168200 (bytes into file)
   Flags:                             0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
   Number of program headers:         12
   Size of section headers:           64 (bytes)
-  Number of section headers:         49
-  Section header string table index: 37
+  Number of section headers:         39
+  Section header string table index: 36

 Elf file type is EXEC (Executable file)
 Entry point 0x209fd0
 There are 12 program headers, starting at offset 64

 Program Headers:
   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
   PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0002a0 0x0002a0 R   0x8
-  INTERP         0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000015 0x000015 R   0x1
-      [Requesting program interpreter: /libexec/ld-elf.so.1]
-  NOTE           0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000048 0x000048 R   0x4
-  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x008898 0x008898 R   0x1000
+  INTERP         0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000035 0x000015 R   0x1
+      [Requesting program interpreter: ]
+  NOTE           0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000068 0x000048 R   0x4
+  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x0088b8 0x008898 R   0x1000
   LOAD           0x009000 0x0000000000209000 0x0000000000209000 0x42f41c 0x42f41c R E 0x1000
   LOAD           0x439000 0x0000000000639000 0x0000000000639000 0x08a260 0x08a260 R   0x1000
-  LOAD           0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x088479 RW  0x1000
-  TLS            0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000119 RW  0x10
+  LOAD           0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x026f28 RW  0x1000
+  TLS            0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000008 RW  0x10
   DYNAMIC        0x4c4150 0x00000000006c4150 0x00000000006c4150 0x000270 0x000270 RW  0x8
   GNU_EH_FRAME   0x4415cc 0x00000000006415cc 0x00000000006415cc 0x0019ec 0x0019ec R   0x4
   GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
-  GNU_RELRO      0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x003000 R   0x40
+  GNU_RELRO      0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x0020c1 R   0x40

As you can see, strip messed up the INTERP segment. Isn't this an issue with FreeBSD's strip command? Nothing seems to be obviously wrong with mold's output, and the output works if we do not strip it.
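The damage can be checked mechanically. The sketch below hard-codes the FileSiz and MemSiz columns of the stripped binary from the diff above and flags every header whose file size exceeds its memory size, which the ELF specification does not allow for loadable segments. `oversize_segments` is a hypothetical helper name for illustration only.

```python
# (type, FileSiz, MemSiz) for the stripped binary, copied from the "+" side of the diff.
STRIPPED_SEGMENTS = [
    ("PHDR", 0x0002A0, 0x0002A0),
    ("INTERP", 0x000035, 0x000015),
    ("NOTE", 0x000068, 0x000048),
    ("LOAD", 0x0088B8, 0x008898),
    ("LOAD", 0x42F41C, 0x42F41C),
    ("LOAD", 0x08A260, 0x08A260),
    ("LOAD", 0x026F28, 0x026F28),
    ("TLS", 0x000008, 0x000008),
    ("DYNAMIC", 0x000270, 0x000270),
    ("GNU_EH_FRAME", 0x0019EC, 0x0019EC),
    ("GNU_STACK", 0x000000, 0x000000),
    ("GNU_RELRO", 0x0020C1, 0x0020C1),
]


def oversize_segments(segments):
    """Return (type, excess bytes) for headers whose FileSiz is larger than MemSiz."""
    return [(name, filesz - memsz) for name, filesz, memsz in segments if filesz > memsz]


for name, excess in oversize_segments(STRIPPED_SEGMENTS):
    print(f"{name}: FileSiz exceeds MemSiz by {excess:#x} bytes")
```

INTERP, NOTE, and the first LOAD segment are each flagged with the same 0x20-byte excess, which matches the sections strip reported moving.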

wahjava commented 2 years ago

I tried the test again with binutils' strip (GNU strip (GNU Binutils) 2.37) this time, and that spared the executable, so something is indeed wrong with FreeBSD's strip (strip (elftoolchain r3769)):

--- mold.pre    2022-06-19 12:29:06.778384000 +0000
+++ mold.post   2022-06-19 12:29:31.152217000 +0000
@@ -10,14 +10,14 @@
   Version:                           0x1
   Entry point address:               0x20a000
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          108098168 (bytes into file)
+  Start of section headers:          10577632 (bytes into file)
   Flags:                             0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
   Number of program headers:         12
   Size of section headers:           64 (bytes)
-  Number of section headers:         48
-  Section header string table index: 36
+  Number of section headers:         38
+  Section header string table index: 37

 Elf file type is EXEC (Executable file)
 Entry point 0x20a000
@@ -41,7 +41,7 @@
                  0x0000000000083150 0x0000000000083150  R      0x1000
   LOAD           0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
                  0x0000000000023e48 0x000000000002acf9  RW     0x1000
-  TLS            0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
+  TLS            0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
                  0x0000000000000000 0x0000000000000108  RW     0x10
   DYNAMIC        0x00000000009b2128 0x0000000000bb2128 0x0000000000bb2128
                  0x0000000000000280 0x0000000000000280  RW     0x8
@@ -49,8 +49,8 @@
                  0x00000000000010c4 0x00000000000010c4  R      0x4
   GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                  0x0000000000000000 0x0000000000000000  RW     0
-  GNU_RELRO      0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
-                 0x0000000000000e81 0x0000000000001000  R      0x40
+  GNU_RELRO      0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
+                 0x0000000000001000 0x0000000000001000  R      0x40

kostikbel commented 2 years ago

@emaste

aokblast commented 10 months ago

Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!

wahjava commented 10 months ago

> Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!

Thanks @aokblast. I wonder why you didn't cc the maintainer in the differential? I can test it later.

aokblast commented 10 months ago

>> Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!
>
> Thanks @aokblast. I wonder why you didn't cc the maintainer in the differential? I can test it later.

Sorry, I have added you now. It is the first (or second?) time I have worked on a port, so I am not familiar with the process.