Open koobs opened 2 years ago
Thank you for your report. Since it's not easy to set up an environment to reproduce the issue, do you mind if I ask you to help me debug this?
git bisect?
The 'Mapsize overflow' error comes from the image activator. Basically, for some binary, the total size of the segments to mmap is too large.
Find the binary that causes the problem and put it somewhere so that I can take a look at it.
mold exclusively uses mmap for all file IO, and it can handle multi-gibibyte input files and output files in a few seconds. This usage pattern may be unique.
@rui314 I'll do what I can on this end. It may be worth testing a branch that updates the third-party mimalloc to 2.0.6, to see if anything interesting comes up in CI for other platforms.
Also, I'm not sure yet that it's a crash. The error appears to be FreeBSD's ELF handling simply aborting the load via:
+ uprintf("Mapsize overflow\n");
+ error = ENOEXEC;
I'll run mold under gdb/truss to see if I can identify anything interesting, but @kostikbel should be able to provide some expert insight.
@kostikbel it's reproducible using devel/mold with devel/mimalloc (updated to 2.0.6 today) from ports.
I need a binary that causes the problem. It happens during execve(2) of the binary, due to some peculiarity in the binary format, which is rejected by the in-kernel image activator. It does not occur during runtime, because runtime simply does not happen.
Did you miss my https://github.com/rui314/mold/issues/456#issuecomment-1105895420 ? Install devel/mold from the latest ports tree. That is the binary that is triggering the error, or is that not appropriate for testing this case?
Can you provide a direct link to download the problematic mold executable, or upload it somewhere (even attach it here if it is not too big)? Some people may not have access to a FreeBSD installation. Inspecting the executable may help to identify the issue.
The "Mapsize overflow" error seems to be caused by an ELF program header p_memsz field that is too big and causes an integer overflow.
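To make the suspected failure mode concrete, here is a minimal sketch of an unsigned 64-bit wraparound check on a running total of segment sizes. It assumes the image activator sums p_memsz over loadable segments and rejects the binary when the sum wraps; the function name `total_mapsize` is made up for illustration and this is not the kernel's actual code.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def total_mapsize(memsizes):
    """Sum segment memory sizes, failing if the 64-bit total wraps.

    Assumption: this mirrors the kind of check that produces
    "Mapsize overflow"; it is a sketch, not the FreeBSD implementation.
    """
    total = 0
    for memsz in memsizes:
        new_total = (total + memsz) & MASK64
        if new_total < total:  # the unsigned addition wrapped around
            raise OverflowError("Mapsize overflow")
        total = new_total
    return total


# Ordinary sizes add up without trouble.
print(hex(total_mapsize([0x8898, 0x42F41C, 0x8A260])))
```

With a p_memsz close to 2**64, adding almost any further segment wraps the total, which would match a load being refused before the program ever runs.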
@koobs Did you link mold using mold? If so, the problem might not exist in mimalloc but in the mold executable that links the problematic mold executable.
@rui I'll test both cases (linked with mold, without the issue) and with base lld, and upload binaries here
mold 1.2 linked with mimalloc 2.0.6 linked with mold 1.2 linked with mold 1.2
readelf -p .comment /usr/local/bin/mold
String dump of section '.comment':
[ 1] FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)
[ 64] mold 1.2.0 (compatible with GNU ld)
/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: /usr/local/bin/mold
File: mold.zip
I can't reproduce with mold 1.2 linked with mimalloc 2.0.6 linked with lld
File: mold-lld.zip
This is the excerpt from the program headers dump:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x00000000004c4040 0x00000000004c4040
0x00000000000002a0 0x00000000000002a0 R 0x8
INTERP 0x00000000000002e0 0x00000000000002e0 0x0000000000000000
0x0000000000000015 0x0000000000000015 R 0x1
[Requesting program interpreter: /libexec/ld-elf.so.1]
NOTE 0x00000000000002f8 0x00000000000002f8 0x0000000000000000
0x0000000000000048 0x0000000000000048 R 0x4
LOAD 0x0000000000000000 0x00000000004c4000 0x00000000004c4000
0x000000000004d920 0xffffffffffb89920 R 0x1000
...
I cut the output. The last pasted loadable segment is the obvious culprit: its MemSiz is nonsensical. Something is broken in the linker.
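A quick arithmetic check on the LOAD entry pasted above shows why that MemSiz cannot be right. The sketch below only uses the three values from the dump; it is an observation about the numbers, not a diagnosis of which tool wrote them.

```python
MASK64 = (1 << 64) - 1

# Values copied from the LOAD entry in the program header dump.
p_vaddr = 0x00000000004C4000
p_filesz = 0x000000000004D920
p_memsz = 0xFFFFFFFFFFB89920

# End address of the segment: it does not fit in 64 bits.
end = p_vaddr + p_memsz
wraps = end > MASK64

# Read as a signed 64-bit value, the size is negative.
memsz_signed = p_memsz - (1 << 64) if p_memsz >> 63 else p_memsz

print("end wraps past 2**64:", wraps)
print("wrapped end address:", hex(end & MASK64))
print("p_memsz as signed 64-bit:", memsz_signed)
```

Note that the wrapped end address equals FileSiz, so the MemSiz is what you would get by computing FileSiz minus VirtAddr in unsigned 64-bit arithmetic. That is only a numerical coincidence worth knowing when looking for the bug.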
@koobs I cannot reproduce it, so it must be a subtle bug. Can you share not only the executable but also the object files that you used to create that executable? You can simply zip the entire mold directory.
Hi @rui314,
I maintain the FreeBSD port devel/mold, and it seems to happen after the executable is stripped, e.g. on 13.1-RELEASE (amd64) with 1.3.0 (update commit diff: wahjava/freebsd-ports@ac3ce362d2e9eae66dbfc6f60e287de0e56c52bb, which also contains a couple of patches that make it build on FreeBSD and that I'm planning to send upstream):
❯ work/stage/usr/local/bin/mold
mold: fatal: -m option is missing
❯ strip work/stage/usr/local/bin/mold
strip: moving loadable section .interp, is this intentional?
strip: moving loadable section .note.tag, is this intentional?
strip: moving loadable section .hash, is this intentional?
strip: moving loadable section .gnu.hash, is this intentional?
strip: moving loadable section .dynsym, is this intentional?
strip: moving loadable section .dynstr, is this intentional?
strip: moving loadable section .gnu.version, is this intentional?
strip: moving loadable section .gnu.version_r, is this intentional?
strip: moving loadable section .rela.dyn, is this intentional?
strip: moving loadable section .rela.plt, is this intentional?
❯ work/stage/usr/local/bin/mold
Mapsize overflow
zsh: exec format error: work/stage/usr/local/bin/mold
The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:
c++ out/compress.o out/demangle.o out/filepath.o out/glob.o out/hyperloglog.o out/main.o out/multi-glob.o out/perf.o out/strerror.o out/tar.o out/uuid.o out/elf/arch-arm32.o out/elf/arch-arm64.o out/elf/arch-i386.o out/elf/arch-riscv64.o out/elf/arch-x86-64.o out/elf/cmdline.o out/elf/dwarf.o out/elf/gc-sections.o out/elf/icf.o out/elf/input-files.o out/elf/input-sections.o out/elf/linker-script.o out/elf/lto.o out/elf/main.o out/elf/mapfile.o out/elf/output-chunks.o out/elf/passes.o out/elf/relocatable.o out/elf/subprocess.o out/macho/arch-arm64.o out/macho/arch-x86-64.o out/macho/cmdline.o out/macho/dead-strip.o out/macho/input-files.o out/macho/input-sections.o out/macho/lto.o out/macho/main.o out/macho/mapfile.o out/macho/output-chunks.o out/macho/tapi.o out/macho/yaml.o -o mold -pthread -lz -lm -ldl -lmimalloc out/tbb/libs/libtbb.a -L/usr/local/lib -lcrypto -fuse-ld=/usr/local/bin/mold -L/usr/local/lib -Wl,-rpath,/usr/local/lib -fstack-protector-strong
Please let me know if you need more information to get to the bottom of this.
Thanks!
@wahjava
The executable work/stage/usr/local/bin/mold was linked with, whereas /usr/local/bin/mold is linked with LLVM LLD 13.0.0:
Looks like the word after "linked with" is missing. Was that linked with mold?
Sorry for the lack of clarity on my part. The command line I posted is the one for the linking stage, and it contains -fuse-ld=/usr/local/bin/mold.
Do you mind if I ask you to build some other program with -fuse-ld=/usr/local/bin/mold, strip the resulting binary and run it to see if the same error occurs?
@rui314, of course not. Although, I tried a simple hello world program, and wasn't able to reproduce it with that. Anyway, which one would you like me to try?
I set up a FreeBSD 13 machine on AWS, built mold using mold on it and stripped the resulting binary. The issue indeed occurred. Here is a comparison of the unstripped and stripped binaries.
--- /tmp//sh-np.ADiUGu 2022-06-19 11:33:40.654452000 +0000
+++ /tmp//sh-np.KoLSrN 2022-06-19 11:33:40.659622000 +0000
@@ -3,116 +3,106 @@
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: NONE
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices x86-64
Version: 0x1
Entry point address: 0x209fd0
Start of program headers: 64 (bytes into file)
- Start of section headers: 10585472 (bytes into file)
+ Start of section headers: 5168200 (bytes into file)
Flags: 0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 12
Size of section headers: 64 (bytes)
- Number of section headers: 49
- Section header string table index: 37
+ Number of section headers: 39
+ Section header string table index: 36
Elf file type is EXEC (Executable file)
Entry point 0x209fd0
There are 12 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x0002a0 0x0002a0 R 0x8
- INTERP 0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000015 0x000015 R 0x1
- [Requesting program interpreter: /libexec/ld-elf.so.1]
- NOTE 0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000048 0x000048 R 0x4
- LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x008898 0x008898 R 0x1000
+ INTERP 0x0002e0 0x00000000002002e0 0x00000000002002e0 0x000035 0x000015 R 0x1
+ [Requesting program interpreter: ]
+ NOTE 0x0002f8 0x00000000002002f8 0x00000000002002f8 0x000068 0x000048 R 0x4
+ LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x0088b8 0x008898 R 0x1000
LOAD 0x009000 0x0000000000209000 0x0000000000209000 0x42f41c 0x42f41c R E 0x1000
LOAD 0x439000 0x0000000000639000 0x0000000000639000 0x08a260 0x08a260 R 0x1000
- LOAD 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x088479 RW 0x1000
- TLS 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000119 RW 0x10
+ LOAD 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x026f28 0x026f28 RW 0x1000
+ TLS 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x000008 0x000008 RW 0x10
DYNAMIC 0x4c4150 0x00000000006c4150 0x00000000006c4150 0x000270 0x000270 RW 0x8
GNU_EH_FRAME 0x4415cc 0x00000000006415cc 0x00000000006415cc 0x0019ec 0x0019ec R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
- GNU_RELRO 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x003000 R 0x40
+ GNU_RELRO 0x4c4000 0x00000000006c4000 0x00000000006c4000 0x0020c1 0x0020c1 R 0x40
As you can see, strip messed up the INTERP segment. Isn't this an issue with FreeBSD's strip command? Nothing seems to be obviously wrong with mold's output, and the output works if we do not strip it.
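The FileSiz/MemSiz mismatches visible in the diff above (for example INTERP going from 0x15/0x15 to 0x35/0x15) can be checked mechanically. Below is a rough sketch that reads the program headers of a little-endian ELF64 file with Python's struct module and flags entries whose file size exceeds their memory size, or whose address range wraps. The names `read_phdrs` and `check` are hypothetical helpers, and the checks are only the two sanity conditions discussed in this thread, not a full validator.

```python
import struct
import sys
from typing import NamedTuple

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def read_phdrs(data):
    """Return the program headers of a little-endian ELF64 image."""
    fields = EHDR.unpack_from(data, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    return [Phdr(*PHDR.unpack_from(data, e_phoff + i * e_phentsize))
            for i in range(e_phnum)]


def check(ph):
    """List the sanity problems found in one program header."""
    problems = []
    if ph.p_filesz > ph.p_memsz:
        problems.append("p_filesz exceeds p_memsz")
    if ph.p_vaddr + ph.p_memsz >= 1 << 64:
        problems.append("p_vaddr + p_memsz wraps past 2**64")
    return problems


if __name__ == "__main__":
    # Usage: python check_phdrs.py /path/to/binary [...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for i, ph in enumerate(read_phdrs(f.read())):
                for problem in check(ph):
                    print(f"{path}: phdr {i} (type {ph.p_type:#x}): {problem}")
```

Running it on the binary before and after strip should show whether the tool introduced the inconsistency, without having to diff readelf output by eye.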
I tried the test again with binutils' strip (GNU strip (GNU Binutils) 2.37) this time, and that spared the executable, so indeed something is wrong with FreeBSD's strip (strip (elftoolchain r3769)):
--- mold.pre 2022-06-19 12:29:06.778384000 +0000
+++ mold.post 2022-06-19 12:29:31.152217000 +0000
@@ -10,14 +10,14 @@
Version: 0x1
Entry point address: 0x20a000
Start of program headers: 64 (bytes into file)
- Start of section headers: 108098168 (bytes into file)
+ Start of section headers: 10577632 (bytes into file)
Flags: 0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 12
Size of section headers: 64 (bytes)
- Number of section headers: 48
- Section header string table index: 36
+ Number of section headers: 38
+ Section header string table index: 37
Elf file type is EXEC (Executable file)
Entry point 0x20a000
@@ -41,7 +41,7 @@
0x0000000000083150 0x0000000000083150 R 0x1000
LOAD 0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
0x0000000000023e48 0x000000000002acf9 RW 0x1000
- TLS 0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
+ TLS 0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
0x0000000000000000 0x0000000000000108 RW 0x10
DYNAMIC 0x00000000009b2128 0x0000000000bb2128 0x0000000000bb2128
0x0000000000000280 0x0000000000000280 RW 0x8
@@ -49,8 +49,8 @@
0x00000000000010c4 0x00000000000010c4 R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0
- GNU_RELRO 0x0000000000000000 0x0000000000bb2000 0x0000000000bb2000
- 0x0000000000000e81 0x0000000000001000 R 0x40
+ GNU_RELRO 0x00000000009b2000 0x0000000000bb2000 0x0000000000bb2000
+ 0x0000000000001000 0x0000000000001000 R 0x40
@emaste
Hello, I have reworked the mold port in this patch, and I think I have fixed this issue in the recent commit. Please help me test it to confirm there are no other problems now!
Thanks @aokblast. I wonder why you didn't cc the maintainer in the differential? I can test it later.
Sorry, I have added you now. It is the first (or second?) time I have worked on a port, so I am not familiar with the process.
Summary
After upgrading the FreeBSD devel/mimalloc port version from 2.0.5 to 2.0.6 on a recent FreeBSD 14-CURRENT build, mold fails to run, outputting the following on invocation:
This appears to be related to a recent FreeBSD base commit bf83941638 by @kostikbel at the end of last year via a review [1] that is not publicly available.
Reproduction Environment / Details
Note
1.2 with mimalloc 2.0.5
1.1.1 with mimalloc 2.0.6
Steps to Reproduce
1.2 with mimalloc 2.0.6 on FreeBSD CURRENT
mold command
References
[1] https://reviews.freebsd.org/D33359