rui314 / mold

Mold: A Modern Linker 🦠
MIT License
14.15k stars 464 forks source link

Support DT_RELR using standard -z pack-relative-relocs cmdline flag #653

Closed rawoul closed 1 year ago

rawoul commented 2 years ago

In bfd and gold, packed dynamic relocations are toggled with the -z pack-relative-relocs and -z nopack-relative-relocs cmd line flags, while mold uses -pack-dyn-relocs=none|relr. For compatibility it would be simpler if mold aligned to the other linkers, can the standard flag also be supported, or used instead of -pack-dyn-relocs ?

rui314 commented 2 years ago

If someone really has a trouble with these flags, I'll consider, but I don't think we should add a new flag for a hypothetical use. It's somewhat an evolutionary process; if a flag doesn't become popular enough, it won't be supported by many implementations, and its use will diminish. So I don't think we should try to support all flags beforehand.

rawoul commented 2 years ago

When adding -z pack-relative-relocs bfd also adds a version dependency on GLIBC_ABI_DT_RELR if libc.so is in DT_NEEDED and there's a dependency on version GLIBC_2.* (meaning the object depends on glibc). Without this the glibc dynamic loader will refuse to load a shared object using DT_RELR with the following error:

DT_RELR without GLIBC_ABI_DT_RELR dependency

That seems quite complicated, so I agree with you that waiting is probably better... For context, the same issue has been raised before in the lld project: https://github.com/llvm/llvm-project/issues/53775.

rui314 commented 2 years ago

Thank you for sharing the link! I agree with you that what GNU ld does seems quite complicated.

glandium commented 1 year ago

b365d6ba1aade2e99d636892e4a21b8d7988a01c added -z pack-relative-relocs without GLIBC_ABI_DT_RELR, which leads to the error mentioned in https://github.com/rui314/mold/issues/653#issuecomment-1221664798 ...

maxyvisser commented 1 year ago

I was experiencing the error mentioned in https://github.com/rui314/mold/issues/653#issuecomment-1221664798 in the 2.0.0 release build when building firefox 116. This issue appears to have been fixed by https://github.com/rui314/mold/commit/f467ad1add2ab6e381e0e458f026df197e63d487, but a new error pops up:

20:10.56 mold: fatal: /tmp/lto-llvm-5abe96.o: REL-type relocation table is not supported for this target
20:10.58 clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
20:10.59 make[4]: *** [/compile/makepkg/librewolf/src/librewolf-116.0-1/config/rules.mk:532: libmozsqlite3.so] Error 1
20:10.59 make[3]: *** [/compile/makepkg/librewolf/src/librewolf-116.0-1/config/recurse.mk:72: config/external/sqlite/target] Error 2

Could this possibly be related?

PS: here is a link to a report on the same bug on firefox's issue tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1847697.

rui314 commented 1 year ago

No, that's different. What target did you get this error for? If it's for x86-64 or ARM64, it is actually odd if an object file contains a REL-type relocation section instead of RELA-type one.

maxyvisser commented 1 year ago

This is on x86-64, a ryzen 5 5600G to be exact. Do you want me to create a new issue, since it is unrelated?

ambasta commented 1 year ago

Can reproduce this issue on FF-117 as well

171:11.18 mold: fatal: /var/tmp/portage/www-client/firefox-117.0/temp/lto-llvm-715b02.o: REL-type relocation table is not supported for this target
rui314 commented 1 year ago

It looks like LLVM generates REL-type relocations instead of RELA-type ones for LLVM. It's very likely that it's a bug in LLVM. Can you report it to LLVM?

ambasta commented 1 year ago

Sure, in the process of rebuilding manually (w/o ebuild) to isolate and confirm the issue, Will file one once I have confirmation

ambasta commented 11 months ago

Is this really a bug w/ LLVM?

Compiling w/ LLVM succeeds

clang version 17.0.2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
System configuration file directory: /etc/clang
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/x86_64-pc-linux-gnu-ld.bfd" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -shared -o libmozsqlite3.so /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/13 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/lib -L/lib -L/usr/lib -plugin /usr/lib/llvm/17/bin/../lib64/LLVMgold.so -plugin-opt=mcpu=znver1 -plugin-opt=O3 -plugin-opt=thinlto -plugin-opt=-function-sections=1 -plugin-opt=-data-sections=1 -z relro -z defs --gc-sections -h libmozsqlite3.so /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/config/external/sqlite/libmozsqlite3_so.list -plugin-opt=-import-instr-limit=10 -plugin-opt=-import-hot-multiplier=30 -lpthread -O1 --as-needed --compress-debug-sections=zlib -rpath=/usr/lib64/firefox --enable-new-dtags -z pack-relative-relocs -z noexecstack -z text -z relro -z nocopyreloc -Bsymbolic-functions -rpath-link /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/dist/bin -rpath-link /usr/lib --version-script libmozsqlite3.so.symbols -lm -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crtn.o

But on adding -fuse-ld=mold

clang version 17.0.2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
System configuration file directory: /etc/clang
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/ld.mold" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -shared -o libmozsqlite3.so /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/13 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/lib -L/lib -L/usr/lib -plugin /usr/lib/llvm/17/bin/../lib64/LLVMgold.so -plugin-opt=mcpu=znver1 -plugin-opt=O3 -plugin-opt=thinlto -plugin-opt=-function-sections=1 -plugin-opt=-data-sections=1 -z relro -z defs --gc-sections -h libmozsqlite3.so /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/config/external/sqlite/libmozsqlite3_so.list -plugin-opt=-import-instr-limit=10 -plugin-opt=-import-hot-multiplier=30 -lpthread -O1 --as-needed --compress-debug-sections=zlib -rpath=/usr/lib64/firefox --enable-new-dtags -z pack-relative-relocs -z noexecstack -z text -z relro -z nocopyreloc -Bsymbolic-functions -rpath-link /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/dist/bin -rpath-link /usr/lib --version-script libmozsqlite3.so.symbols -lm -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crtn.o
mold: fatal: /tmp/lto-llvm-be7713.o: REL-type relocation table is not supported for this target
rui314 commented 11 months ago

Yes, it's a bug in LLVM, please report it to LLVM. x86-64 is defined to use RELA-type relocations by its psABI, so creating a REL-type relocation type is illegal even if other linkers happen to not complain.

ambasta commented 11 months ago

@rui314 Some additional guidance please, looking at https://github.com/llvm/llvm-project/blob/631e2911ea298bc12564df8acd16bba89790426a/lld/test/ELF/relocation-none.test#L38

## Both REL and RELA are supported. .rel.llvm.call-graph-profile uses REL even
## if the prevailing format is RELA.

This is exactly where mold is failing, and seems to be documented behavior

rui314 commented 11 months ago

Even if it's documented, it still violates the spec. As quoted from x86-64 psABI p.72, "The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation entries with explicit addends." While they have the option not to conform to the standard, doing so would lead to compatibility issues for obvious reasons.

ambasta commented 11 months ago

Yes, but that would be unlikely. Even the kernel has patches to ignore modpost checks on call-graph-profile

https://lore.kernel.org/lkml/CAK7LNARVi1sfBjv5a5OoQWPEeM-6bFuwPJE+i32NC=wdum-AKw@mail.gmail.com/t/#mcd82d7d511dcb4fe8939e21127fafda08f4f732e

And here's the rationale behind it

https://reviews.llvm.org/D104080

rui314 commented 11 months ago

Alright, how about we simply ignore the REL relocation tables in psABIs where RELA is mandated? It seems that by doing so, we'll resolve the issue.

ambasta commented 11 months ago

I genuinely do not understand any of this enough to comment. This issue was my first attempt at even trying to go through a compiler/linker source, so I'm not sure what the correct approach is here.

rui314 commented 11 months ago

I believe ignoring a relocation table with an incompatible type should be fine because in this case all relocations in the incompatible table is R_*_NONE type, which is a no-op relocation. mold used to silently ignore incompatible relocation tables, so it's not new.