rui314 / mold

Mold: A Modern Linker 🦠
MIT License

Linker script syntax support #1055

Open alecGraves opened 1 year ago

alecGraves commented 1 year ago

The mold docs say:

Mold is designed to be a drop-in replacement for the GNU linkers for linking user-land programs. If your user-land program cannot be built due to missing command-line options, please file a bug at https://github.com/rui314/mold/issues.

Mold supports a very limited set of linker script features, which is just sufficient to read /usr/lib/x86_64-linux-gnu/libc.so on Linux systems (on Linux, that file is contrary to its name not a shared library but an ASCII linker script that loads a real libc.so file.)

Beyond that, we have no plan to support any additional linker script features. The linker script is an ad-hoc, over-designed, complex language which we believe needs to be replaced by a simpler mechanism. We have a plan to add a replacement for the linker script to mold instead.
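
The point about libc.so being a text file rather than a shared library can be checked by looking at the first four bytes of the file, since every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. The following is only a minimal sketch: the helper name is_elf is made up for illustration, and the libc.so path is the one quoted from the docs above, which may not exist on every system.

```shell
# is_elf FILE: succeed if FILE starts with the ELF magic bytes (7f 45 4c 46).
# "is_elf" is a hypothetical helper name, not part of mold or binutils.
is_elf() {
    [ "$(od -An -tx1 -N4 "$1" | tr -d ' \n')" = "7f454c46" ]
}

# Path taken from the quoted docs; guarded because it may be absent.
f=/usr/lib/x86_64-linux-gnu/libc.so
if [ -e "$f" ]; then
    if is_elf "$f"; then
        echo "$f: ELF file"
    else
        echo "$f: not ELF (likely a text linker script)"
    fi
fi
```

If the file turns out not to be ELF, printing it with cat should show the short linker script that the docs refer to.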

However, I would like to use mold with a Hare language project. Hare has a small, custom runtime and uses a linker script (link at the time of this writing) with the tokens PHDRS, ENTRY, and SECTIONS and the section keywords KEEP and PROVIDE_HIDDEN.

It appears that none of these linker script tokens are supported by mold, resulting in the following error when I try to use mold to build my program:

$ export LDLINKFLAGS="--static -gc-sections --as-needed --strip-all"
$ export LD=mold
$ hare build -o haremain main.ha
mold: fatal: /usr/local/src/hare/stdlib/rt/hare.sc:1: PHDRS {
                                               ^ unknown linker script token
Error: mold: exited with status 1
hare build: build failed

Whereas other linkers support these linker script keywords:

$ export LD=ld.lld
$ hare build -o haremain main.ha
$ ./haremain
¡Hola Mundo!
Γειά σου Κόσμε!
Привіт, світ!
こんにちは世界!
$ export LD=gold
$ hare build -o haremain main.ha
$ ./haremain
¡Hola Mundo!
Γειά σου Κόσμε!
Привіт, світ!
こんにちは世界!

I would really like to use mold in my project, as it is a much simpler and faster linker. Is there any room on the roadmap for adding support for more linker script keywords?

There is also #563 asking for examples of linker scripts for OS kernels so that an alternative language to linker script can be made, but I would probably not want to modify the Hare compiler driver to add support for a new linker language (ok, I might consider it). Anyway, I would rather suggest adding some basic linker script support to mold even if the creation of a superior alternative is in progress. Perhaps tooling could be created to parse linker scripts and convert them into whatever the new format is.

Linker script manual

rui314 commented 1 year ago

I have added a few command line options as an attempt to cover the usage of the linker script. You can see an example in this test file: https://github.com/rui314/mold/blob/main/test/elf/section-order.sh

I'm not sure if this option is powerful enough for your usage, though. My lack of knowledge about real-world linker script usage hinders me from developing such features, so thank you for your request.

I wonder why you needed the linker script in the first place. It looks like your project is a programming language which doesn't usually need a linker script. What's special about your project?

alecGraves commented 1 year ago

I think the linker script is included in the runtime module to make it easier to port the runtime to custom systems and to build OSes with Hare, since that was one of the language's design goals.

On the default platform, I think you are correct that it is not needed, and I appear to be able to link with mold manually (without using the build driver hare build) by omitting the -T option.

$ mold --static --gc-sections --as-needed --strip-all --relax --icf=all -o haremain ./haremain.o $(find ~/.cache/hare/ | grep '\.o$') ~/.cache/hare/rt/*.a
$ ./haremain
Hello, world!
¡Hola Mundo!
Γειά σου Κόσμε!
Привіт, світ!
こんにちは世界!

My hello world binary is also smaller with mold. Neat.

# ld
-rwxr-xr-x 1 alec alec  117712 Jul  2 07:54 haremain

# mold
-rwxr-xr-x 1 alec alec  110968 Jul  2 07:55 haremain
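
To put the two sizes above in perspective, the difference can be worked out with plain shell arithmetic. This is just a small sketch; size_delta is a hypothetical helper name, and the percentage is truncated to one decimal place because POSIX shell arithmetic is integer-only.

```shell
# size_delta OLD NEW: print how many bytes (and what percentage) NEW saves over OLD.
# "size_delta" is a made-up helper name used only for this illustration.
size_delta() {
    old=$1
    new=$2
    diff=$((old - new))
    # Work in tenths of a percent to keep one decimal digit with integer math.
    permille=$((diff * 1000 / old))
    echo "$diff bytes smaller ($((permille / 10)).$((permille % 10))%)"
}

# Sizes taken from the ls output above (ld first, then mold).
size_delta 117712 110968   # prints: 6744 bytes smaller (5.7%)
```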

The current linker script appears to just move .text and .data to specific addresses and set KEEP for a couple of sections, so I guess this is unnecessary/implied for standard Linux linking.

rui314 commented 1 year ago

If your program works without the linker script, it is generally a good idea to remove the use of the linker script because the linker has a better idea as to what is the best memory layout for your program. By enforcing some particular order or memory addresses, you could lose security features such as address randomization or RELRO.
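
One rough way to see whether a linked binary can still benefit from address randomization is to look at the e_type field in its ELF header: ET_EXEC (2) means a fixed load address, while ET_DYN (3) is what position-independent executables use. The sketch below reads that field with od. The helper name elf_type is made up, it assumes the input is already known to be an ELF file, and ./haremain is simply the binary name used earlier in this thread.

```shell
# elf_type FILE: print EXEC, DYN, or OTHER based on the ELF header's e_type.
# Assumes FILE is an ELF file; "elf_type" is a hypothetical helper name.
elf_type() {
    # Byte 5 (EI_DATA) gives the byte order: 01 = little-endian, 02 = big-endian.
    data=$(od -An -tx1 -j5 -N1 "$1" | tr -d ' \n')
    case "$data" in
        01) off=16 ;;   # low byte of the 2-byte e_type comes first
        02) off=17 ;;   # low byte of e_type comes second
        *)  echo "UNKNOWN"; return 1 ;;
    esac
    low=$(od -An -tx1 -j"$off" -N1 "$1" | tr -d ' \n')
    case "$low" in
        02) echo "EXEC" ;;   # ET_EXEC: linked at a fixed address
        03) echo "DYN" ;;    # ET_DYN: PIE or shared object
        *)  echo "OTHER" ;;
    esac
}

# Guarded usage; ./haremain may not exist on the reader's machine.
if [ -e ./haremain ]; then
    elf_type ./haremain
fi
```

ET_DYN alone does not distinguish a PIE from a shared library, and RELRO is a separate property: it shows up as a GNU_RELRO program header, which readelf -l should list when present.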

alecGraves commented 1 year ago

Hare's compiler backend (QBE) currently does not support ASLR, so there is no loss from that. Hare also has stronger runtime checks than C by default, so things like buffer overruns are less likely.

I am not sure what the performance implications of pinning .text and .data sections to specific addresses are. Maybe it could hurt. Does the linker normally scatter text and data throughout the program to help with data locality? Or would it normally break up .text and .data into two separate sections anyway?

rui314 commented 1 year ago

Ah, I didn't know that QBE doesn't support position-independent code, but it indeed looks like it can't generate it.

Memory layout can affect a program's performance, and there's a tool to optimize code layout based on profiling (https://github.com/llvm/llvm-project/tree/main/bolt). But that's not a linker. A linker is not able to do that level of sophisticated optimization; it just lays out text and data as two separate sections.

jpalus commented 11 months ago

Note that systemd is using linker script too for building EFI stubs: https://github.com/systemd/systemd/blob/main/tools/elf2efi.lds

jpalus commented 10 months ago

...and it no longer does: https://github.com/systemd/systemd/commit/142f0c61a37091e233b80f02375cff1114dab24a

vChavezB commented 3 months ago

I mentioned this in the discussion at https://github.com/rui314/mold/discussions/1261, but wanted to add it here as well.

For embedded systems (i.e., microcontrollers), linker scripts are used, among other things, to set the memory layout of the executable file. The memory regions (SRAM, flash, peripherals) are defined by the manufacturer in the datasheet, so the linker has to be told these addresses.

Examples of frameworks/toolchains that use the MEMORY command are Zephyr RTOS, STM32 Cube IDE, IAR Embedded Workbench, S32 Design Studio IDE, and so on.