rui314 / mold

Mold: A Modern Linker 🦠
MIT License

retain, used and perhaps wildcard-matching using retain-symbols-file #1252

Closed: fwsGonzo closed this issue 5 months ago

fwsGonzo commented 5 months ago

Hey, I recently tried to use mold for a peculiar build target that requires __attribute__((retain)) support. Since mold didn't seem to support it, I thought perhaps I could add wildcard support to --retain-symbols-file instead. However, I see that get_symbol instantiates new symbols, so I am a bit unsure how to proceed.

What is the best way forward to support retaining a bunch of similarly named symbols? Is wildcard matching with --retain-symbols-file realistic?
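Here is a rough illustration of what wildcard matching could look like. The patterns are tested only against symbol names the linker has already read from its inputs, using POSIX fnmatch(3). A pattern string is therefore never turned into a new symbol, which avoids the get_symbol problem described above. This is a sketch under that assumption, not mold's code: `symbol_matches_any` and `count_matching_symbols` are hypothetical helpers.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `name` matches any of the shell-style
 * glob patterns (for example "std*"), 0 otherwise. fnmatch(3) returns 0 on
 * a match and a nonzero value otherwise. */
int symbol_matches_any(const char *name,
                       const char *const *patterns, size_t npatterns)
{
    assert(name != NULL);
    for (size_t i = 0; i < npatterns; i++) {
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: counts how many already-known symbol names match.
 * Patterns are only compared against existing names, so no new symbol
 * object is created for a pattern string. */
size_t count_matching_symbols(const char *const *names, size_t nnames,
                              const char *const *patterns, size_t npatterns)
{
    size_t count = 0;
    for (size_t i = 0; i < nnames; i++) {
        if (symbol_matches_any(names[i], patterns, npatterns))
            count++;
    }
    return count;
}
```

In this hypothetical design, the names would come from the global symbol table after all input files are read. Each match would then be treated as if it had been listed in the file individually.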

I see that there's a retain mechanism for symbols in the codebase as well. However, looking at my RISC-V object files, I'm not sure how the retain attribute is actually stored. (I later realized it's a section flag.)

  8828: 0000000000000000   712 FUNC    GLOBAL DEFAULT  190 stdBuildRope

This function has both the used and retain attributes.

I looked into it, and SHF_GNU_RETAIN seems to be a section flag that prevents linker garbage collection. However, with GNU ld it also seems to prevent --retain-symbols-file from discarding the symbols. I guess I am relying on that.

The flag has the value 0x200000 (bit 21).
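SHF_GNU_RETAIN lives in a section header's sh_flags, not in the symbol table. That is why it does not appear in the symbol-table entry shown above. The following minimal sketch counts the flagged sections in an ELF64 object that is already loaded into memory. It assumes glibc's <elf.h> and a file whose byte order matches the host's. `section_is_retained` and `count_retained_sections` are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Older <elf.h> headers may not define the flag yet. */
#ifndef SHF_GNU_RETAIN
#define SHF_GNU_RETAIN (1U << 21) /* 0x200000 */
#endif

/* Returns 1 if a section header's sh_flags carries SHF_GNU_RETAIN. */
int section_is_retained(uint64_t sh_flags)
{
    return (sh_flags & SHF_GNU_RETAIN) != 0;
}

/* Counts sections flagged SHF_GNU_RETAIN in an in-memory ELF64 image.
 * Assumes the file's byte order matches the host's, and ignores extended
 * section numbering (e_shnum == 0). Returns -1 if the buffer does not look
 * like a well-formed ELF64 file. */
long count_retained_sections(const unsigned char *image, size_t size)
{
    Elf64_Ehdr ehdr;

    assert(image != NULL);
    if (size < sizeof ehdr)
        return -1;
    memcpy(&ehdr, image, sizeof ehdr);
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_shentsize != sizeof(Elf64_Shdr))
        return -1;

    long count = 0;
    for (size_t i = 0; i < ehdr.e_shnum; i++) {
        size_t off = (size_t)ehdr.e_shoff + i * sizeof(Elf64_Shdr);
        Elf64_Shdr shdr;
        if (off > size || size - off < sizeof shdr)
            return -1; /* section header table runs past the buffer */
        memcpy(&shdr, image + off, sizeof shdr);
        if (section_is_retained(shdr.sh_flags))
            count++;
    }
    return count;
}
```

To use it on a real object such as test.o, read the whole file into a buffer (for example with fread), then pass the buffer and its size.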

rui314 commented 5 months ago

__attribute__((retain)) marks a section so that the section will not be garbage-collected by the linker's --gc-sections. On the other hand, --retain-symbols-file makes the linker keep the specified symbols in the symbol table. They serve different purposes. Could you explain a little bit more about what you are trying to achieve?

fwsGonzo commented 5 months ago

I have a bunch of symbols in my executable that are marked with the used and retain attributes in order to not prune them when stripping. This at least works with ld, but I'm not sure exactly why. So I am trying to come up with other ways to retain these symbols automatically. There are quite a few of them, and only an automated solution will work.

My CMake build uses --gc-sections, stripping, and --retain-symbols-file, plus a few --undefined= options for some assembly functions that would otherwise be discarded.

I see that --undefined=symbol does work, at least, and I do have a symbol file. It's just not scalable to write down all the symbols one by one. One thing I noticed, unrelated to this issue, is that I had to change -Wl,-u,symbol to -Wl,--undefined=symbol.

What confuses me is that retain is written on a function, like a symbol attribute, yet it's supposed to apply to a section?

rui314 commented 5 months ago

> in order to not prune them when stripping

By stripping, do you mean the strip command or the --gc-sections linker option?

__attribute__((retain)) does not apply to symbols. It marks the section that contains the annotated function or variable, so that the section is kept during --gc-sections.
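To make the distinction concrete, here is a toy model of the mark phase of --gc-sections. In it, SHF_GNU_RETAIN sections are treated as extra GC roots, alongside the section holding the entry point and sections defining symbols passed with --undefined. This is a simplified sketch under those assumptions, not mold's or GNU ld's actual implementation. `struct section` and `mark_live_sections` are hypothetical. Nothing here touches the symbol table, because deciding which symbols appear in .symtab is a separate step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SECTIONS 16

/* Hypothetical, heavily simplified input section. */
struct section {
    const char *name;
    bool gnu_retain;           /* SHF_GNU_RETAIN set in sh_flags */
    bool is_root;              /* holds the entry point or a --undefined symbol */
    size_t nrefs;              /* number of outgoing relocation targets */
    size_t refs[MAX_SECTIONS]; /* indices of sections this one refers to */
    bool live;                 /* result of the mark phase */
};

/* Mark phase of a simplified --gc-sections: a section is live if it is a
 * root, carries SHF_GNU_RETAIN, or is reachable through relocations from a
 * live section. Unmarked sections would be dropped from the output. */
void mark_live_sections(struct section *secs, size_t n)
{
    size_t worklist[MAX_SECTIONS]; /* each section is pushed at most once */
    size_t top = 0;

    assert(n <= MAX_SECTIONS);
    for (size_t i = 0; i < n; i++)
        secs[i].live = false;

    /* Seed the worklist; in this model SHF_GNU_RETAIN acts like a root. */
    for (size_t i = 0; i < n; i++) {
        if (secs[i].is_root || secs[i].gnu_retain) {
            secs[i].live = true;
            worklist[top++] = i;
        }
    }

    /* Propagate liveness along relocation edges. */
    while (top > 0) {
        const struct section *s = &secs[worklist[--top]];
        assert(s->nrefs <= MAX_SECTIONS);
        for (size_t j = 0; j < s->nrefs; j++) {
            size_t target = s->refs[j];
            assert(target < n);
            if (!secs[target].live) {
                secs[target].live = true;
                worklist[top++] = target;
            }
        }
    }
}
```

Consider the minimal example later in this thread. The roots would be the section holding _start and the retained section for testRetained. The section for testPruned is never reached, so it is discarded.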

rui314 commented 5 months ago

I took a look at the LLVM lld source code, and the behavior implemented in lld does seem different from what I implemented in mold. So mold's --retain-symbols-file may be misbehaving. Let me look further and get back to you.

rui314 commented 5 months ago

Actually it looks like lld's behavior is incompatible with GNU ld, so I filed it as https://github.com/llvm/llvm-project/issues/91055.

That's not directly related to your request, I guess, though.

fwsGonzo commented 5 months ago

> Actually it looks like lld's behavior is incompatible with GNU ld, so I filed it as llvm/llvm-project#91055.
>
> That's not directly related to your request, I guess, though.

I think this might be it; you hit the nail on the head. I am building static executables that serve as a low-latency scripting backend for a game server and client. All in all it's a gargantuan task, from creating the emulator to custom runtimes and build systems, all the way down to keeping assembly and extern symbols. In your issue you describe ld retaining symbols by keeping them in .symtab, and .symtab is indeed where I look up the symbols I want to retain.

So, to reiterate: with ld, when I mark a function as used and retain, its symbol still appears in .symtab despite -Wl,-x,-S, and even with --gc-sections and --retain-symbols-file.

I don't know exactly why, but perhaps it only applies to static executables? Either way, it solves my problem: I can declare public functions directly in the code, and they cannot be stripped.

EDIT: I tested without retain, and it is indeed __attribute__((retain)) alone that somehow keeps the symbols from being stripped. It seems that used only prevents the compiler from optimizing out the function.
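The split described above fits in a few lines of C. The function names are hypothetical. `retain` needs an ELF target and a toolchain that supports it (GCC 11 or newer with a matching binutils, or Clang 13 or newer). Compilers without support typically ignore the attribute with a warning.

```c
/* `used`: the compiler must emit this function even though nothing in this
 * translation unit calls it. On ELF it does not, by itself, stop the
 * linker's --gc-sections from discarding the function's section. */
__attribute__((used))
static long keptByCompiler(void)
{
    return 1;
}

/* `retain`: the compiler also sets SHF_GNU_RETAIN on the section holding
 * this function, so --gc-sections keeps that section. Combine with
 * -ffunction-sections so only this function's section is affected rather
 * than all of .text. */
__attribute__((used, retain))
long keptByLinker(void)
{
    return 2;
}
```

Neither attribute says anything about which symbols end up in .symtab; that is decided by options such as -s, -x, or --retain-symbols-file.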

rui314 commented 5 months ago

I'm not sure I understand your problem correctly. I can do the following:

Is this what you want?

fwsGonzo commented 5 months ago

I think so, but I can try it out. Where would I make this change in the sources to try it? If it matches ld's behavior, it will likely solve my problem.

rui314 commented 5 months ago

There are always subtle differences among linkers, so "just match ld's behavior" isn't feasible, just as "just create a browser that matches Chrome's behavior" isn't feasible. I'd like to know exactly which behavior you want mold to match GNU ld on.

fwsGonzo commented 5 months ago

Minimal example:

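/* Unreferenced: its section (.text.testPruned) is discarded by --gc-sections. */
long testPruned(void)
{
    return 42;
}

/* SHF_GNU_RETAIN is set on this function's section, so --gc-sections keeps it. */
__attribute__((retain))
long testRetained(void)
{
    return 42;
}

/* Entry point; with -nostdlib no C runtime provides one. */
void _start(void) {}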

Then compile and link:

gcc-12 -static -O2 -nostdlib -ffunction-sections -fdata-sections -c test.c -o test.o
gcc-12 -static -O2 -nostdlib -Wl,-gc-sections test.o -o test.elf

Only the retained function remains:

$ nm test.elf
0000000000404000 R __bss_start
0000000000404000 R _edata
0000000000404000 R _end
0000000000401010 T _start
0000000000401000 T testRetained

I also tested this with GCC 13 and GCC 14 (with the matching binutils).

I'm not sure whether it can be simplified further, but this at least shows that the retain attribute must either move the function into a separate retained section or give the symbol a flag. I'm guessing the former?

fwsGonzo commented 5 months ago

I was wrong about the retain-symbols file. --retain-symbols-file overrides -S and -x and keeps only the symbols listed in the file, apart from those needed for relocations. The ld documentation says undefined symbols are kept as well, but that is not what I observe.
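GNU ld documents --retain-symbols-file as a flat file with one symbol name per line, where only the listed symbols are kept. That can be modeled as a plain membership test that is independent of SHF_GNU_RETAIN, which fits what was observed here. This is a minimal sketch under that assumption, with hypothetical helpers `load_symbol_list` and `symbol_is_listed`. It ignores the relocation-related exceptions mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOL_NAME 256

/* Reads a --retain-symbols-file style list (one symbol name per line) into
 * `names`, stopping after `max` entries. Blank lines are skipped and a
 * trailing "\r\n" or "\n" is removed. Names longer than MAX_SYMBOL_NAME - 1
 * characters are truncated, and the rest of that line is discarded. */
size_t load_symbol_list(FILE *fp, char names[][MAX_SYMBOL_NAME], size_t max)
{
    char line[MAX_SYMBOL_NAME];
    size_t n = 0;

    assert(fp != NULL && names != NULL);
    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        size_t len = strcspn(line, "\r\n");
        int saw_line_end = line[len] != '\0';
        line[len] = '\0';
        if (!saw_line_end) {
            /* Line longer than the buffer, or last line without '\n':
             * skip whatever is left of it. */
            int c;
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;
        }
        if (len == 0)
            continue;
        strcpy(names[n++], line);
    }
    return n;
}

/* Returns 1 if `name` is one of the first `n` entries of `names`, else 0.
 * A linear scan keeps the sketch short; a hash set would scale better. */
int symbol_is_listed(const char *name, char names[][MAX_SYMBOL_NAME], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0)
            return 1;
    }
    return 0;
}
```

In this model, a symbol reaches the output .symtab only when `symbol_is_listed` returns 1 for its name, whether or not its section was kept because of SHF_GNU_RETAIN.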

rui314 commented 5 months ago

We do support the RETAIN section flag. Please try your test case with the mold linker. So far, this bug doesn't seem to be actionable.

fwsGonzo commented 5 months ago

Yep, it was --retain-symbols-file that got overridden with GNU ld, but not with mold. I don't really need it, so as far as I'm concerned it's all good, and I can use mold now! Thanks, and sorry for the inconvenience. Somehow I thought retain would also apply to the symbols file.