rui314 / mold

Mold: A Modern Linker 🦠
MIT License
14.21k stars 468 forks source link

Implement `--icf=safe` #484

Open rui314 opened 2 years ago

rui314 commented 2 years ago

Identical Code Folding (ICF) is a powerful optimization to reduce the size of a linker output. It merges functions that happen to be compiled to the identical machine code that behave exactly the same.

The downside of doing this is the optimization per se violates the specification of the C/C++ language specs. In these languages, taking pointers of two distinctive functions must result in two non-equivalent pointer values. However, if we optimize two distinctive functions into a single function, that resulting two pointers will have the same value.

Currently, mold supports only --icf=all. That instructs the linker to do the optimization anyway.

Other linkers, notably LLVM lld, supports --icf=safe. It uses .llvm_addrsig sections to identify functions that are safe to merge. Functions that are not mentioned in the section are not address-taken (no one takes its pointer), so they are safe to merge.

We should support --icf=safe in the same manner as LLVM lld.

rui314 commented 2 years ago

.llvm_addrsig is currently an LLVM-specific section. If we implement the support of the section to mold, we should propose the same feature to gcc and possibly send a patch to gcc to implement it.

ishitatsuyuki commented 2 years ago

I'll try to implement this next week.

rui314 commented 2 years ago

I filed a feature request to GCC to ask if GCC can support .llvm_addrsig. Let's see how it goes.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105625

rui314 commented 2 years ago

@ishitatsuyuki Are you interested in adding a .addrsig directive support to the GNU assembler?

ishitatsuyuki commented 2 years ago

Are you interested in adding a .addrsig directive support to the GNU assembler?

Sure. Not really familiar with the GNU codebase, but I could certainly try that.

rui314 commented 2 years ago

Nice! Then please go ahead and send a patch to GNU binutils. I'm totally unfamiliar with binutils too. Let me reopen this issue to track this feature request on our side.

ishitatsuyuki commented 2 years ago

I've submitted the first revision of the patch to binutils, and I'm currently addressing the review comments.

One of the topics raised was that .llvm_addrsig's design doesn't work well with objcopy and ld -r:

In my opinion, .llvm_addrsig is a poor design. Peter Collingbourne received multiple objections to the idea when it was proposed on the generic ABI list, but he went ahead anyway. (Well, maybe the code was accepted into llvm during the discussion.) He was even told how to implement the functionality correctly, by a toolchain expert. Quoting Cary: "A much simpler (and zero-cost, from an object file size point of view) solution would be to add FPTR relocations to the psABI, and use those for any references that would impose the address significance constraint." https://groups.google.com/d/msg/generic-abi/MPr8TVtnVn4/g5yBRRB5AAAJ H.J. Lu also agreed that new relocations could be added.

https://sourceware.org/bugzilla/show_bug.cgi?id=23817

It doesn't look like we're receiving strong objection against adding the LLVM extensions in binutils yet, but we need to be aware that adopting .llvm_addrsig might not be the best for various ELF toolchain compatibility.