riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International

Define GOT-Relative data relocation #399

PiJoules closed this issue 7 months ago

PiJoules commented 9 months ago

We would like to introduce a new relocation type that works similarly to x86’s R_X86_64_GOTPCREL relocation.

In the “GOT-relative data relocations” section of riscv-elf.adoc, we would add a new type, “R_RISCV_GOT32_PCREL”, that follows the existing wording for “R_RISCV_32_PCREL” but instead evaluates to the 32-bit offset between the GOT entry for a given symbol and the location where the relocation is applied. Its calculation would be “G + GOT - P + A”.
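
As a rough illustration of the proposed calculation, the sketch below evaluates “G + GOT - P + A” and checks that the result fits a signed 32-bit field. The function names and all addresses are hypothetical, chosen only for this example; they are not taken from any linker.

```python
import struct


def got32_pcrel(got_base: int, g_offset: int, place: int, addend: int = 0) -> int:
    """Evaluate G + GOT - P + A and check that it fits a signed 32-bit field."""
    value = g_offset + got_base - place + addend
    if not -(1 << 31) <= value < (1 << 31):
        raise OverflowError(f"relocation value {value:#x} does not fit in 32 bits")
    return value


def encode_word(value: int) -> bytes:
    """Encode the value as a little-endian signed 32-bit word."""
    return struct.pack("<i", value)


# Hypothetical addresses, for illustration only.
GOT = 0x12000  # start of the GOT
G = 0x18       # offset of the symbol's GOT entry from the start of the GOT
P = 0x10004    # address of the 32-bit word being relocated

value = got32_pcrel(GOT, G, P)
print(hex(value))  # 0x2014
```

A result outside the signed 32-bit range raises `OverflowError` in this sketch; how a real linker reports that case is not specified here.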

The purpose of this relocation is to remove what we call “rtti_proxies” in the relative vtables C++ ABI. This ABI saves memory by moving vtables into .rodata: each relocation that would normally make up the vtable (R_RISCV_64, R_AARCH64_ABS64, R_X86_64_64) is replaced with a PC-relative offset, so no dynamic relocations are required. Instead of placing a pointer to each virtual method in the vtable, we store the offset between the vtable and the virtual function, which can be computed statically if the compiler knows both symbols are DSO-local. The same is done for the pointer to the RTTI object. However, the RTTI object is not guaranteed to be DSO-local, so we must emit a DSO-local “proxy” symbol that points to the actual RTTI object and take the offset to the proxy instead. This proxy is equivalent to a GOT entry, but to use a GOT entry we need a relocation that refers to the entry itself rather than just the start of the GOT.

The proxies work in the meantime, but this new relocation would allow us to effectively “move” those proxies to the GOT which will provide these benefits:

  1. It will reuse the same GOT slot across translation units without needing COMDAT machinery for each proxy. This reduces clutter and bloat in object files and lets the linker rely on simpler features with smaller inputs.
  2. It will reuse the same GOT slot as a non-vtable reference to the same symbol would.
  3. The GOT is tightly packed in the RELRO segment, whereas the proxy symbols will be placed in arbitrary parts of the .data.rel.ro section.
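
The slot-sharing behaviour in points 1 and 2 can be pictured with a toy model in which a linker-managed GOT hands out one slot per distinct symbol, regardless of which object file or kind of reference asks for it. The `ToyGot` class, the object file names, and the symbol names are all hypothetical; real linkers choose their own allocation order and layout.

```python
class ToyGot:
    """Toy GOT: one 8-byte slot per distinct symbol (hypothetical model)."""

    SLOT_SIZE = 8

    def __init__(self):
        self.slots = {}  # symbol name -> byte offset from the start of the GOT

    def slot_offset(self, symbol: str) -> int:
        # Reuse the existing slot if any earlier reference already created one.
        if symbol not in self.slots:
            self.slots[symbol] = len(self.slots) * self.SLOT_SIZE
        return self.slots[symbol]

    def size(self) -> int:
        return len(self.slots) * self.SLOT_SIZE


got = ToyGot()
# (object file, referenced symbol); vtable and non-vtable references look the same.
references = [
    ("a.o", "typeinfo for Foo"),  # vtable reference in a.o
    ("b.o", "typeinfo for Foo"),  # vtable reference in b.o
    ("b.o", "typeinfo for Bar"),
    ("c.o", "typeinfo for Foo"),  # ordinary (non-vtable) GOT reference
]
offsets = [got.slot_offset(symbol) for _, symbol in references]
print(offsets)     # [0, 0, 8, 0]
print(got.size())  # 16
```

In this model, three references to the same symbol consume a single slot, whereas per-translation-unit proxies would need COMDAT deduplication to reach the same result.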

This could also be used for other PIC-friendly optimizations we’d like to implement that aren’t specific to vtables.

We have a WIP patch that implements this for x86_64, but it does not yet work for RISC-V. What steps can we take to add this relocation?

rui314 commented 9 months ago

It sounds like a reasonable addition to the psABI. I'd make a change to https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc and send a pull request.

PiJoules commented 9 months ago

Submitted https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/402

kito-cheng commented 9 months ago

Sorry for the late reply, I just got back from GNU Tools Cauldron. I think it's generally OK to add a new relocation that is useful, especially one already accepted by another architecture. We just need to make sure our linker friends are also happy with it :P

MaskRay commented 8 months ago

Can you add an example with detailed code sequences showing how R_RISCV_GOT32_PCREL is going to be useful?

PiJoules commented 7 months ago

> Can you add an example with detailed code sequences showing how R_RISCV_GOT32_PCREL is going to be useful?

The immediate use will be for the relative vtables C++ ABI. Currently a relative vtable looks like this:

  .section .rodata.vtable
vtable:
  .word 0  // offset to top
  .word vtable.rtti_proxy - (vtable+8)  // offset between dso-local proxy symbol and first vtable func
  .word foo@PLT - (vtable+8)  // offset between PLT entry for foo and first vtable func

  .section .data.relro.vtable.rtti_proxy
  .hidden vtable.rtti_proxy
vtable.rtti_proxy:
  .quad vtable.rtti  // pointer to the actual rtti struct

Accesses to the rtti struct are done using a relative load to get the address of vtable.rtti_proxy followed by a load to get vtable.rtti:

        ld      a1, 0(a0)  // load vtable from an object (a1 points to the 3rd entry in the vtable which is the first virtual function)
        lw      a2, -4(a1)  // load 32-bit offset (vtable.rtti_proxy - (vtable+8))
        add     a1, a1, a2  // add the offset back to the vtable address to get `vtable.rtti_proxy`
        ld      a1, 0(a1)  // deref to get `vtable.rtti`
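
To make the access sequence concrete, here is a small simulation of the four instructions over a toy memory model. All addresses and the `store`/`load`/`load_rtti` helpers are hypothetical and exist only for this illustration.

```python
# Toy memory: address -> (size in bytes, signed value). Addresses are hypothetical.
mem = {}


def store(addr: int, size: int, value: int) -> None:
    mem[addr] = (size, value)


def load(addr: int, size: int) -> int:
    stored_size, value = mem[addr]
    assert stored_size == size, "access width must match the stored width"
    return value


VTABLE = 0x10000            # start of the vtable in .rodata
ADDRESS_POINT = VTABLE + 8  # first virtual function slot
PROXY = 0x20000             # vtable.rtti_proxy
RTTI = 0x7F000000           # vtable.rtti, possibly defined in another DSO
FOO_PLT = 0x8000            # PLT entry for foo
OBJ = 0x30000               # an object whose first field is the vtable pointer

store(VTABLE + 0, 4, 0)                        # offset to top
store(VTABLE + 4, 4, PROXY - ADDRESS_POINT)    # offset to the proxy
store(VTABLE + 8, 4, FOO_PLT - ADDRESS_POINT)  # offset to foo@PLT
store(PROXY, 8, RTTI)                          # the proxy holds the real address
store(OBJ, 8, ADDRESS_POINT)                   # object points at the address point


def load_rtti(a0: int) -> int:
    a1 = load(a0, 8)      # ld  a1, 0(a0)
    a2 = load(a1 - 4, 4)  # lw  a2, -4(a1)
    a1 = a1 + a2          # add a1, a1, a2  -> address of the proxy
    return load(a1, 8)    # ld  a1, 0(a1)   -> address of vtable.rtti


print(hex(load_rtti(OBJ)))  # 0x7f000000
```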

The proxy is needed to ensure the vtable contains only static relocations. Functionally, this proxy symbol acts the same as a GOT entry, so the offset between the proxy and PC can instead be the offset between the GOT entry and PC. This is similar to x86's R_X86_64_GOTPCREL and can be denoted with @GOTPCREL. So the new relative vtable layout would be:

  .section .rodata.vtable
vtable:
  .word 0  // offset to top
  .word vtable.rtti@GOTPCREL-4  // offset between GOT entry for vtable.rtti and first vtable func
  .word foo@PLT - (vtable+8)  // offset between PLT entry for foo and first vtable func

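and the instructions for accessing the RTTI object stay the same, since the GOT entry holds the address of vtable.rtti. The -4 addend is needed because the word sits 4 bytes before the address point (vtable+8) that the offset is measured from. vtable.rtti@GOTPCREL-4 resolves to an R_RISCV_GOT32_PCREL relocation.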
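
The arithmetic behind the -4 can be checked with a short sketch. The relocated word lives at vtable+4, but the load sequence adds the stored offset to vtable+8, so the addend has to compensate for the 4-byte difference. The helper name and the addresses below are hypothetical.

```python
def apply_got32_pcrel(got_entry: int, place: int, addend: int) -> int:
    """G + GOT - P + A, where G + GOT is the address of the symbol's GOT entry."""
    return got_entry - place + addend


VTABLE = 0x10000            # hypothetical vtable address
PLACE = VTABLE + 4          # P: where the RTTI offset word is stored
ADDRESS_POINT = VTABLE + 8  # the address the load sequence adds the offset to
GOT_ENTRY = 0x20010         # hypothetical address of the GOT entry for vtable.rtti

word = apply_got32_pcrel(GOT_ENTRY, PLACE, addend=-4)

# With the -4 addend, address point + stored word lands exactly on the GOT entry.
print(ADDRESS_POINT + word == GOT_ENTRY)  # True

# With a zero addend, the computed address would overshoot by 4 bytes.
no_addend = apply_got32_pcrel(GOT_ENTRY, PLACE, addend=0)
print(ADDRESS_POINT + no_addend - GOT_ENTRY)  # 4
```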

MaskRay commented 7 months ago

Thanks for the description. I was thinking that #402 could use a brief summary of the goal.

Using a linker-managed GOT is indeed better than a manual construct.

https://maskray.me/blog/2023-02-26-linker-notes-on-power-isa

Another difference is the explicit mention of .toc. This scheme gives the compiler control within the translation unit. With the traditional GOT scheme, input files do not mention .got. The compiler does not control how the linker will lay out .got. Well, I disagree with the presumed advantage of .toc: the compiler does not know the global information, and the translation unit local layout may not be ideal. A linker is better placed to do such link-time optimization.

I think another advantage of introducing the data relocation is that -shared -fpic -fexperimental-relative-c++-abi-vtables builds will not create PLT entries for functions in the vtables, as long as they are not directly called.

kito-cheng commented 7 months ago

Does https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/402 address the request? If so, I'm going to close this issue.

MaskRay commented 7 months ago

> Does https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/402 address the request? If so, I'm going to close this issue.

It does, and we can close the issue now. I am waiting on @PiJoules's AArch64-side changes to LLVM MC/lld; after that, the RISC-V changes should be straightforward :)