riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
691 stars 163 forks source link

Add RELOCID for vendor-specific relocations #423

Closed MaelCravero closed 1 month ago

MaelCravero commented 8 months ago

This expands the current support for custom relocations to have a vendor identifier which would allow to avoid conflicts between vendor relocations in tools. This would also allow a greater number of relocations, such that these could eventually be supported in upstream tools.

A proof of concept implementation can be found in https://github.com/embecosm/corev-binutils-gdb-vendor-relocs.

kito-cheng commented 8 months ago

Thanks for raising this! I think this solution is kinda discussed for a while, and also agreed from many different parties, I guess the only only remaining controversy is the vendor ID: build our own list or use JEDEC manufacturer ID like mvendorid.

Added this into today's psABI meeting agenda, thanks again :)

simonpcook commented 8 months ago

Using the addend precludes the use of Elf_Rel, which some platforms (Fuschia) use for dynamic relocations. It needs to be encoded in the symbol instead.

Is this suggesting encoding in the symbol in all cases, or being generic enough that Sym (if Elf_Rel) or Sym+Addend (if Elf_Rela; probably the absolute section symbol) results in the "vendor identifier"? I understand the need to cover the Elf_Rel case if we have users that need that (and forbidding such functionality because you don't have addends in your ELF file makes no sense), but I'm not sure of an immediate strong argument of always adding symbols to hold this information if addends are available

kito-cheng commented 7 months ago

One idea for resolve both issue: 1) Organization don't have JEDEC manufacturer ID, 2) REL type relocation.

RELA: Encoding JEDEC manufacturer ID in addend, and 0 for those organization don't have JEDEC manufacturer ID, relocation symbol will point to a special symbol $v<vendor-name> if addend is 0.

REL: Relocation symbol will point to a special symbol: $jedec0x<jedecid-in-hex> or $v<vendor-name>.

MaelCravero commented 7 months ago

I think it would be more straightforward to only use symbols with the vendor name: this way we have a single way to use this relocation for both REL and RELA without needing to differentiate for vendors with or without a JEDEC ID. Also, if a vendor really wants to use their JEDEC ID, they are free to use it in the symbol name.

kito-cheng commented 6 months ago

I am happy with special/dummy symbol only scheme.

@jrtc27 does it address your concern?

kito-cheng commented 6 months ago

Just cc more linker experts to make sure it's not insane to linker implementation.

Some background:

We have only limited relocation can be used within the RISC-V psABI due to RV32, we already divide few portion of relocation to vendor extension, however we didn't have a mechanism to well maintain the vendor relocation, that cause those relocation never go back to upstream, therefore, some vendor extension start upstream, like Core-V, they have few vendor relocation, and I believe they are not along...SiFive has also defined some vendor relocation within downstream, also IIRC CHERRI also defined few relocation...so having a mechanism defined in psABI would be great.

@rui314 @MaskRay

jrtc27 commented 6 months ago

CHERI is less relevant here; as a custom research extension we have no need to coexist with other vendor extensions, and the standard extension we're working towards now in the CHERI SIG+TG won't be a vendor extension so will coexist just fine.

rui314 commented 6 months ago

It seems the fundamental issue here is that the r_type member is just too small in 32 bits. So, as a workaround, maybe we can require a vendor-specific relocation to be represented as a pair of relocations instead of just a single relocation? This effectively extends the vendor-specific range from 2^6 (64) to 2^14 (16384), allowing vendors to use non-overlapping relocation types.

(This idea was inspired by the concept of UTF-16 surrogates.)

jrtc27 commented 6 months ago

We cannot change the meaning of any of the existing relocations reserved for custom use. The whole point of reserving them for custom use like that is so that others can rely on being able to use them in perpetuity.

rui314 commented 6 months ago

Then how about defining [176, 191] as "vendor-specific relocation prefixes"? That allows 16*256=4096 new vendor-specific relocations.

jrtc27 commented 6 months ago

How are you encoding all vendor relocations in 12 bits?

rui314 commented 6 months ago

Actually we can just reserve a single relocation number, say, 191, as a vendor-specific prefix and use UTF-8-like encoding scheme by using following relocations to represent any number of relocation types, eliminating the need to use different relocation numbers for RV32 and RV64 for those who need more than 64 relocation types.

rui314 commented 6 months ago

Alternative approach would be to define 191 as the first relocation of a vendor-specific relocation pair, whose actual r_type is encoded in r_addend. The second relocation would contain actual data for the vendor-specific relocation (except r_type).

rui314 commented 6 months ago

So, my proposal is to resolve the issue that RV32's r_type being too small, instead of resolving conflicts in the small space by adding vendor tags. It could be achieved by adding a clause like this to the psABI:

"A relocation whose r_type is greater than 255 is represented using two consecutive relocation records in RV32. The fist relocation has r_type of 191, and the actual relocation type is encoded in its r_offset. Other members, namely r_offset, r_sym and r_addend are taken from the second relocation."

And then we can reserve a plenty space (say, 0x8000'0000 to 0xffff'ffff) for vendor specific relocations, allowing vendors to use non-overlapping regions.

How does this sound to you guys?

jrtc27 commented 6 months ago

You need r_offset to point at the right location otherwise every single tool will need to have special casing in what is normally highly generic code.

rui314 commented 6 months ago

We can use r_addend instead.

jrtc27 commented 6 months ago

We can use r_addend instead.

https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/423#pullrequestreview-1862994510

MaskRay commented 6 months ago

I think REL disallowing r_addend as an identifier is less relevant since there is no requirement to make REL work. People may not need it. I added -z rel to ld.lld and Fuchsia seems to use it, but it is pretty minor size optimization when RELR is utilized. Do we have ideas how many platform-specific relocations end up as dynamic relocations?

How urgent do people need reserved relocation types in the specification? If my RELLEB proposal https://groups.google.com/g/generic-abi/c/yb0rjw56ORw is adopted, RV32 can use effectively unlimited number of relocation types.

rui314 commented 6 months ago

@jrtc27 I agree with MaskRay on Fuchsia. Fuchsia (or anyone) reserves the right to deviate from this psABI, and that's what they are doing for dynamic relocations, but the consequence would also be on them if they choose to do so. We are not obligated to make nonstandard deviations work smoothly with new psABI features. If I were a Fuchsia developer, I'd use r_offset in the dynamic relocation table instead of r_addend (which shouldn't be hard), so I don't think that's going to be a problem, but how to implement it is up to them.

kito-cheng commented 5 months ago

@MaskRay :

Sorry for late reply, I was stuck at releasing stuffs.

Thanks you bring RELR and RELLEB(CREL), it's my first time know that, I think both are useful for code size reduction around the relocation info, I got few code size request around the kernel module (relocatable files), so I was considering to introducing mixing REL and RELA (*1), but I realized RELLEB(CREL) could do better here, anyway I am happy to see this happen, let me know if I can help anything here.

*1: AArch64 ABI allow mixing REL and RELA[1]

[1] https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#addends-and-pc-bias

Back to the the vendor relocation, I am inclined to using dummy symbol scheme at this moment, since I think new relocation format may still require more than one release cycle to implement that into all tools, however vendor instruction has merged into upstream recently, I don't want to block this behind that.

At the end, let me emphasize again, I am really looking forward to RELR and RELLEB(CREL) can improve the code size issue, I saw RELR already merged into glibc since 2.36, so I gonna to prepare some PR to mention that in the RISC-V psABI :)

MaelCravero commented 5 months ago

Updated to use symbols instead of the addend as vendor ID.

MaskRay commented 4 months ago

No objection to vendor relocations taking one type code.

However, the inclusion of a relocation symbol for vendor identification raises some questions.

Why does a vendor relocation need to identify the vendor name? A vendor identifier table could introduce maintenance overhead, as vendors might register even without immediate need, especially considering the large number of RISC-V vendors.

We reserve one code so that vendor relocations will not conflict with standard relocations. Ensuring two vendors can exchange relocatable files might not be a goal of this specification. I am also concerned with toolchain implementation overhead (I especially care about LLVM binary utilities) to recognize these vendor names.

That said, vendor identification might be useful for other purposes and we probably need a per-file mechanism. With that, do we really need a per-relocation identification?

jrtc27 commented 4 months ago

Ensuring two vendors can exchange relocatable files might not be a goal of this specification.

Unfortunately that has been expressed very explicitly as a goal, hence how we've ended up here.

kito-cheng commented 4 months ago

We reserve one code so that vendor relocations will not conflict with standard relocations. Ensuring two vendors can exchange relocatable files might not be a goal of this specification. I am also concerned with toolchain implementation overhead (I especially care about LLVM binary utilities) to recognize these vendor names.

That said, vendor identification might be useful for other purposes and we probably need a per-file mechanism. With that, do we really need a per-relocation identification?

We may have multi-version function for different vendor, so that may contain relocation from more than one vendor in theory.

MaskRay commented 4 months ago

Can you share an example of how vendor relocations from various vendors are combined? Are we comfortable committing to long-term maintenance now, or should we gain more experience first?

What actions should upstream linkers take? I think it's crucial that they avoid recognizing vendor-specific semantics.

(A dummy symbol, if ever adopted, can be implemented with STB_WEAK STV_HIDDEN so that after linking there is only one copy.)

kito-cheng commented 2 months ago

@MaskRay

Are we comfortable committing to long-term maintenance now, or should we gain more experience first?

What actions should upstream linkers take? I think it's crucial that they avoid recognizing vendor-specific semantics.

RISC-V community agree taking vendor-specific extensions, so that may introduce new relocation you could imagine :P, but require vendor to maintain that, one agreement here is that require an open spec on toolchian-convention, give some piratical example is we have vendor extension from T-head, Core-V (OpenHW), Ventana and SiFive on upstream now.

Can you share an example of how vendor relocations from various vendors are combined?

So back to this question, because we allow vendor upstream their extension, so yes they may contain within single file.

kito-cheng commented 1 month ago

@MaskRay any further concern?