riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
691 stars 163 forks source link

Define contrainsts for ULEB128 #403

Closed charlie-rivos closed 11 months ago

charlie-rivos commented 12 months ago

Binutils [1] and a currently in-flight LLD patch [2] define ULEB128 such that a SUB_ULEB128 must be preceded by a matching SET_ULEB128. This means that if the first SET_ULEB128 that comes before a given SUB_ULEB128 does not have the same relocation address, the linkers will fail. A SET_ULEB128 also cannot appear alone.

Define this behavior in psABI such that libraries that read relocations are able to make this same assumption.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-riscv.c;h=09aa7be225ef3cba23fefd2ee977b3556a51368c;hb=HEAD#l2506 [2] https://reviews.llvm.org/D142880

kito-cheng commented 12 months ago

Just add few more linker expert to view, but will commit this Friday IF no further comments.

rui314 commented 12 months ago

I don't think they have to be consecutive pair. We probably should say that two relocations that refer to the same location are processed in-order. That means if a SET_ULEB128 appears before a SUB_ULEB128 in the relocation table, we'll get the desired outcome.

MaskRay commented 12 months ago

For SUB_ULEB128, I think it is fine to say that it must be paired with a SET_ULEB128 immediately before. For SET_ULEB128, technically we can allow flexibility: it is fine to say that it can be used independently or paired with a subsequent SUB_ULEB128. In reality, a lone SET_ULEB128 seems not useful and we probably don't want to recommend its use.

rui314 commented 11 months ago

Can anybody shed light on why GNU ld and lld require them to appear as a pair? In mold, we don't have such restrictions, and it seems they are working just fine; SET_ULEB128 writes a value, and SUB_ULEB128 reads the value, subtracts another value from it and then writes back the result.

rui314 commented 11 months ago

It looks like that's how lld handles SET6 and SUB6 relocations; lld only requires SET6 precedes SUB6 if they refer to the same location, and they don't have to be consecutive. I wonder why lld can't do the same for SET_ULEB128 and SUB_ULEB128.

jrtc27 commented 11 months ago

Can anybody shed light on why GNU ld and lld require them to appear as a pair? In mold, we don't have such restrictions, and it seems they are working just fine; SET_ULEB128 writes a value, and SUB_ULEB128 reads the value, subtracts another value from it and then writes back the result.

There’s only enough space guaranteed for the final result, but the SET might need more bytes.

charlie-rivos commented 11 months ago

I don't think they have to be consecutive pair. - rui314

For SUB_ULEB128, I think it is fine to say that it must be paired with a SET_ULEB128 immediately before. - MaskRay

I don't believe the wording I have proposed specifies that the SET must come immediately before.

Must be paired with the first SET_ULEB128 that comes before this

This wording I used was meant to be interpreted to mean that the SET can come anywhere before the SUB in the list of produced relocations. Should the wording be updated to enforce that they must be consecutive?

rui314 commented 11 months ago

There’s only enough space guaranteed for the final result, but the SET might need more bytes.

That's not limited to SETULEB128. We have {SET,SUB}{6,8,16,32,64,ULEB128}. What's special about ULEB128?

kito-cheng commented 11 months ago

I would say that's kind of implementation limitation, and this limitation is make implementation easier, since we must handle two relocation together, we can write more complicated logic to handle that right but seems not gain to much rather than give it few more constraint.

But why no similar issue for {SET,SUB}_{6,8,16,32,64}? because they are fixed length, and based on the general exception of linker relaxation is the symbol difference can be only shorter, so NO overflow is expected, then that could be handle separately is fine.

rui314 commented 11 months ago

Thank you @kito-cheng, so here is my understanding. Can you confirm if we're on the same page?

  1. unlike other SET/SUB relocations, we can't determine the distance between two symbols for SET_ULEB128/SUB_ULEB128,
  2. we want to report an error if the calculated value doesn't fit, and
  3. mandating SET_ULEB128/SUB_ULEB128 to appear together as a pair makes the diagnostics easier.

If that's the case, I think mandating them to appear as a pair makes sense. (I wasn't against the idea; I was just trying to understand the rationale behind it.)

charlie-rivos commented 11 months ago

@rui314 I believe your understanding is correct

rui314 commented 11 months ago

Thanks for the clarification! Then, perhaps we should specify that the two relocations must appear consecutively? By doing so, we can essentially treat the two relocations as one, which I think would greatly simplify the implementation.

charlie-rivos commented 11 months ago

That is fine with me, I can change the wording if other people are okay with that @kito-cheng

kito-cheng commented 11 months ago

Sorry for the late reply, I was out of office after attending LLVM dev conf, and back now :)

I think it's OK to constrain that more stricter: 1) Current assembler implementations are satisfied that constraint, 2) Current and proposed linker implementations are work well with that, 3) easier handling.

@rui314 thanks for the suggestion!

rui314 commented 11 months ago

LGTM

rui314 commented 11 months ago

Related question: since this is unsigned LEB, the result of a SET minus a SUB should be positive, but how does the compiler guarantee it if it doesn't know the final layout? What exactly is the use case of this relocation?

jrtc27 commented 11 months ago

DWARF call sites are encoded as an address or unsigned offset, typically ULEB128 (unless not present, in which case they're an absolute 0 address). Toolchains previously had to turn off using ULEB128 for RISC-V and use less efficient encodings instead, but this allows them to make RISC-V look more normal and use ULEB128 again.

rui314 commented 11 months ago

If a pair of SET and SUB relocations are used to calculate an unsigned offset, wouldn't the compiler already know the maximum possible offset? I'm wondering if we can trigger a relocation overflow error for the pair without it being a compiler bug.

jrtc27 commented 11 months ago

If a pair of SET and SUB relocations are used to calculate an unsigned offset, wouldn't the compiler already know the maximum possible offset?

Yes, hence how it's possible to know how much to pad the .uleb128 by to ensure there are enough bytes to avoid overflow. Otherwise it just wouldn't work.

rui314 commented 11 months ago

Ah, if that's the case, we don't need to check for overflow in the linker specifically for ULEB128 relocations, do we? While detecting a compiler bug early on is beneficial, the psABI spec shouldn't require the linker to do that. Instead, it should be an optional measure the linker can adopt for all SET/SUB relocation pairs, not just ULEB but other types as well, if they chose to verify the input rigorously. I believe we should revert this change.

MaskRay commented 11 months ago

The linker should report errors if the length of ULEB128 byte sequence is more extended than the current byte sequence.

s/should/can/?

This patch is correct and adds more constraints to producers: "Must be placed immediately before a SUB_ULEB128 with the same offset. Local label assignment <<uleb128-note,*note>>" Relocation overflow checking is an optional feature that linker are not obliged to implement. Personally I recommend implementing it.

rui314 commented 11 months ago

Then why stop here and impose this restriction only for ULEB? If it's recommended to find a compiler bug, it looks like it is hard to justify doing it only for ULEB and not for {SET,SUB}{8,16,32,64} relocations.

kito-cheng commented 11 months ago

More context for the demand of ULEB128: dwarf5 has add few new records are only available with ULEB128 encoding, unlike other record can using alternative encoding.

The linker should report errors if the length of ULEB128 byte sequence is more extended than the current byte sequence.

Both LLD (proposed patch) and ld.bfd has implement the check, will report error and abort.

Then why stop here and impose this restriction only for ULEB? If it's recommended to find a compiler bug, it looks like it is hard to justify doing it only for ULEB and not for {SET,SUB}{8,16,32,64} relocations.

Yeah, we may also implement overflow check for {SET,SUB}{8,16,32,64} too, generally it means something wrong when overflow happened.

rui314 commented 11 months ago

Yeah, checking for overflow in the linker if SET and SUB are consecutive in the relocation table is perhaps a good practice to detect compiler bugs. However, isn't it odd that the psABI mandates the linker to do it in this specific case? I believe the confusion arises from the misconception that only {SET, SUB}_ULEB128 can overflow in a valid use case. This isn't true unless the compiler is buggy.

charlie-rivos commented 11 months ago

The reason I was interested in this was that I noticed that LLVM and binutils both implemented this constrained version and I am writing a library that consumes relocations that would like to use the same assumption. The library is module loading in the Linux kernel: https://lore.kernel.org/linux-mm/20231019-module_relocations-v6-0-94726e644321@rivosinc.com/T/. It is preferable to have the same behavior of the module linker as a user would expect from using binutils or LLVM. However, without this psABI requirement, it would not be valid to implement this restriction and the behavior could be different.

rui314 commented 11 months ago

@charlie-rivos It turned out that the relocation overflow should never occur because the compiler knows the maximum length of a ULEB value when they create object files. So the exact behavior would be the same if you handle {SET,SUB}_ULEB128 in the same way as {SET,SUB}_{8,16,32,64}. It seems to me that the original GNU ld code was written with a misconception that only the ULEB128 relocations could overflow, and the behavior was copied to lld without further investigation. I don't think you need to mimic that behavior; it looks like it'd just complicate your logic. I do see it as an opportunity to simplify your patch.

For the mold linker, I'll keep handling {SET,SUB}_ULEB128 in the same way as we do for the other SET/SUB relocations.

@MaskRay Maybe you want to simplify the logic for lld too? These relocations can be handled individually just like other SET/SUB relocations. You might want to keep the overflow check as an assertion that works only when SET and SUB happen to be consecutive, but if you do that, you probably should do for all SET/SUB relocation paris (or don't do it at all for all SET/SUB).

MaskRay commented 11 months ago

Yeah, checking for overflow in the linker if SET and SUB are consecutive in the relocation table is perhaps a good practice to detect compiler bugs.

Agree.

However, isn't it odd that the psABI mandates the linker to do it in this specific case? I believe the confusion arises from the misconception that only {SET, SUB}_ULEB128 can overflow in a valid use case. This isn't true unless the compiler is buggy.

Agree. My mindset is: unless the wording says that no-checking is performed (like AArch64 *_NC), I assume that the relocation type needs checking. The text should probably state that when R_RISCV_SET16/R_RISCV_SUB16 is used as a composed sequence (i.e. multiple consecutive relocation records are applied to the same relocation location), the difference is checked to be within the range [0,65535], instead of requiring the two S+A values to be within the range.

@charlie-rivos It turned out that the relocation overflow should never occur because the compiler knows the maximum length of a ULEB value when they create object files. So the exact behavior would be the same if you handle {SET,SUB}ULEB128 in the same way as {SET,SUB}{8,16,32,64}. It seems to me that the original GNU ld code was written with a misconception that only the ULEB128 relocations could overflow, and the behavior was copied to lld without further investigation. I don't think you need to mimic that behavior; it looks like it'd just complicate your logic. I do see it as an opportunity to simplify your patch.

lld/ELF/Arch/RISCV.cpp just doesn't check overflows for R_RISCV_{ADD,SET,SUB}* :) I think this is reasonable as the semantics are still unclear and the return on investment isn't significant enough.

@MaskRay Maybe you want to simplify the logic for lld too? These relocations can be handled individually just like other SET/SUB relocations. You might want to keep the overflow check as an assertion that works only when SET and SUB happen to be consecutive, but if you do that, you probably should do for all SET/SUB relocation paris (or don't do it at all for all SET/SUB).

lld hasn't implement R_RISCV_SET_ULEB128 yet. I am still waiting on the reviews for the assembler patch https://reviews.llvm.org/D157657

Nelson1225 commented 11 months ago

lld/ELF/Arch/RISCV.cpp just doesn't check overflows for RRISCV{ADD,SET,SUB}* :)

LD also doesn't check overflows for ADD/SET/SUB. Currently we write the value back to the data filed immediately when handing each ADD/SET/SUB. I tried to add the overflow checks for each of them, but met troubles since gcc may generate s ADD32/SET32/SUB32 under medany, which means the symbols may be 64-bit and will break the overflow checking every time, but the final values will fit into 32-bits so these should be legal. A solution is that we should have an internal structures, including linked list or hash tables, to record the changes of values for each data field, and then write back the final values and check overflows until reviewed every ADD/SET/SUB relocations.

However, isn't it odd that the psABI mandates the linker to do it in this specific case? I believe the confusion arises from the misconception that only {SET, SUB}_ULEB128 can overflow in a valid use case. This isn't true unless the compiler is buggy.

Agreed. Since we haven't supported the overflow checks for ADD/SET/SUB, so the easier and fastest way is to add the limitations to make SET_ULEB128/SUB_ULEB128 relocations work. It's fine to remove those limitations from gnu toolchain, and maybe it's time to also add the overflow checks for ADD/SET/SUB/ULEB128 since they are the same issue from the implementation aspect.