riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
663 stars 158 forks source link

Remove restrictions on ULEB128 #413

Closed charlie-rivos closed 5 months ago

charlie-rivos commented 7 months ago

Following the conversation of https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/403, the restrictions imposed there should be removed.

The restrictions stemmed from the implementation details of binutils and lld, however uleb128 should not be limited to consecutive relocations. While consecutive relocations does simplify implementation, it is an arbitrary restriction.

@Nelson1225 is working on removing this restriction from GNU ld and @rui314 notes that the mold linker does not have this restriction. Since Linux 6.7-rc1, there has been a linker implementation that supports non-consecutive uleb128 relocations.

MaskRay commented 7 months ago

The restrictions stemmed from the implementation details of binutils and lld, however uleb128 should not be limited to consecutive relocations.

It's true for lld (https://github.com/llvm/llvm-project/pull/72610), without the restriction the implemention would become more complex.

While consecutive relocations does simplify implementation, it is an arbitrary restriction.

I think we need a concrete use case that this PR benefits to remove the restriction.

Can you elaborate "Since Linux 6.7-rc1, ..."?

and @rui314 notes that the mold linker does not have this restriction.

mold elf/arch-riscv.cc does not report overflows for R_RISCV_SET_ULEB128. If it implements an error, there will be some requirement from the assembler, the available space needs to sufficiently large.


The restriction is also related to composed relocations https://www.sco.com/developers/gabi/latest/ch4.reloc.html

If multiple consecutive relocation records are applied to the same relocation location (r_offset), they are composed instead of being applied independently, as described above. By consecutive, we mean that the relocation records are contiguous within a single relocation section. By composed, we mean that the standard application described above is modified as follows:

If "must be placed immediately ..." is to be removed, are you proposing that there is some use cases when another relocation of the same offset can be added between a pair of R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128? Or that you want to make the pair no longer composed relocations? I'd like to understand how removing the restriction helps the toolchain.

Or rather, there is a use case where R_RISCV_SET_ULEB128 is needed without a pairing R_RISCV_SUB_ULEB128. This is a bit

kito-cheng commented 7 months ago

Following the conversation of [1], the restrictions imposed there should be removed.

The link seems missing?

charlie-rivos commented 7 months ago

It's true for lld (https://github.com/llvm/llvm-project/pull/72610), without the restriction the implementation would become more complex.

Yes, this change is non trivial. However, as @Nelson1225 said in the last thread:

It's fine to remove those limitations from gnu toolchain, and maybe it's time to also add the overflow checks for ADD/SET/SUB/ULEB128 since they are the same issue from the implementation aspect.

Or rather, there is a use case where R_RISCV_SET_ULEB128 is needed without a pairing R_RISCV_SUB_ULEB128.

I am not sure if any application currently needs this, but it seems like an arbitrary restriction to place in the spec.

Can you elaborate "Since Linux 6.7-rc1, ..."?

I implemented this behavior in the Linux linker given the conversation on the last post (used to link kernel modules at runtime). You can see the code here. The implementation for Linux is a simpler case because there is no relaxation.

I'd like to understand how removing the restriction helps the toolchain.

This restriction was only put into place because the current implementations in the toolchains were incorrect (not conforming to the standard). However, now that there exists and implementation that properly handle uleb128 to the previous spec (in Linux) I am proposing to revert. I know I opened the previous pull request, but I felt like it was merged before the conversation had completed.

The benefit is that uleb128 would work in the same way as the other relocations. With this restriction, you cannot have multiple SUB_ULEB128 for the same offset, even though this is legal in any other sub. It is also a consistency issue with HI/LO relocations which also are not required to be a pair as you pointed out in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/408.

The link seems missing?

I fixed it.

rui314 commented 7 months ago

@MaskRay

It's true for lld (https://github.com/llvm/llvm-project/pull/72610), without the restriction the implemention would become more complex.

You can just remove the overflow check altogether. It is the complier's responsibility to reserve a sufficient number of bytes so that overflow will never happen at link time. That's what lld does for all the other relocation types of this kind, namely R_RISCV_{SET,SUB}{6,8,16,32,64}. I don't understand why you want to check for relocation overflow only in the case of R_RISCV_{SET,SUB}_ULEB128.

jrtc27 commented 7 months ago

They're not quite the same thing. The fixed-with ones are because of the file format or code model in use, so are less the compiler calculating something and more just that's what you have to have (generally). ULEB128 does require your assembler (not compiler) to correctly calculate the largest difference you could have for a given expression, rather than just use whatever size that directive says to use.

From the linker's perspective, they're kind of the same. But they are not the same when you look at the whole pipeline.

rui314 commented 7 months ago

Thanks for the info, let me s/compiler/assembler/ to my previous comment. My point still stands; the linker doesn't need to check for relocation overflow of R_RISCV_{SET,SUB}_ULEB128. You can do it if you want, but lld doesn't generally do rigorous verification of input file integrity and works mostly under a garbage-in garbage-out policy, so being very cautious only of misuse of these relocation types doesn't make much sense to me.

charlie-rivos commented 7 months ago

@rui314

You can just remove the overflow check altogether. It is the complier's responsibility to reserve a sufficient number of bytes so that overflow will never happen at link time. That's what lld does for all the other relocation types of this kind, namely RRISCV{SET,SUB}{6,8,16,32,64}. I don't understand why you want to check for relocation overflow only in the case of RRISCV{SET,SUB}_ULEB128.

I think I am misunderstanding. By "overflow" I am referring to trying to fit a value that is larger than 8/16/32/64 bits value into a 8/16/32/64 bit relocation. uleb128 cannot overflow because it is not fixed width. It just grows if it needs more space. The spec defines that the linker will have enough space to grow the uleb128.

When you do the following, GNU ld (GNU Binutils) 2.41 silently "fails" and just truncates the value to 2:

.text
.global test
test:
        lw      a0, value
        ret
.data
first:
        .space 258
second:
value:
        .reloc          value, R_RISCV_SET8, second
        .reloc          value, R_RISCV_SUB8, first
        .word           0

This is very hard to debug.

Nonetheless, this is not about overflow. This is about allowing people to use multiple uleb128 on the same offset, or a set that is not followed by a sub. That is impossible with this spec as it requires a set followed by a sub. This is valid in other set/sub/add relocations like the following:

.text
.global test
test:
        lw      a0, value
        ret
.data
start:
        .space 5
first:
        .space 10
second:
        .space 20
third:
value:
        .reloc          value, R_RISCV_SET32, third
        .reloc          value, R_RISCV_SUB32, second
        .reloc          value, R_RISCV_SUB32, first
        .reloc          value, R_RISCV_ADD32, start
        .word           0

The problem arises, however, when you get something like:

.text
.global test
test:
        lw      a0, value
        ret
.data
value:
        .reloc          value, R_RISCV_SET8, 0
        .reloc          value, R_RISCV_SUB8, 3
        .word           0

(@Nelson1225 very recently fixed a bug where constants in sub relocations were added instead of subtracted)

This is easy to handle in the fixed-length cases like above. Instead of representing this as -3, the linker subtracts 3 from 256 to arrive at 253. This allows the case where you set a value that is greater than the length and later subtract to get it in a proper range like the following. This is trivially implemented by just truncating the value that is stored. Instead of storing 258 at value, the linker can just store 2 and this math will still work.

.text
.global test
test:
        lw      a0, value
        ret
.data
first:
        .space 258
second:
value:
        .reloc          value, R_RISCV_SET8, second
        .reloc          value, R_RISCV_SUB8, first
        .reloc          value, R_RISCV_SUB8, -3
        .word           0

However, for uleb128 you cannot (should not) do the same thing. You can do that, but it would require that the compiler always allocate space for the maximum size uleb128 (9 bytes). This defeats the point of uleb128. It would always need this space because it would try to store the entire 64-bit address. In the common case that the directive .uleb128 <value>-<value2> is used and using the scheme above, it would:

  1. Generate a relocation that sets the value at the offset to value2 in uleb128 format because it doesn't know if it is the final relocation at that offset
  2. Generate a second relocation that subtracts value from the uleb128 value stored at the offset. This will likely cause the number of bytes that the uleb128 data takes up less than maximum size of 9. However, that space was allocated by the compiler and you can't get it back.

In order to solve this, we need to figure where the final relocation is at the offset and apply them in the correct order. To do this, we need a list of all of the relocations at the offset and we can just apply them in order to a buffer. We know where the final relocation is so we know when the buffer won't be modified further and at that point we write the buffer to the offset in uleb128 format.

When I am talking about "overflows" I am actually talking about anything that is not uleb128. In the quote I mentioned previously from @Nelson1225, he realized that this uleb128 algorithm I provided is the same algorithm as an overflow checking algorithm. So implementing this algorithm allows you to both do overflow checking for anything that is not uleb128 and uleb128 that does not have the consecutive relocations restriction.

MaskRay commented 7 months ago

Or rather, there is a use case where R_RISCV_SET_ULEB128 is needed without a pairing R_RISCV_SUB_ULEB128.

I am not sure if any application currently needs this, but it seems like an arbitrary restriction to place in the spec.

I think we need a use case to back up removing the restriction. It's like: if we don't have a use case in mind, we wouldn't propose a new relocation type. Imposing restrictions is useful when it can discourage unsupported uses cases and help implementations simplify code.

To use a .uleb128 to encode a non-constant int32, 5 bytes are needed. To use a .uleb128 to encode a non-constant int64, 10 bytes are needed (9 if you can be sure about the most significant bit).

For both cases it's worse than a plain old .long/.quad.

My comment about composed relocations (not replied to by others) is still relevant. The current restriction at least makes it implicit that R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128 form composed relocations and rule out invalid cases like:

.reloc ., R_RISCV_SET_ULEB128, w2
.reloc ., R_RISCV_ADD64, w2
.reloc ., R_RISCV_SUB_ULEB128, w1

The restriction is also related to composed relocations sco.com/developers/gabi/latest/ch4.reloc.html

If multiple consecutive relocation records are applied to the same relocation location (r_offset), they are composed instead of being applied independently, as described above. By consecutive, we mean that the relocation records are contiguous within a single relocation section. By composed, we mean that the standard application described above is modified as follows:

charlie-rivos commented 7 months ago

I think we need a use case to back up removing the restriction. It's like: if we don't have a use case in mind, we wouldn't propose a new relocation type.

Yeah I can understand that. I feel like making the uleb128 relocation have behavior closer to the behavior of the other relocations is sufficient enough, but it does require extra work from a linker-implementation perspective. I also believe that because the implementation of this feature can be wrapped up in better handling of set/sub/add relocations (overflow handling), it makes the case for this change stronger.

To use a .uleb128 to encode a non-constant int64, 10 bytes are needed (9 if you can be sure about the most significant bit).

I was thinking that the uleb128 relocations are only SET and SUB so there are 64 bits max and so it is guaranteed that the 9th byte is will be the last one. Maybe this is not a valid assumption.

My comment about composed relocations (not replied to by others) is still relevant.

Thanks for providing the example, I wasn't entirely clear the first time you asked. Since this can probably be considered as obviously bad input, we can add a line in the spec that says that the behavior is undefined in this case (or explicitly prohibit it).

charlie-rivos commented 7 months ago

If uleb128 set/sub are supposed to be composed, shouldn't it follow the scheme of PCREL_LO/PCREL_HI which has the LO relocation point to the HI relocation?

jrtc27 commented 7 months ago

If uleb128 set/sub are supposed to be composed, shouldn't it follow the scheme of PCREL_LO/PCREL_HI which has the LO relocation point to the HI relocation?

HI and LO split a relocation to the same thing across multiple locations. SET/SUB apply a series of different relocations to the same location. They're quite different things.

rui314 commented 7 months ago

@charlie-rivos

I don't think I understand your point as to why only ULEB128 relocations are special. For other SET/SUB relocations,

Likewise,

I don't think you need to always use 9 bytes to store an intermediate value for the same reason that we need only the least significant 8 bits to compute A-B if we know A-B<2^8. I believe the following assembly should correctly set 16382 to value.

.text
.global test
test:
        lw      a0, value
        ret
.data
first:
        .space 16385
second:
value:
        .reloc          value, R_RISCV_SET_ULEB128, second
        .reloc          value, R_RISCV_SUB_ULEB128, first
        .reloc          value, R_RISCV_SUB_ULEB128, -3
        .byte           0x80
        .byte           0
charlie-rivos commented 7 months ago

don't think I understand your point as to why only ULEB128 relocations are special

I was trying to express that in uleb128 the linker needs to know the final value before it encodes it into uleb128 format. Yes, the assembler can allocate less than 9 bytes, but it will always have to over-allocate. If it allocates 2 bytes it is possible that only 1 byte is needed. The uleb128 encoded value has changes depending on if there is 1 byte or 2 bytes because it has to set the most significant bit to 1 if there are more bytes in the stream.

   .reloc          value, R_RISCV_SET_ULEB128, second
   .reloc          value, R_RISCV_SUB_ULEB128, first
   .reloc          value, R_RISCV_SUB_ULEB128, -3

This violates the condition that uleb128 SET/SUB are consecutive. The purpose of his pull request to support something like this, however this relocation sequence is illegal by the current psABI.

Nelson1225 commented 7 months ago

@Nelson1225 is working on removing this restriction from GNU ld and @rui314 notes that the mold linker does not have this restriction. Since Linux 6.7-rc1, there has been a linker implementation that supports non-consecutive uleb128 relocations.

Sorry I just said it's probably time to do it, but not promise that I am working on it...

rui314 commented 7 months ago

@charlie-rivos

uleb128 the linker needs to know the final value before it encodes it into uleb128 format

That's an misunderstanding, isn't it? SET_ULEB128 can truncate a given value to the available width, while SUB_ULEB128 subtracts a given value from the one written by SET_ULEB128. For instance, if the location pointed to by a SET_ULEB128 is 80 80 80 00 in hex (which is 0 encoded in a redundant way), the place can hold a 28 bit value, and SET_ULEB128 will write the lowest 28 bits of a given intermediate value to the location. This approach can be used to compute a difference no larger than 2^28.

In general, you need only the least N bits as an intermediate result to compute A-B if you know A-B < 2^N.

This is how SET32/SUB32 work for a 32 bit displacement in the 64-bit address space, for example. We always truncate an intermediate result except for SET64/SUB64. I think {SET,SUB}_ULEB128 should work that way too. Am I missing something?

charlie-rivos commented 7 months ago

For instance, if the location pointed to by a SET_ULEB128 is 80 80 80 00 in hex (which is 0 encoded in a redundant way), the place can hold a 28 bit value, and SET_ULEB128 will write the lowest 28 bits of a given intermediate value to the location.

Where is it defined in the psABI that the SET_ULEB128 relocation is defined to discover the amount of space available to it by reading the values stored in data at the relocation address?

jrtc27 commented 7 months ago

For instance, if the location pointed to by a SET_ULEB128 is 80 80 80 00 in hex (which is 0 encoded in a redundant way), the place can hold a 28 bit value, and SET_ULEB128 will write the lowest 28 bits of a given intermediate value to the location.

Where is it defined in the psABI that the SET_ULEB128 relocation is defined to discover the amount of space available to it by reading the values stored in data at the relocation address?

It's not clearly written down but it's the only interpretation that makes any sense. Otherwise there's no way to know how many bytes you can use.

kito-cheng commented 7 months ago

Having that restriction can simplify the complexity of linker implementation, but I am not sure I see what we can gain from removing the restriction.

kito-cheng commented 5 months ago

Discussed in last psABI meeting, and we reach consensus to close this PR.

charlie-rivos commented 5 months ago

I am disappointed that it has been decided to keep this restriction, however I am not interested in arguing the point further at this time.

rui314 commented 5 months ago

I agree with @charlie-rivos; I don't think we have reached an agreement to keep this restriction in the psABI. Particularly so because @charlie-rivos was the person who added the restriction to the spec in the first place.