riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
702 stars 163 forks source link

Inadequate and incorrect relocation documentation #163

Open PaulLyndonCurtis opened 3 years ago

PaulLyndonCurtis commented 3 years ago

Hi,

In https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#procedure-calls the documentation indicates that the relocation R_RISCV_RELAX is against a symbol, which is incorrect. Relocation is against the null symbol, or the symbol is ignored according to the table above.

In https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#relocations the documentation has a relocation R_RISCV_ALIGN yet there is no semantic description of this relocation when relaxation is supposed to happen.

If these could be correctly documented, it would be most excellent.

Forgive me if this is not the right place, but I need to correctly work with RISC-V ELF modules in order to correctly relocate during linking.

jim-wilson commented 3 years ago

For R_RISCV_ALIGN, in the GNU toolchain, if linker relaxation is enabled, then the assembler emits the maximum possible number of alignment bytes that might be required and adds a R_RISCV_ALIGN reloc to the first nop that lists the total number of bytes of nops emitted. For example, for an alignment of 16 bytes, we emit 14 bytes of nops if rvc, and 12 bytes of nops if norvc. During linker relaxation, GNU ld handles all relaxations except R_RISCV_ALIGN first. And then handles the R_RISCV_ALIGN separately deleting as many excess nops as necessary to get the desired alignment, where the desired alignment is the arg to R_RISCV_ALIGN rounded up to the next multiple of 2. Since code size can only decrease at this point, no other relaxations can be broken by this. R_RISCV_ALIGN relaxation must be done last, as doing other relaxations after R_RISCV_ALIGN would change alignment giving the wrong result.

R_RISCV_RELAX indicates that the relaxation immediately before it is relaxable. The other fields in a R_RISCV_RELAX reloc are irrelevant. GNU as doesn't set them, and GNU ld doesn't look at them.

Would you like to volunteer to write a patch? The poor state of the docs is mostly due to lack of volunteers to work on them.

PaulLyndonCurtis commented 3 years ago

Hi Jim,

For example, for an alignment of 16 bytes, we emit 14 bytes of nops if rvc, and 12 bytes of nops if not rvc [corrected]

Hmm, then if an .p2align 4 is followed by 8 additional hand-coded non-compact nops, do the hand-coded NOPs get vacuumed when processing the R_RISCV_ALIGN? I don't see how this can be prevented in the absence of additional information in the R_RISCV_ALIGN relocation? If the run length of NOPs defines the alignment, I mean, hmm.

R_RISCV_RELAX indicates that the relaxation immediately before it is relaxable

You mean the relocation (not the relaxation) before it, surely? The documentation states this is valid for the instruction pair that is a R_RISCV_CALL or R_RISCV_CALL_PLT.

However, R_RISCV_RELAX is not only applied to R_RISCV_CALL... relocations, I see it applied as follows:

  Section .text.atan in floatops.o (x-libc.a)
      0001c794  R_RISCV_CALL  target sym=__riscv_save_2
      0001c794  R_RISCV_RELAX
      0001c7bc  R_RISCV_HI20   target sym=.LC8
      0001c7bc  R_RISCV_RELAX
      0001c7c0  R_RISCV_LO12_I target sym=.LC8
      0001c7c0  R_RISCV_RELAX

We're missing some additional information here: because an instruction pair can be split, there seems to be a RELAX applied to each part of the address-forming instruction pair. This would make the description of R_RISCV_RELAX out of date (not just CALL and not one per instruction pair).

Would you like to volunteer to write a patch? The poor state of the docs is mostly due to lack of volunteers to work on them.

That would necessitate me understanding the relocations in totality. I'm not there yet and clearly there are more nuanced descriptions of the relocations that are to be teased out of the toolset.

As an aside, from what I see, it is impossible to write a linker for RV32 ELF output that simply combines the input modules. The design of the R_RISCV_ALIGN relocation provides the impossibility because the non-linked object module does not have instructions correctly aligned after assembly (as the number of NOPs is dependent on the argument to .align only, and not on both the location counter and the argument). Therefore, a conforming linker must have enough intelligence to move code to even "link" correctly. This is the first time I've seen this abomination in object modules. :-(

PaulLyndonCurtis commented 3 years ago

On further investigation is seems the number of bytes of NOP-stream inserted for a R_RISCV_ALIGN happens to be encoded in the r_addend part of the RELA relocation.

Out of curiosity, am I being sane here? Or is this just good fortune?

jim-wilson commented 3 years ago

On a non-RISC-V target, if you have a .align followed by nops, then nothing happens to the extra nops, they still appear in the output. The exact same thing happens on RISC-V.

I did mention earlier that the number of bytes of nops is part of the R_RISCV_ALIGN reloc.

You can choose to compile without relaxation. If you do that, then you can just combine input modules.

RISC-V is not the only GNU target that supports linker relaxation. This is not an abomination, but rather a common GNU toolchain feature. It only gets complicated because the RISC-V port is trying to handle align directives correctly which some other targets get wrong. RISC-V also gets a number of other things right that other targets get wrong, like updating the function size after relaxation, and handling --wrap symbols correctly.

Also note that alignment is usually a quality of implementation issue not a correctness issue. Get function or loop alignment wrong and they will still work, just a little slower than if they had the requested alignment.

It is R_RISCV_CALL that applies to a pair of instructions. I don't think that it says anywhere that R_RISCV_RELAX can only be used with R_RISCV_CALL. It just happens that we have an example of R_RISCV_RELAX being used with R_RISCV_CALL. The description of R_RISCV_RELAX is wrong to mention a pair of instructions. I think that is a bug in one of the patches applied over the summer to try to improve the docs.

At which point I have to ask again for vounteers to help maintain the documentation, as I can't do everything myself.

PaulLyndonCurtis commented 3 years ago

Hi Jim,

On a non-RISC-V target, if you have a .align followed by nops, then nothing happens to the extra nops, they still appear in the output. The exact same thing happens on RISC-V.

For sure.

I did mention earlier that the number of bytes of nops is part of the R_RISCV_ALIGN reloc.

Indeed, but without technical details. I took "lists" to mean "just count the nops after". The specification could document that the number of bytes of nops happens to be found in the r_addend field because it's not immediately obvious.

RISC-V is not the only GNU target that supports linker relaxation. This is not an abomination,

No, you misunderstand. The abomination is that the assembler does not align correctly in object output. Sure, it lays down some NOPs, but not to ensure correct unlinked alignment. I am well aware of relaxation, having had to deal with it for the transputer/ST20 many moons ago.

You can choose to compile without relaxation. If you do that, then you can just combine input modules.

In general this would be correct. But in the case of alignment directives, your statement is incorrect. Consider this input to the assembler:

start:
ebreak
.p2align 3
ebreak
.p2align 4
ebreak

Here I have specific requirements for alignment. The section header lists the section alignment as 2^4, which is good. And therefore 16 divides the value of "start", assuming this is first in the section.

The first ebreak will be on an address that is a multiple of 16. The second should be on an address that is a multiple of 8. And the third should be on an address that is a multiple of 16. Unfortunately the output written by the assembler does not maintain correct alignment for this case, deferring alignment to the linker through a relocation:

0x00000000  00100073     EBREAK
0x00000004  00000013    NOP
0x00000008  00100073     EBREAK
0x0000000C  00000013    NOP
0x00000010  00000013     NOP
0x00000014  00000013     NOP
0x00000018  00100073     EBREAK

In the unlinked object module, the final EBREAK is not aligned on an address that is a multiple of 16. The only way this can be honored in linked output is that the linker must process the R_RISCV_ALIGN and rewrite the instruction stream. This is the abomination. No other toolset I know of has this misalignment issue in unlinked ELF modules. If it doesn't rewrite the instruction stream, the third EBREAK will not be on an address that is a multiple of 16.

The R_RISCV_ALIGN relocation now reflects the implementation of relaxation in the GNU toolset, which is not a great design decision, i.e. that things can only shrink. Other toolsets do not make this assumption, they happen to widen things that are too narrow. Clearly, a R_RISCV_ALIGN relocation could simply point to the instruction that requires alignment, pass the alignment required in the r_addend field, and not insert NOPs itself, letting the linker insert NOPs. This would be cleaner. But we mix the specification of alignment with the implementation of a specific linker. It is what it is, a crock in my opinion.

I fail to see how the RV32 toolset is "right" in this case without full-blown processing of alignment relocations and all other fixups required along the way.

RISC-V also gets a number of other things right that other targets get wrong, like updating the function size after relaxation, and handling --wrap symbols correctly.

I am not asking for justification or "my toolset is better than your toolset" takedowns. I merely point out that these relocations are not well specified, I don't know anything beyond what is written here. I guess if I want to know what this does, then essentially I need to dig into the GNU sources.

At which point I have to ask again for vounteers to help maintain the documentation, as I can't do everything myself.

Then this should be a Foundation issue, rather like Arm producting the base specifications for toolsets and platforms, and it should be funded by the Foundation. I am not the right person to maintain this as I have no insight into the nuances of these relocations. I am sure there will be more issues along the way.

jim-wilson commented 3 years ago

As I already sad, you can compile without relaxation. See the -mno-relax option. The assembler does what you want in this case. With relaxation, it is unavoidable that align will cause problems. I think you are trying to lay blame in the wrong places.

PaulLyndonCurtis commented 3 years ago

OK, I concede that I am laying the blame incorrectly for R_RISCV_RELAX. I need to tell the toolset not to enable relaxation for a first-cut linker. Still, a better design of the R_RISCV_RELAX relocation would enable a module assembled with relaxation to be linked with a relaxation-unaware linker. The implementation of the GNU linker protrudes into the specification of the RISC-V ELF format, which is not great.

Anyway, I now have things linking and running, with relaxation enabled (and not using any untoward alignment directives in the assembly language files).

Like I said, I'm sure there wil be other things as I make my way through supporting RISC-V for linking.

Best,

-- Paul.

PaulLyndonCurtis commented 3 years ago

OK, so now I see that R_RISCV_SET6 is being used to relocate Dwarf debug sections, but there's no description of what is required beyond it affects a "word6". I can guess R_RISCV_SET8, R_RISCV_SET16, and so on, but the 6-bit relocation is just not well defined. Well, it's not defined beyond computing the relocated value, i.e. what happens where the relocation points to.

Any clues here?

PaulLyndonCurtis commented 3 years ago

Ahh, OK, I'm blind. I can see this is defined a bit lower down. But it doesn't specify overflow behaviour and whether the relocated value is signed or unsigned for this check. I guess it's modulo 2^6 and no checks performed. The ARM specification actually defines whether something is checked (or not) for overflow by using specific ..._NC relocations.

jim-wilson commented 3 years ago

SET6 is always used in a pair with SUB6, and they are used for DW_CFA_advance_loc. DW_CFA_advance_loc is used to advance the location counter in the unwind info. So this is always a subtraction of two text labels. DW_CFA_advance_loc uses the two upper bits to indicate whether the arg is in the same byte, or in the following 1, 2, or 4 bytes. The SET6/SUB6 relocs are used when the arg is in the same byte in the lower 6 bits. SET6 is for the upper address. And then SUB6 is for the lower address to be subtracted. The assembler will only choose this option when it can tell at assembly time that the value will fit. And since relaxation can only decrease code size, we are guaranteed that if the value fits at assembly time, then there can be no overflow at link time. At least in theory. I haven't seen any such problems.

The same is true of the other SET/SUB 8/16/32 relocs which are used for the other DW_CFA_advance_loc variants.

These relocs are only needed when relaxation is enabled, but gas emits them always. This is one of the thousands of items on my todo list.

PaulLyndonCurtis commented 3 years ago

OK, so this makes sense, there should be no check and the value to that field is the relocated value modulo 2^64. Looks good to me, thanks.

I happen to be linking code that has relaxation enabled as I have prototyped the relaxation code. This linker is doing better than the GNU linker as it can partition the call graph to find cliques and lay them out to improve code locality. The standard SDK libraries destroy code locality by separately compiling all the math code, so the GNU linker doesn't move code around to best reduce spand-dependent jumps, it just seems to lay it out based on the order of objects on the command line and the order of objects in an archive. Or so it seems. You can restore this locality by finding those cliques and linking with them in mind.