riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
708 stars 163 forks source link

Document the non-atomic relaxation pitfall #34

Open sorear opened 7 years ago

sorear commented 7 years ago

This should be in the main document somewhere but right now I'm writing this primarily for @PkmX .

Consider the following:

.option norvc
.option relax
__global_pointer$:
.skip 2028
L1:
addi a0, a0, %lo(L3)  // (D)
L2:
lui a1, %hi(L1) // (A)
addi a1, a1, %lo(L1) // (B)
lui a0, %hi(L3) // (C)
j L1
L3:

Relaxation of one instruction at a time in address order with immediate update of symbol values will break this code. When (D) is visited, L3 - __global_pointer$ is out of range for a 12-bit immediate so the addi has to be kept, but when (C) is visited that offset is in-range because (A) + (B) can be relaxed to a single instruction. Relaxing (C) then causes a0 to be not set when control flow reaches (D).

It is necessary for all linked %hi / %lo pairs to make a consistent decision regarding whether the relaxation should be performed; since there's no metadata explicitly linking the pairs, the easiest way to arrange this is with a two-pass algorithm:

  1. Visit each relaxable relocation, determine if it can be relaxed, and perform byte modifications but do not change any symbol values. The current binutils implementation uses R_RISCV_DELETE pseudo-relocations to record bytes which need to be deleted in the next phase; since it is never written into object files it is not an actual ABI relocation.

  2. Perform all pending byte deletions.

As an additional benefit, the two-pass algorithm is linear time and somewhat parallelizable.

PkmX commented 7 years ago

Yes, I can see how this could be a potential pitfall, but is there a better example? I believe ld currently doesn't relax %hi/%lo pairs into gprel for symbols in .text.

Would it make sense for %lo to link with its corresponding %hi à la %pcrel_lo? That should make it obvious that the pair needs to be relaxed together.

sorear commented 7 years ago

I think that if you don't perform gprel relaxations for anything outside the data segment and don't have relaxations of any kind inside the data segment, that's probably good enough (R_RISCV_CALL is already atomic).

Threading the %lo came up as a possibility during the binutils upstreaming; we decided it was too much complication for very little benefit.

palmer-dabbelt commented 7 years ago

The plan is to move to a two-pass, R_RISCV_DELETE-based algorithm for all the relaxations, which is why those patches are still WIP (I want to make sure it can be used for something else before submitting). We're just time limited right now.

MaskRay commented 3 years ago

Can a gcc contributor advice whether -mno-relax can affect code generation? This is a controversial point in the Clang patch (if -mno-relax, add .option norelax?): https://reviews.llvm.org/D102535

Does the code generator reserve rights to make -mno-relax change code generation behavior?

MaskRay commented 3 years ago

@jim-wilson

jim-wilson commented 3 years ago

-mno-relax doesn't affect code generation. As you mentioned, gcc -S -mno-relax tmp.c; gcc -c tmp.s should work which is why we need the .option norelax.

jrtc27 commented 3 years ago

gcc -S -mabi=foo tmp.c; gcc -c tmp.s doesn't work, why should it for -mno-relax? If you use different flags to assemble than what you generated the code with that's on you and you should fix your broken build system.

jrtc27 commented 3 years ago

Also, who on earth is invoking the assembler separately? Sure, we can include .option norelax, but it just masks the compiler driver not forwarding -mno-relax on to the assembler (which used to be the case for GCC, I don't know if it finally got fixed: this actually meant gcc -c -mno-relax tmp.s didn't do anything for a very long time and you had to pass -Wa,-mno-relax anyway to assemble files without relaxation support).

jim-wilson commented 3 years ago

This is a pointless argument. I'm not changing how the GNU toolchain works.

jrtc27 commented 3 years ago

It's not pointless. https://github.com/riscv/riscv-c-api-doc documents various compiler flags, and we need to have a common interface that can be relied upon from GCC and Clang (and Mark will tell you that too). Both are free to offer stronger semantics and extra options than the other, but we need to have compatibility between the two, otherwise we are doing users a disservice. You maintain GCC, we maintain LLVM, and so together we somehow need to reach an agreement over what is the best approach. The "this is what GCC does, that's what you must do" is not that; the two compilers should be on an equal footing, and arguably LLVM is even more important these days due to its ability to be reused as the backend for other projects, though GCC remains the dominant compiler in the Linux world for now.

jim-wilson commented 3 years ago

GCC works the way it does because it evolved over 10 years or so. I can't go back in time and change history. You just need to accept that some things don't work the way that they do if we had designed it from scratch knowing what we know now. The .option norelax is one of those. There is really no reason to waste a large amount of time arguing about this. It is a completely trivial issue. It is just one line emitted by the compiler into the assembly language file. This isn't worth an argument, or a possibly backward compatibility breaking change. Just accept it for what it is and move forward.

jrtc27 commented 3 years ago

I'm not asking for GCC to change, I'm just saying Clang may well not follow GCC in permitting such lax programming practices, provided there's not a good argument against it besides "GCC lets you do $stupid_thing".

jrtc27 commented 3 years ago

(and this query was not about "GCC should change", it was "is our understanding of the flag, and how it could possibly be used by GCC, correct?")

jim-wilson commented 3 years ago

I don't see how anything can be misunderstood here. You already have a clang patch that correctly implements this, you are just refusing to accept it.

If the user specifies -mno-relax, then we put .option norelax in the assembly file. Unlike an ABI, relaxation can be turned off and on at will in the middle of an assemblly file, and this is actually a necessary feature to make relaxation work properly, as the startfile initializing gp needs to temporarily turn relaxation off. So if you have both a command line option and a .option relax/norelax in the assembly file, then the one in the assembly file takes precedence. This is all trivial to understand.

A feature should be judged on its merits, not theoretical discussions about the "proper" way to write code. For this feature, it is two trivial lines of code in the compiler, so the cost of keeping the feature is effectively zero. If we drop it, then there is the possibilty that some user code will break as this is a non-backward compatible change. Even worse than that, code will be silently compiled wrong which is the absolute worst possible mistake that a compiler can make. Therefore there is nothing to debate here. The feature must be kept.

Talking about lax programming practices is inappropriate here. We support many different kinds of programming, embedded and linux. We support may different kinds of build systems. Assuming that there is only one correct way to write code is wrong. Also, it is wrong to assume that people have perfect code, perfect build system, and infinite amounts of time to fix issues with their code. In the real world, there are always things that are non optimal, and the toolchain should be willing to make things easier for programmers, rather than add unnecessary obstacles in their way.

I see no reason why we should even be debating this feature here. Assembly language formats are always ugly, and often have minor incompatiibilities across compilers. clang goes directly from C to object files, so the assembly language format shouldn't even matter to clang except in uncommon cases.

There are real substantive issues that the psABI committee should be working on. But instead we are wasting time arguing about this trivial and unimportant feature that can't be changed now. We all have limited time to spend dealing with psABI issues. We should be using that time to discuss important issues. We should not be wasting time with trivial unimportant issues that aren't even broken.

It seems the only reason we are discussing this is because you are unable to compromise in any situation no matter how trivial or unimportant. The ability to compromise is a very important engineering skill. This is a skill you need to learn. And you can start by accepting the clang patch you already have.

jrtc27 commented 3 years ago

I don't see how anything can be misunderstood here. You already have a clang patch that correctly implements this, you are just refusing to accept it.

If the user specifies -mno-relax, then we put .option norelax in the assembly file. Unlike an ABI, relaxation can be turned off and on at will in the middle of an assemblly file, and this is actually a necessary feature to make relaxation work properly, as the startfile initializing gp needs to temporarily turn relaxation off. So if you have both a command line option and a .option relax/norelax in the assembly file, then the one in the assembly file takes precedence. This is all trivial to understand.

A feature should be judged on its merits, not theoretical discussions about the "proper" way to write code. For this feature, it is two trivial lines of code in the compiler, so the cost of keeping the feature is effectively zero. If we drop it, then there is the possibilty that some user code will break as this is a non-backward compatible change. Even worse than that, code will be silently compiled wrong which is the absolute worst possible mistake that a compiler can make. Therefore there is nothing to debate here. The feature must be kept.

Talking about lax programming practices is inappropriate here. We support many different kinds of programming, embedded and linux. We support may different kinds of build systems. Assuming that there is only one correct way to write code is wrong. Also, it is wrong to assume that people have perfect code, perfect build system, and infinite amounts of time to fix issues with their code. In the real world, there are always things that are non optimal, and the toolchain should be willing to make things easier for programmers, rather than add unnecessary obstacles in their way.

I see no reason why we should even be debating this feature here. Assembly language formats are always ugly, and often have minor incompatiibilities across compilers. clang goes directly from C to object files, so the assembly language format shouldn't even matter to clang except in uncommon cases.

There are real substantive issues that the psABI committee should be working on. But instead we are wasting time arguing about this trivial and unimportant feature that can't be changed now. We all have limited time to spend dealing with psABI issues. We should be using that time to discuss important issues. We should not be wasting time with trivial unimportant issues that aren't even broken.

It seems the only reason we are discussing this is because you are unable to compromise in any situation no matter how trivial or unimportant. The ability to compromise is a very important engineering skill. This is a skill you need to learn. And you can start by accepting the clang patch you already have.

Personal attacks and telling Clang/LLVM maintainers what patches to accept are both entirely inappropriate. Please refrain from making such comments ever again.

aswaterman commented 3 years ago

Personal attacks and telling Clang/LLVM maintainers what patches to accept are both entirely inappropriate. Please refrain from making such comments ever again.

Obviously, your "GCC lets you do $stupid_thing" remark falls into one or both of those buckets. You need to take it down a notch, too.