riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International

Document position on zero-page optimization for linker in psABI #197

Open compnerd opened 3 years ago

compnerd commented 3 years ago

The binutils linker currently performs a zero-page optimization (near-zero address relocations using R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_{I,S} are rewritten to a lui + addi form). lld does not perform this rewriting and will reject input that expects it.
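For illustration, a schematic sketch of the two sequences involved (not the exact output of any particular toolchain; `sym` stands for a symbol that ends up near address zero):

.L0:
    auipc   a0, %pcrel_hi(sym)      # R_RISCV_PCREL_HI20: upper 20 bits, PC-relative
    addi    a0, a0, %pcrel_lo(.L0)  # R_RISCV_PCREL_LO12_I: paired lower 12 bits
# If sym resolves near address 0 while the code is linked at a high address,
# the +-2 GiB PC-relative range is exceeded. Rather than erroring, GNU ld
# rewrites the pair into an absolute sequence:
    lui     a0, %hi(sym)            # upper 20 bits, absolute
    addi    a0, a0, %lo(sym)        # lower 12 bits, absolute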

compnerd commented 3 years ago

CC: @luismarques

jrtc27 commented 3 years ago

My view on this is split into two parts:

  1. If the symbol is weak undef, then my opinion from #126 applies: this is a CodeGen bug. Possibly-undef weak symbols in medany code should be referenced using GOT_PCREL_HI20+PCREL_LO12 so that if they are undef they can load a constant 0 from memory (see the sketch after this list). Linkers can optionally choose to optimise this sequence a la X86 GOTPCRELX to not be indirect, but that's not required for functional correctness.
  2. If the symbol is anything else, the generated code was wrong. Linkers may choose to support this, but that is a toolchain-specific extension and should not be relied upon, since such symbols violate the premise of the medany code model.
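
A schematic sketch of the GOT-indirect sequence item 1 describes (RV64; `sym` stands for a hypothetical possibly-undef weak symbol):

.L0:
    auipc   a0, %got_pcrel_hi(sym)  # R_RISCV_GOT_HI20: PC-relative address of sym's GOT slot
    ld      a0, %pcrel_lo(.L0)(a0)  # R_RISCV_PCREL_LO12_I: load sym's address from the GOT
# If sym is undefined at link time, the linker stores 0 in the GOT slot, so the
# load yields a null address without any instruction rewriting.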
jim-wilson commented 3 years ago

There is a philosophical divide here. On the GNU toolchain side, linker relaxation is considered an important feature of the toolchain for hitting code size and performance goals. Thus this handling of weak undef is just another relaxation. On the LLVM toolchain side, linker relaxation is not implemented, and it is considered wrong for the linker to ever change an opcode emitted by the compiler. Thus this handling of weak undef is considered a compiler bug.

Another concern here is that weak symbols were originally invented for shared libraries, but for ~30 years the GNU toolchain has had an extension allowing weak symbols to be used in statically linked code, and this extension is frequently used by embedded programmers. There is a weak symbol in the newlib crt0.S for instance. And the SiFive freedom-e-sdk heavily uses weak symbols, though I'm hoping that this someday gets fixed in a rewrite. So any solution that increases code size and/or reduces performance for the handling of undef weak symbols would be a major problem for embedded users and hardware companies that live and die by their benchmark numbers.

On the Linux side we have the problem that most distros are compiled with gcc, so if the C library is gcc-compiled and contains undef weak references, then programs cannot be linked using lld. This problem needs to be fixed, and is probably why this issue was raised.

If we had separate embedded and Unix ABIs, then perhaps we could just do it the LLVM way for Linux and the GNU way for embedded. But since we don't have separate embedded/Unix ABIs, switching gcc to the llvm scheme would penalize embedded users without new relocations and relaxations, which means we likely need a fair amount of binutils/gcc work to be done before we can switch, and it's not clear who will do the work as RISC-V International doesn't have any staff that can work on such problems.

Both -mcmodel=medany and -fpic are affected. But for embedded users the -fpic support doesn't matter, and for Linux users with PIE by default, medany may not matter. So maybe something that we can do in the short term is leave medany support alone for now, and change the PIC support to indirect through the GOT. That only fixes half the problem, but might be enough to solve the immediate Linux distro problems. Then we can handle the harder medany issue as a longer-term issue.

jrtc27 commented 3 years ago

> There is a philosophical divide here. On the GNU toolchain side, linker relaxation is considered an important feature of the toolchain for hitting code size and performance goals. Thus this handling of weak undef is just another relaxation. On the LLVM toolchain side, linker relaxation is not implemented, and it is considered wrong for the linker to ever change an opcode emitted by the compiler. Thus this handling of weak undef is considered a compiler bug.

This isn't true. LLD is quite happy to perform various linker relaxations (we do TLS relocations for a number of architectures, and things like GOTPCRELX on amd64 to rewrite GOT accesses to RIP-relative accesses), but they are strictly optimisations and not required to be implemented. That means all code generated by the compiler should be able to be linked without rewriting instructions to be a different form. R_RISCV_ALIGN blurs the line a little here but it's not really the same, it's just a bit of a hack that is effectively creating a whole bunch of tiny ELF sub-sections each with their own alignment requirements, just in a bit of a weird way.

> Another concern here is that weak symbols were originally invented for shared libraries, but for ~30 years the GNU toolchain has had an extension allowing weak symbols to be used in statically linked code, and this extension is frequently used by embedded programmers. There is a weak symbol in the newlib crt0.S for instance. And the SiFive freedom-e-sdk heavily uses weak symbols, though I'm hoping that this someday gets fixed in a rewrite. So any solution that increases code size and/or reduces performance for the handling of undef weak symbols would be a major problem for embedded users and hardware companies that live and die by their benchmark numbers.

This proposal would do no such thing. A GOT-indirect access to a weak symbol would still be relaxable in every way that the current PC-relative weak accesses are relaxable, as well as simply to a full 32-bit PC-relative access. This is how other architectures do this (c.f. amd64's GOTPCRELX). You just need to implement such relaxations in GNU ld, and if you're concerned about a new GCC emitting code that an old GNU ld can't relax, then you can always have flags to enable/disable it for a while and default to one or the other based on the binutils version detected at configure time, as is done for various GNU as features across various targets.
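
A sketch of what such an optional relaxation could look like, by analogy with amd64's GOTPCRELX (schematic only; shown with relocation operators for readability):

# Before relaxation: GOT-indirect load of sym's address
.L0:
    auipc   a0, %got_pcrel_hi(sym)
    ld      a0, %pcrel_lo(.L0)(a0)
# After relaxation, applied only when the linker can prove sym is
# non-preemptible and in range; an optimization, never required:
    auipc   a0, %pcrel_hi(sym)
    addi    a0, a0, %pcrel_lo(.L0)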

jim-wilson commented 3 years ago

The "just write some code" is a non-trivial step. If it takes a year before someone does that, then we have a code size and performance regression for a year. This is the part I want to avoid. So someone needs to do a binutils/gcc implementation of the new scheme before we can switch, and there is no staff to work on that.

Yes, we would add a gcc configure check for the new binutils relaxation support.

jrtc27 commented 3 years ago

FWIW, the lack of this relaxation will currently be causing PIE (and PIC-code-linked-into-an-executable) binaries to use GOT-indirect accesses for many variables (anything extern), as PIE binaries use the GOT for anything that might require a copy relocation, for both GCC and Clang. This means that anyone benchmarking PIE vs PDE on RISC-V with a GNU toolchain will see highly misleading results that are a result of a missing set of relaxations, whereas at least LLVM will be consistently slower than it could be. So that relaxation should be seen as important to implement in GNU ld anyway regardless of this ABI edge-case.

jrtc27 commented 3 years ago

(which, of course, means pretty much every Linux distribution is missing out on many relaxations despite using a GNU toolchain, as PIE-by-default happened a while ago)

MaskRay commented 3 years ago

I agree with jrtc27's "My view on this is split into two parts:" comment.

> There is a philosophical divide here. On the GNU toolchain side, linker relaxation is considered an important feature of the toolchain for hitting code size and performance goals. Thus this handling of weak undef is just another relaxation. On the LLVM toolchain side, linker relaxation is not implemented, and it is considered wrong for the linker to ever change an opcode emitted by the compiler. Thus this handling of weak undef is considered a compiler bug.

> This isn't true. LLD is quite happy to perform various linker relaxations (we do TLS relocations for a number of architectures, and things like GOTPCRELX on amd64 to rewrite GOT accesses to RIP-relative accesses), but they are strictly optimisations and not required to be implemented. That means all code generated by the compiler should be able to be linked without rewriting instructions to be a different form. R_RISCV_ALIGN blurs the line a little here but it's not really the same, it's just a bit of a hack that is effectively creating a whole bunch of tiny ELF sub-sections each with their own alignment requirements, just in a bit of a weird way.

Right. Linker optimizations/relaxations should be optional. They should not be mandated for correctness.

The -fno-pic -mcmodel=medany R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12_{I,S} problem we have (https://github.com/ClangBuiltLinux/linux/issues/1409) is that GNU ld's instruction rewriting to lui+addi is mandated for correctness. This is not a desirable protocol between the compiler and the linker.

Plus: what should ld.bfd --no-relax do in this case? Currently it still performs the instruction rewriting.

> Another concern here is that weak symbols were originally invented for shared libraries, but for ~30 years the GNU toolchain has had an extension allowing weak symbols to be used in statically linked code, and this extension is frequently used by embedded programmers. There is a weak symbol in the newlib crt0.S for instance. And the SiFive freedom-e-sdk heavily uses weak symbols, though I'm hoping that this someday gets fixed in a rewrite. So any solution that increases code size and/or reduces performance for the handling of undef weak symbols would be a major problem for embedded users and hardware companies that live and die by their benchmark numbers.

Hmmm, my thoughts on this (not too on topic): Weak definitions are clearly for precedence within a component (which can be an executable) and do nothing across DSOs (LD_DYNAMIC_WEAK was a glibc mistake). They were not specifically for shared libraries.

> If we had separate embedded and Unix ABIs, then perhaps we could just do it the LLVM way for Linux and the GNU way for embedded. But since we don't have separate embedded/Unix ABIs, switching gcc to the llvm scheme would penalize embedded users without new relocations and relaxations, which means we likely need a fair amount of binutils/gcc work to be done before we can switch, and it's not clear who will do the work as RISC-V International doesn't have any staff that can work on such problems.

The "just write some code" is a non-trivial step. If it takes a year before someone does that, then we have a code size and performance regression for a year. This is the part I want to avoid. So someone needs to do a binutils/gcc implementation of the new scheme before we can switch, and there is no staff to work on that.

> Yes, we would add a gcc configure check for the new binutils relaxation support.

This reasoning is entirely based on "whether undefined weaks in an application are a performance bottleneck". I am not sure I'd agree with the premise. I will go so far as to say global variable accesses, regardless of the binding, should not be a performance bottleneck.

Toolchains not implementing GOT-indirection to PC-relative instruction rewriting should not be used as an argument to hold up the codegen change.

If an application does want to optimize global variable accesses, mark variable declarations with __attribute__((visibility("hidden"))) (if only definitions are concerned, just use -fvisibility=hidden). For weak references, the main reason we use them is that we know the symbol may end up undefined. As such, the compiler codegen should avoid PC-relative instructions for them. Then our problem is solved.
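
For example (a minimal C sketch; `fast_var` and `maybe_var` are hypothetical names):

/* Hidden visibility: non-preemptible by construction, so the compiler
   can use a direct PC-relative sequence with no GOT indirection. */
__attribute__((visibility("hidden"))) extern int fast_var;

/* Weak reference: may legitimately stay undefined, so a GOT-indirect
   sequence lets the linker resolve the address to 0 without rewriting code. */
__attribute__((weak)) extern int maybe_var;

int *fast_addr(void)  { return &fast_var; }
int *maybe_addr(void) { return &maybe_var; } /* NULL if maybe_var is undefined */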

jim-wilson commented 3 years ago

There is a very serious practical problem here. Naive users checking code size just compile simple programs and look at code size without doing any analysis. RISC-V is losing adopters because our code size is not as good as ARMv7. Changing the gcc code generation without adding the binutils relocations and relaxations will make the problem worse because, as I already mentioned, there is an undef weak reference in newlib crt0.S. Yes, it is just a few bytes, but every few bytes are precious when talking about embedded code size. Embedded programmers care a great deal about code size. An ABI change that increases code size is not OK. This is a problem that cannot be ignored if we all want RISC-V to be a successful architecture.

This is why I made the point about embedded ABIs versus Unix ABIs. It would be OK to make the gcc code change for Linux today but not for embedded ELF, and we don't have any easy way to do that currently. It would be OK to make the gcc code change for -fpic today but not medany, and that may not be good enough. For the embedded ELF case or the medany case, we need the binutils work to be done first to avoid code size and/or performance regressions.

I agree in principle with the suggestions here. The only issue is who is going to do the binutils work required, and when.

kito-cheng commented 3 years ago

Just my experience from the embedded world: lots of embedded projects use customized linker scripts, and they won't expect a GOT to be there. Changing the behavior to use the GOT might be a surprise and break their code, so I would suggest changing the Linux target only.

jrtc27 commented 3 years ago

You could say the same about .riscv.attributes yet that was fine to add. New sections come all the time, linker scripts have to evolve.

kito-cheng commented 3 years ago

.riscv.attributes is slightly different from an unexpected GOT section to me: missing .riscv.attributes won't break the program, but a missing .got will make the program fail to link correctly. Hmmm... I guess I am a conservative guy on the embedded side.

compnerd commented 2 years ago

@jrtc27 - could we document a position in the psABI? I am less concerned about the exact position and more that we have the requirements on this documented in the specification.

preames commented 2 years ago

TLDR - I think we've found a case where the proposed GOT lowering is unsound. Assuming we're correct, we need to reopen this issue and decide on a path forward.

The concerning case

The case I’m concerned about is a weak reference to a symbol with hidden visibility. A reference to a hidden symbol is required to not be preemptible. To my knowledge, any symbol exposed in the GOT is preemptible. As a result, I don't believe that using a GOT is a legal lowering for such a reference.

I want to be careful here to say that I’m not making a strong claim about either toolchain’s implementation. I’ve tried to glance at them, but I’m sufficiently outside my area of expertise that I could easily be misreading something. I do mention current behavior in a few places, but that’s solely for narrative clarity’s sake. The important bit here is making sure that the wording in the specification is feasible.

Background

ELF Generic ABI document: http://www.sco.com/developers/gabi/latest/ch4.symtab.html (See figure 4-21 and associated commentary.)

A weak reference is one which may be unavailable at link time. A weak reference to an unavailable symbol is expected to resolve to null at runtime. Note that there is an unrelated concept of precedence for symbol definitions which also uses the term “weak”, but is not directly connected to the meaning used here.

A preemptible symbol is one whose definition can be changed at load time in a well defined manner.

A symbol’s visibility determines its preemptibility. Specifically, a DEFAULT visibility symbol must be preemptible; a HIDDEN symbol must not be preemptible; a PROTECTED symbol is likewise non-preemptible.

What should we do?

For DEFAULT visibility weak symbols, the compiler and linker must arrange for a GOT entry to be used. Using a PCREL_* relocation sequence is incorrect because the result is not preemptible.

For non-DEFAULT (HIDDEN, PROTECTED, INTERNAL) visibility weak symbols, neither GOT nor PCREL_* sequences are correct. The GOT lowering implicitly makes the symbol preemptible, which is incorrect. However, the PCREL lowering can't represent absolute zero (per the current draft of the psABI).

I see three options here.

Option 1 - Consider the hidden weak symbol case implementation defined.

Wording in the current specification allows, but does not require, a linker to emit an error if the resolved symbol is out of bounds for the relocation type. This means that GNU LD’s current implementation is allowable under the specification. LLD would currently report an error, which would seem to violate the wording requiring support for weak symbols in the medany code model. Given that, we'd replace the paragraph which currently starts with: "As a special edge-case, undefined weak symbols must still be supported" with the following text:

"As a special edge-case, undefined weak symbols may still be supported, whose addresses will be 0 and may be out of range depending on the address at which the code is linked. A complaint implementation may report an error for such undefined weak symbols instead."

(We'd probably want some text following to explain the preemption interaction, but getting that precisely right for normative text would be dangerously complicated, so this should probably be explicitly non-normative.)

This option has the possibly undesirable effect of allowing a compliant implementation not to fully support weak symbols. Since we have such an implementation today - LLD - maybe this is okay.

Option 2 - Mandate rewriting the pc-relative addressing sequence into an absolute addressing sequence to yield the value 0.

This option would change the definition of the PCREL_* relocation types to mandate that absolute zero is an allowed resolution. This would require all compliant linkers to rewrite the compiler emitted auipc sequence into the absolute sequence if needed. We'd still need to change the spec to remove the wording about the GOT lowering, but the "must" requirement for weak symbols could stay. Note that although it is simple to materialize the value 0 by simply referencing or copying register x0, if relaxation is disabled at link time, the length of the original instruction sequence must be retained and not shortened when rewritten as an absolute addressing sequence.

This would require a change in LLD to be compliant with the revised spec. I am unclear if GNU LD handles the default visibility case correctly; we should confirm.

Option 3 - Find another relocation which subsumes both

If we don't want to redefine PCREL_*, we could introduce a new relocation type with a sufficiently general compiler emitted sequence to allow either PC relative or 0 resolution.

I mention this option solely for completeness. Defining such a new relocation seems like a major change to the proposed spec, and would require implementation work in both toolchains.

Shared Work

I wrote up this post, but most of the work on digging into the details of the existing specifications was done by Greg McGary and Saleem Abdulrasool. Mistakes are mine; credit goes to them.

jrtc27 commented 2 years ago

You can have non-preemptible symbols in the GOT. I don’t see the problem; the GOT is just a linker-generated constant pool.

compnerd commented 2 years ago

Yes, the GOT is a linker-generated constant pool; however, it is better used for the cases where preemption is required. Yes, as per the verbiage, it is permissible to create a GOT entry even in the case that the symbol may not be preempted; however, architectures do relax those to the PC-relative form, which IMO is the reasonable thing to do.

I'm not sure I understand the resistance to explicitly spelling out in the psABI that we want the GOT form to be emitted preferably and that size considerations are not important.

preames commented 2 years ago

@compnerd - I'm very confused by your last response. I thought in offline conversation we'd concluded that it was not legal to have a non-preemptible symbol in the GOT of the final link result? Are you now saying it is legal, and that it is only a size concern?

MaskRay commented 2 years ago

> The concerning case
>
> [...]
>
> A reference to a hidden symbol is required to not be preemptible. To my knowledge, any symbol exposed in the GOT is preemptible. As a result, I don't believe that using a GOT is a legal lowering for such a reference.

No. A GOT entry may hold a link-time constant, a word relocated by R_*_GLOB_DAT, or a word relocated by a relative relocation. https://maskray.me/blog/2021-08-29-all-about-global-offset-table#linker-behavior
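
Schematically, the possible link-time states of a GOT slot (a sketch of the cases above; exact relocation names vary per architecture):

# GOT slot for sym, three possible outcomes at static link time:
#   (1) link-time constant:  slot = &sym, or 0 for an undefined weak;
#       no dynamic relocation, so sym stays non-preemptible.
#   (2) preemptible symbol:  slot gets an R_*_GLOB_DAT dynamic relocation,
#       filled in by the dynamic loader.
#   (3) constant under ASLR: slot gets an R_*_RELATIVE relocation,
#       resolved to load base + offset; still non-preemptible.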

Linkers' GOT optimizations check many conditions to determine when they can be performed. To the best of my knowledge, lld does the right thing for all ports (aarch64/powerpc64/x86-64) implementing GOT optimization. I've reported GNU ld bugs like incorrectly relaxed SHN_ABS symbols, and the latest binutils has fixed them.

> Background
>
> [...]
>
> A symbol’s visibility determines its preemptibility. Specifically, a DEFAULT visibility symbol must be preemptible; a HIDDEN symbol must not be preemptible; a PROTECTED symbol is likewise non-preemptible.

Correction to the "DEFAULT" part. STV_DEFAULT is a necessary but insufficient condition. E.g. a defined STV_DEFAULT symbol in an executable or a -Bsymbolic linked shared object is non-preemptible.

> For DEFAULT visibility weak symbols, the compiler and linker must arrange for a GOT entry to be used. Using a PCREL_* relocation sequence is incorrect because the result is not preemptible.

Yes.

> For non-DEFAULT (HIDDEN, PROTECTED, INTERNAL) visibility weak symbols, neither GOT nor PCREL_* sequences are correct. The GOT lowering implicitly makes the symbol preemptible, which is incorrect. However, the PCREL lowering can't represent absolute zero (per the current draft of the psABI).

Disagree. GOT is appropriate here. Clang's aarch64/x86 ports and GCC's aarch64 port are good references. GCC x86-64 is somewhat borked due to a -fpie HAVE_LD_PIE_COPYRELOC issue (https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected). Take GCC's x86-64 behavior with a grain of salt.

The key idea is that linker optimization/relaxation is optional (see my reply on Jul 8, 2021). The compiler should give a way to satisfy "Unresolved weak symbols have a zero value." at link time even in the absence of linker optimization/relaxation. This requires a GOT generating code sequence, as the proposed LLVM patch https://reviews.llvm.org/D107280 will do.

This choice makes visibility less of an optimization approach, but the STV_HIDDEN/STV_INTERNAL visibilities still have some effect: they ensure that relocations to the undefined weak symbol cannot be resolved to another component by rtld at run time.

FWIW I don't really see the rationale behind the objection to GOT for (all visibility) undefined weak symbols in -mcmodel=medany mode. How often do you see an undefined weak variable used in a way where its performance matters and a GOT indirection would make the performance unacceptable? (https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/201#pullrequestreview-703610923)

We should also take into account the fact that undefined weak symbol usage is quite specific to ELF and rarely used in portable software (https://maskray.me/blog/2021-04-25-weak-symbol). E.g. it's quite involved to do this in PE-COFF.

> Option 1 - Consider the hidden weak symbol case implementation defined.

I think this is the only feasible direction.

kito-cheng commented 2 years ago

@MaskRay @jrtc27 Just to make sure I didn't misunderstand anything: so the GOT approach has no issue with the non-preemptible/preemptible distinction?

kito-cheng commented 2 years ago

> I'm not sure I understand the resistance to explicitly spelling out in the psABI that we want the GOT form to be emitted preferably and that size considerations are not important.

@compnerd code size is important, but weak symbols should not be a big portion of the program in general.

kito-cheng commented 2 years ago

@rui314 would you mind taking a look at this? Just want to make sure every linker expert is here and thinks it makes sense :)

rui314 commented 2 years ago

Re: GOT, I don't think there's a soundness issue. The linker can represent a hidden weak undefined symbol in all cases.

This is exactly what we do in mold, and I believe the same is true of lld.

Regarding the original topic of this issue, I actually do not have a strong opinion. I agree with @maskray and others that linker relaxation should be optional. It feels like a program that doesn't work with the linker's --no-relax flag is similar to a program that works only with -O2 but not with -O0. That being said, the damage has already been done; the RISC-V psABI mandates that the static linker process R_RISCV_ALIGN relocations. Compared to R_RISCV_ALIGN, relaxing auipc+addi to lui+addi should be trivial. mold currently doesn't do that, but I'll add it for the sake of compatibility with GNU ld.

compnerd commented 2 years ago

@rui314 I agree with the sentiment that linker relaxations should be optional, but the fact is, as you said, "the damage has already been done". This seems like a pretty trivial relaxation.

I think that at a more philosophical level, using the GOT for the reference feels like the symbol participates in dynamic linking, when a hidden visibility symbol is supposed not to participate in dynamic linking. While legal to the letter, it seems to violate the spirit, IMO. And while I do not know of any loader that rejects or might mishandle the approach, it does seem like it is generating something more permissive than necessary when something fairly simple could prevent that.

preames commented 2 years ago

Sorry for raising a functional concern that appears to have been based on a false premise. Apparently, I should have cross-checked my facts better. Sorry for the noise.

The remainder of this is a summary of my current understanding. I am by no means an expert here, so hopefully my paraphrasing is correct enough not to be further confusing.

The sentence in my prior post, "To my knowledge, any symbol exposed in the GOT is preemptible," turns out to be untrue. @rui314 nicely explains how non-preemptible symbols are handled by the dynamic loader if placed in the GOT. As such, to the best of my knowledge, the GOT lowering is a valid and sound lowering for all weak references.

As such, the wording in the current psABI document being proposed for ratification is correct. There might be room for improvement in wording, but nothing I consider blocking for ratification.

What we're left with is a mixture of code size, performance, and stylistic concerns.

I personally find the code size and performance argument largely unconvincing. However, there's also an implementation strategy which seems to completely nullify it at low cost. Does anyone see a reason not to simply implement an (entirely optional) relaxation for converting GOT sequences to absolute or PC-relative form when legal? My understanding is that the result of the static link should be largely identical to the current gnu output if this was implemented. Is this challenging or tricky for any reason?

I'm not going to comment on the stylistic bits. I'll leave that discussion to folks directly involved in linker implementations as I don't have any strong opinion.

For LLD, I do think that we should implement GNU-compatible handling of weak references to non-preemptible symbols using the PC-relative sequences. Not because the spec does or should require it, but because there's a decent amount of existing code out there compiled with the current gnu approach, and I'd like lld to be a drop-in replacement for ld where possible. Having such a compatibility case doesn't appear to break the spec in any way, so I'd argue we should implement it for maximum compatibility - regardless of whether it was stylistically a good idea or not.

jrtc27 commented 2 years ago

It’s not tricky; I’ve suggested in the past that it should exist (as part of the justification for making this change to the psABI), and it would greatly benefit PIEs. It just doesn’t currently exist.

aswaterman commented 2 years ago

> For LLD, I do think that we should implement GNU-compatible handling of weak references to non-preemptible symbols using the PC-relative sequences. Not because the spec does or should require it, but because there's a decent amount of existing code out there compiled with the current gnu approach, and I'd like lld to be a drop-in replacement for ld where possible. Having such a compatibility case doesn't appear to break the spec in any way, so I'd argue we should implement it for maximum compatibility - regardless of whether it was stylistically a good idea or not.

That seems pragmatic to me, though I would've additionally advocated for making it part of the spec if it is implemented in both Binutils and LLD. I know it's a widely disliked feature, but it seems like it would minimize confusion to spec it if both major linkers support it.

MaskRay commented 2 years ago

I think two things have been conflated, and the first comment doesn't really make it clear which is being discussed. I now forget what I talked to @compnerd about regarding the Linux kernel vdso issue in 2021, so I'll try to guess what people are discussing here.

(a) When referencing an undefined weak symbol, which kind of code sequence should be used: PC-relative or GOT?
(b) Whether a linker should implement the R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12_I => lui+addi rewriting, even in the absence of R_RISCV_RELAX?

I'll discuss them separately.

The current GCC codegen is PC-relative. Below I will describe why it doesn't make sense.

% cat a.c
__attribute__((weak)) extern int var;
void *addr() { return &var; }
% riscv64-linux-gnu-gcc -fno-pic -O2 -mcmodel=medany a.c -c
% riscv64-linux-gnu-gcc -fno-pic -O2 -mcmodel=medany a.c -S
% cat a.s
...
addr:
        lla     a0,var
        ret
...
% riscv64-linux-gnu-objdump -dr a.o | sed -n '/addr/,$p'
0000000000000000 <addr>:
   0:   00000517                auipc   a0,0x0
                        0: R_RISCV_PCREL_HI20   var
                        0: R_RISCV_RELAX        *ABS*
   4:   00050513                mv      a0,a0
                        4: R_RISCV_PCREL_LO12_I .L0 
                        4: R_RISCV_RELAX        *ABS*
   8:   8082                    ret

With -mno-relax, there is no R_RISCV_RELAX. I think the spirit is that the linker should not perform relaxation, but GNU ld currently does code sequence rewriting (to lui) anyway, which does not make sense to me.

% riscv64-linux-gnu-gcc -fno-pic -O2 -mcmodel=medany a.c -e 0 -nostdlib -no-pie -Wl,-Ttext=0x80001000 && llvm-objdump -d a.out | sed -n '/addr/,$p'
0000000080001000 <addr>:
80001000: 13 05 00 00   mv      a0, zero
80001004: 82 80         ret
% riscv64-linux-gnu-gcc -fno-pic -O2 -mcmodel=medany -mno-relax a.c -e 0 -nostdlib -no-pie -Wl,-Ttext=0x80001000 && llvm-objdump -d a.out | sed -n '/addr/,$p'
0000000080001000 <addr>:
80001000: 37 05 00 00   lui     a0, 0
80001004: 13 05 05 00   mv      a0, a0
80001008: 82 80         ret

In addition, if we have a code model supporting text+data spanning larger than 4GiB, I don't know whether it's still the linker's responsibility to paper over the code gen issue of the compiler.

In the presence of R_RISCV_RELAX, I think in -no-pie mode the R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12_I => lui+addi relaxation makes sense. I think it helps some programs which make significant use of global variables. I don't recommend such benchmarks, but I acknowledge their existence and understand that some people may care about them a lot. I'll add a note that they are largely out of scope for regular Linux desktop/server uses because of the prevalence of PIE.

MaskRay commented 2 years ago

> @MaskRay @jrtc27 Just to make sure I didn't misunderstand anything: so the GOT approach has no issue with the non-preemptible/preemptible distinction?

No issue. And the GOT code sequence will match most modern architectures.