Closed xry111 closed 5 months ago
Cc @xen0n @heiher @MaskRay @SixWeining @MQ-mengqing Ref https://github.com/llvm/llvm-project/pull/71907
Note that the easy solution may blow up things like
la.pcrel $a0, $t0, array + 0xffffffff
because we cannot encode "0xffffffff + 8" in r_addend. So perhaps the "hard" solution is actually easier...
In essence, the "hard" solution you've mentioned is for providing the necessary association between related relocs/insns, which does work, and is what RISC-V does (with their LO12 relocs referencing back to the HI20 reloc instead of the symbol) so most if not all of the machinery is already present.
I don't know whether giving up instruction scheduling in exchange for guaranteed-adjacent immediate-loading sequences would make some kind of "macro-op fusion" possible in the micro-architecture. But since the additional relationship information also helps resolve other ambiguities (as can be seen in the comments in the LLD LoongArch code), I'd be in favor of the "hard" proposal too. The reloc name could use some bikeshedding, but the information it provides is invaluable.
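To make the RISC-V-style association concrete, here is a minimal Python sketch of a linker resolving a LO12 reloc by looking up the HI20 reloc it points back to. The `Reloc` class, the `"HI20"`/`"LO12"` kind names, and the addresses are all made up for illustration; this is a simplified model, not the actual RISC-V or LoongArch linker code.

```python
from dataclasses import dataclass

@dataclass
class Reloc:
    offset: int  # address of the instruction this reloc patches
    kind: str    # "HI20" or "LO12" (simplified, hypothetical names)
    target: int  # HI20: symbol address; LO12: address of the paired HI20 insn

def sign_extend_12(value):
    """Interpret the low 12 bits of value as a signed immediate."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value

def resolve_hi20(hi):
    # Round so that the sign-extended low part can be added back later.
    delta = hi.target - hi.offset
    return ((delta + 0x800) >> 12) & 0xFFFFF

def resolve_lo12(lo, relocs):
    # Follow the LO12 reloc back to its HI20 reloc and reuse the HI20's
    # symbol and PC, instead of the PC of the LO12 instruction itself.
    hi = next(r for r in relocs if r.kind == "HI20" and r.offset == lo.target)
    return sign_extend_12(hi.target - hi.offset)

relocs = [
    Reloc(offset=0x1000, kind="HI20", target=0x3804),  # symbol at 0x3804
    Reloc(offset=0x1010, kind="LO12", target=0x1000),  # scheduled 16 bytes later
]
hi20 = resolve_hi20(relocs[0])
lo12 = resolve_lo12(relocs[1], relocs)
rebuilt = relocs[0].offset + (hi20 << 12) + lo12

# For comparison: what we would get from the LO12 instruction's own PC.
naive_lo12 = sign_extend_12(0x3804 - relocs[1].offset)
print(hex(rebuilt), lo12, naive_lo12)
```

Because the low part is derived from the HI20 instruction's PC, the LO12 instruction can be scheduled anywhere after it without changing the result; the naive value computed from the LO12 instruction's own PC does not match.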
This easy solution is actually not that easy: relocations are computed as S+A and the relocated symbol carries an addend, so the effective PC is not simply "current PC - addend".
la.pcrel $xx, $yy, sym + 8
->
000: pcalau12i(sym + 8)
004: addi.d(sym + 8)
008: lu32i.d(sym + 8 + 8)
00c: lu52i.d(sym + 8 + 12)
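For lu32i.d and lu52i.d, ld should still compute the PC with a fixed method.
64-bit la.pcrel has caused almost no problems so far, probably because it is rarely used, and even when it is used, the triggering conditions are edge cases that are hard to hit. Going with the easy solution may lower the cost of the code changes, but it has drawbacks in the long run.
The hard solution is similar to RISC-V, is highly feasible, and allows scheduling. But it does differ from the current ABI, and I expect the changes to be substantial.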
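Here are a few thoughts and questions of mine, some of them slightly off-topic:
1. PCALA_HI20 currently has no overflow check; the mold maintainer has raised this as well.
2. If the three instructions pcalau12i, lu32i.d, and lu52i.d are kept together, could they be handled with a single relocation, as with call36?
3. The current relocation scheme is position-independent at 4 KB granularity, whereas the older pcaddu12i+addi.d / pcaddu12i+ori+lu32i.d+lu52i.d+add.d scheme was position-independent at 4 B granularity. Could we also add such consecutive 2- or 4-instruction sequences, handled with a single relocation like call36?
4. As I understand it, the reasons for explicit relocs are that (1) the instructions can take part in scheduling, and (2) some address-loading operations (say, loading a 32-bit address) can share the first pcalau12i (HI), i.e. one HI can be used by multiple LOs. No explicit relocs makes relaxation easier. (A guess:) if we did it the RISC-V way, the instructions could not only take part in scheduling but could also be relocated at the LO site; and if an HI ends up referenced zero times, it could even be deleted.
5. If we go with the hard solution, would call36 also be changed in a similar way?
6. If we go with the hard solution, the basic relocations will change, which will probably require some software to be modified. (Or could we add some automatic detection, as is done for -mexplicit-relocs today?)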
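If we go with the hard solution, the basic relocations will change, which will probably require some software to be modified. (Or could we add some automatic detection, as is done for -mexplicit-relocs today?)
As I understand it, the forward-compatibility concern here is mainly that the newly added marker reloc will be treated as an unknown record by components that do not recognize it. They may ignore it or report an error, and most will probably report an error. However, older components that ignore the new marker, as well as components that support the new ABI but are processing old object code, can simply keep using the old logic.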
Also cc @abner-chenc -- the ultimate solution to this issue will likely involve code changes at Go side, so your team's input/acknowledgement is also welcome.
A "fully backward-compatible" fix might be
# do not allow scheduling other instructions in-between them
.align 4
pcalau12i $t0, %pc_hi20(sym)
lu32i.d $t1, %pc64_lo20(sym)
lu52i.d $t1, $t1, %pc64_hi12(sym)
This guarantees that the pcalau12i, lu32i.d, and lu52i.d instructions are in the same 4K-page.
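A quick way to convince yourself of the same-page claim is to enumerate start addresses. The sketch below assumes `.align 4` means 2^4 = 16-byte alignment and that each instruction is 4 bytes; it checks four consecutive instruction slots so that it also covers the full four-instruction sequence. The helper name `same_page` is made up for this example.

```python
PAGE = 0x1000  # 4 KiB page
INSN = 4       # every LoongArch instruction is 4 bytes

def same_page(start, n_insns):
    """True if n_insns consecutive instructions starting at start share one page."""
    last = start + (n_insns - 1) * INSN
    return start // PAGE == last // PAGE

# With 16-byte alignment (assumed meaning of ".align 4"), a 16-byte group
# never straddles a page boundary.
aligned_ok = all(same_page(s, 4) for s in range(0, 2 * PAGE, 16))

# With only 4-byte alignment, some start offsets within a page do cross.
crossing_starts = [s for s in range(0, PAGE, 4) if not same_page(s, 4)]

print(aligned_ok, [hex(s) for s in crossing_starts])
```

Note that offset 0xff8 is among the crossing starts, which is the page offset used in the bug report's `-Ttext=0x180000ff8` reproducer.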
Should be fixed with Binutils 2.42, GCC 14, and LLVM 18 (all following psABI 2.30).
Background
For the extreme code model, we materialize the address of a symbol (either data or code) with:
Consider this example:
With
cc bug.s -Ttext=0x180000ff8 -Tdata=0x1000000000 -shared -nostdlib
we get:
But this is wrong: the correct immediate in lu32i.d should be 15.
The problem is that this "14" is calculated with the PC of the lu32i.d instruction (0x180001000), while the PC of the pcalau12i instruction (0x180000ff8) should be used instead.
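The 14-versus-15 discrepancy can be reproduced with a small Python model of the page-delta arithmetic. This is a sketch under assumptions: it assumes `sym` sits at the start of `.data` (0x1000000000), that lu32i.d is the third instruction of the sequence (8 bytes after pcalau12i), and the adjustment steps in `page_delta` follow my understanding of the algorithm recommended with psABI 2.30, not a copy of any linker's code.

```python
MASK64 = (1 << 64) - 1

def page(addr):
    """Clear the low 12 bits, i.e. round down to the 4 KiB page."""
    return addr & ~0xFFF

def page_delta(dest, pc):
    """Page delta between dest and pc, compensating for sign extension
    done by addi.d (low 12 bits) and pcalau12i (bits 31..12)."""
    result = (page(dest) - page(pc)) & MASK64
    if dest & 0x800:          # addi.d will sign-extend a negative lo12
        result = (result + 0x1000 - 0x1_0000_0000) & MASK64
    if result & 0x8000_0000:  # pcalau12i will sign-extend a negative hi20
        result = (result + 0x1_0000_0000) & MASK64
    return result

def lo20(delta):
    """Bits 51..32 of the delta: the immediate placed in lu32i.d."""
    return (delta >> 32) & 0xFFFFF

SYM = 0x10_0000_0000           # assumed address of sym (-Tdata value)
PCALAU12I_PC = 0x1_8000_0FF8   # -Ttext value from the reproducer
LU32I_D_PC = PCALAU12I_PC + 8  # 0x180001000, already on the next page

wrong = lo20(page_delta(SYM, LU32I_D_PC))    # uses lu32i.d's own PC
right = lo20(page_delta(SYM, PCALAU12I_PC))  # uses pcalau12i's PC
print(wrong, right)
```

The two results differ only because the pcalau12i and lu32i.d instructions sit on different 4K pages, which is exactly the edge case the reproducer forces with its `-Ttext` value.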
Possible solution
Easy solution (limiting scheduling)
In GAS, emit 64-bit la.pcrel as-is:
In GCC, if -mexplicit-relocs=always, emit it as:
Hard solution (allowing scheduling)
For GAS, use the easy solution.
For GCC, introduce a new reloc type "R_LARCH_EFFECTIVE_PC" and do something like: