riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International

Relocations, ELF Markers, and >32-bit Instructions #453

Open lenary opened 1 week ago

lenary commented 1 week ago

I took a good look through the currently open issues, and none specifically address any of these issues, but sorry if I missed something. Some of the discussion in #393 touches on these issues but not in any great depth. I've also been working with other architectures for a while, so am still getting back up to speed with the RISC-V psABI details again.

This issue is going to touch on a series of related issues, around support for longer-than-32-bit instructions. I'm sorry I didn't get this to you in time for the most recent psABI meeting, but hopefully that gives you time to think about the issues before the next one.

One of my central queries is about the meaning of EF_RISCV_RVC. This flag denotes both whether you can use 16-bit-aligned instructions (rather than 32-bit-aligned ones) and whether you can use C extension instructions during relaxation. I'll note that LLVM has updated how it interprets this flag. It still means the former, but for the latter it now means just Zca, not all of C. This should have been expected once C was exploded into many sub-extensions, some of them incompatible with each other.
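To make the two meanings concrete, here is a minimal Python sketch of how a tool might read the flag and merge it across inputs. It assumes the psABI value EF_RISCV_RVC = 0x0001. It also assumes a merge rule in which the output is marked RVC if any input is. The function names are hypothetical. The single bit cannot distinguish "IALIGN=16" from "may relax to C/Zca instructions", which is exactly the ambiguity above.

```python
EF_RISCV_RVC = 0x0001  # psABI e_flags bit


def has_rvc(e_flags: int) -> bool:
    """True if the object sets EF_RISCV_RVC.

    The bit currently carries two meanings at once: IALIGN=16, and
    permission for the linker to introduce compressed instructions.
    """
    return bool(e_flags & EF_RISCV_RVC)


def merge_rvc(input_flags: list[int]) -> int:
    """RVC bit of the output, assuming it is set if any input sets it."""
    out = 0
    for flags in input_flags:
        out |= flags & EF_RISCV_RVC
    return out
```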

What about binaries containing 48-bit instructions? We have a set of public ISA extensions, Xqci, which we want to add support for. Xqci contains 48-bit instructions. The core ISA standard has yet to ratify any encodings for these, but encoding space has been reserved for >32-bit instructions. 48-bit instructions require 16-bit instruction alignment (otherwise they would effectively be 64-bit instructions). In our case, none of the sub-extensions that have 48-bit instructions also have 16-bit instructions, and none of them require or imply C/Zca.
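The alignment requirement follows from the instruction-length encoding. The length is decoded from the low bits of the first 16-bit parcel. Below is a minimal Python sketch based on the scheme in the unprivileged spec. That spec only considers the 16- and 32-bit encodings frozen, so the longer cases are illustrative. The function names are hypothetical.

```python
def insn_length(low_parcel: int) -> int:
    """Instruction length in bits, decoded from the first 16-bit parcel."""
    if low_parcel & 0b11 != 0b11:
        return 16                      # bits[1:0] != 11
    if low_parcel & 0b11100 != 0b11100:
        return 32                      # bits[4:2] != 111
    if low_parcel & 0b111111 == 0b011111:
        return 48                      # bits[5:0] == 011111
    if low_parcel & 0b1111111 == 0b0111111:
        return 64                      # bits[6:0] == 0111111
    raise ValueError("reserved or >= 80-bit encoding")


def next_pc(pc: int, low_parcel: int) -> int:
    """Address of the following instruction (no branches taken)."""
    return pc + insn_length(low_parcel) // 8
```

For example, a 48-bit instruction at a 4-byte-aligned address leaves the next instruction at an address that is 2 mod 4. That instruction can only be reached if IALIGN is 16.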

Both implications of the EF_RISCV_RVC flag are also redundant: sections already have an alignment (with obvious semantics when two sections are merged together: enforce the higher alignment), and we now have architecture build attributes which we can query to work out which extensions we are allowed to use during relaxation.

So, should we be setting the EF_RISCV_RVC flag for binaries containing 48-bit instructions? Maybe it would be ok to keep EF_RISCV_RVC clear but still mark any code sections as having 16-bit alignment, which the linker should be honouring? Some guidance as to a reasonable direction to take here would be helpful. We could allocate ourselves a non-standard extension elf flag to represent "this object contains 16-bit aligned instructions, but not necessarily C/Zca", but we would like support for these instructions to go upstream and allocating a non-standard extension bit for this seems greedy and potentially unnecessary.

I have a similar query relating to relocations on 48-bit instructions. The Xqcibi sub-extension (described in the release, above) contains some 48-bit branch immediate instructions (qc.e.b<cond>i) where the branch offset is encoded into the exact same bits that would be used by a b<cond> instruction. The ISA designers did this so they could use an R_RISCV_BRANCH relocation in their prototype toolchain. My concern is that this is likely to have a knock-on effect on relaxations and beyond - we have instruction types for a reason, and we quite like to use them in the ABI (aside: the document/yaml for Xqci doesn't mention instruction types, which is a drawback, but I think this is shared by the riscv-unified-db upstream too). Right now, all instruction relocations end up well-aligned with the start of the instruction they apply to.
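For reference, here is a minimal Python sketch of what an R_RISCV_BRANCH fixup writes. It scatters the B-type immediate across bits 31:25 and 11:7 of a little-endian instruction word. The sketch assumes, per the description above, that a 48-bit qc.e.b<cond>i keeps its offset in the same bit positions within its low 32 bits, so the same routine patches it. The helper names and test bytes are hypothetical. Nothing in the relocation type tells the linker that the instruction is actually 6 bytes long.

```python
import struct


def encode_b_imm(insn: int, offset: int) -> int:
    """Place a B-type branch offset into a 32-bit instruction word.

    imm[12] -> bit 31, imm[10:5] -> bits 30:25,
    imm[4:1] -> bits 11:8, imm[11] -> bit 7.
    """
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("offset out of range for a B-type branch")
    insn &= ~0xFE000F80 & 0xFFFFFFFF  # clear the existing immediate bits
    insn |= ((offset >> 12) & 0x1) << 31
    insn |= ((offset >> 5) & 0x3F) << 25
    insn |= ((offset >> 1) & 0xF) << 8
    insn |= ((offset >> 11) & 0x1) << 7
    return insn


def apply_branch_reloc(code: bytearray, r_offset: int, offset: int) -> None:
    """Patch the 32 bits at r_offset; instruction parcels are little-endian."""
    (word,) = struct.unpack_from("<I", code, r_offset)
    struct.pack_into("<I", code, r_offset, encode_b_imm(word, offset))
```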

Broadly, my question is: do we want to reuse an existing relocation like this (on longer instructions of a different type), or would we prefer that every relocation matches the type (and size) of the instruction it applies to? My gut feeling is that we do want new relocations for new instruction types, to keep relocations obvious and aligned with instruction boundaries, but I'd be interested to hear other opinions. I think keeping relocations aligned with instructions, and only applying them to instructions of the correct type, makes relaxations easier and less brittle, but I'm not 100% sure of that. The specific implication is that we might need quite a lot of new relocations as longer instructions appear. I think we would fairly quickly stop getting many more instructions for materializing addresses, though.
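Here is a minimal Python sketch of why one relocation type per instruction size helps relaxation. If each type implies the size of the instruction it sits on, the linker can tell which bytes belong to that instruction without decoding it. The numbers for R_RISCV_BRANCH, R_RISCV_JAL, R_RISCV_RVC_BRANCH and R_RISCV_RVC_JUMP are the existing psABI values. R_RISCV_HYPOTHETICAL_BRANCH48 and its number are placeholders, not an allocated relocation.

```python
# Existing psABI relocation numbers.
R_RISCV_BRANCH = 16
R_RISCV_JAL = 17
R_RISCV_RVC_BRANCH = 44
R_RISCV_RVC_JUMP = 45
# Hypothetical dedicated type for a 48-bit branch; placeholder number only.
R_RISCV_HYPOTHETICAL_BRANCH48 = 1000

# Size in bytes of the instruction each relocation type applies to.
INSN_SIZE = {
    R_RISCV_BRANCH: 4,
    R_RISCV_JAL: 4,
    R_RISCV_RVC_BRANCH: 2,
    R_RISCV_RVC_JUMP: 2,
    R_RISCV_HYPOTHETICAL_BRANCH48: 6,
}


def insn_span(r_offset: int, r_type: int) -> range:
    """Byte range of the relocated instruction, assuming r_offset is its start."""
    return range(r_offset, r_offset + INSN_SIZE[r_type])


def deletion_overlaps_insn(r_offset: int, r_type: int, start: int, length: int) -> bool:
    """Would deleting bytes [start, start + length) touch this instruction?"""
    span = insn_span(r_offset, r_type)
    return start < span.stop and span.start < start + length
```

Suppose R_RISCV_BRANCH were reused on a 6-byte instruction at 0x10. The linker would then believe bytes 0x14 and 0x15 are outside the instruction and free to delete. With a dedicated type, it would not.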

I think maybe @kito-cheng and @asb might be expecting some of these queries, but keen to hear from others too.

jrtc27 commented 1 week ago

For instruction alignment, just because your instructions individually can be 16-bit aligned doesn't mean the whole section only needs that. For example, xtvec requires 4-byte alignment on the address even with RVC due to using the low 2 bits as the mode, so any OS's text section will be at least 4-byte aligned (and the trap vector at a 4-byte aligned offset within that). So whilst align(.text) == 2 implies you can use 2-byte instructions, the converse is not true, and thus align(.text) == 4 does not imply that 2-byte instructions aren't in use.

The relocation normally needs to imply the instruction size, yes. Even on x86, where the actual operand may be encoded in a uniform manner despite different instruction prefixes, the number of prefix bytes still gets encoded so you can do relaxation (albeit in a more limited manner there).

jrtc27 commented 1 week ago

Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

lenary commented 1 week ago

> For instruction alignment, just because your instructions individually can be 16-bit aligned doesn't mean the whole section only needs that. For example, xtvec requires 4-byte alignment on the address even with RVC due to using the low 2 bits as the mode, so any OS's text section will be at least 4-byte aligned (and the trap vector at a 4-byte aligned offset within that). So whilst align(.text) == 2 implies you can use 2-byte instructions, the converse is not true, and thus align(.text) == 4 does not imply that 2-byte instructions aren't in use.

Ah, ok I did miss this nuance, that executable section alignment isn't 1:1 with IALIGN (and I probably should have re-read the unprivileged spec to remind me how the ISA refers to this situation, before posting). I think my overall question still stands, that EF_RISCV_RVC implies two things: IALIGN=16 bits and "instructions from [some part of] the C extension are allowed to be introduced when relaxing". I intended to point out that Xqci contains sub-extensions which want IALIGN=16 bits, but don't necessarily want the changes to relaxations.

> The relocation normally needs to imply the instruction size, yes. Even on x86, where the actual operand may be encoded in a uniform manner despite different instruction prefixes, the number of prefix bytes still gets encoded so you can do relaxation (albeit in a more limited manner there).

I will go and read the x86 psABI to understand how it deals with relaxation and long instructions better, thanks for the tip. I think you're agreeing with my intended direction though, which sounds positive to me.

> Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

If you have 48-bit instructions (or any odd multiple of 16 bits), those extensions not implying IALIGN=16 bits is a little pointless - you've actually defined a set of 64-bit instructions (respectively, the next even multiple of 16 bits) and wasted 16 bits of the encoding with the same bits that you put in c.nop. Surely the point in adding 48-bit instruction encodings is so you can directly follow them with another instruction of any length, rather than having to pair them with a 16-bit instruction. Note that the unprivileged spec says "IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits" so presumably "some ISA extensions" could also include vendor extensions, not just C and its standard sub-extensions.

jrtc27 commented 1 week ago

> Importantly, I don't think I can see how 48-bit instructions work without having a 16-bit NOP. Ditching the rest of C seems fine, but I think you need at least that one instruction from it.

> If you have 48-bit instructions (or any odd multiple of 16 bits), those extensions not implying IALIGN=16 bits is a little pointless - you've actually defined a set of 64-bit instructions (respectively, the next even multiple of 16 bits) and wasted 16 bits of the encoding with the same bits that you put in c.nop. Surely the point in adding 48-bit instruction encodings is so you can directly follow them with another instruction of any length, rather than having to pair them with a 16-bit instruction. Note that the unprivileged spec says "IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits" so presumably "some ISA extensions" could also include vendor extensions, not just C and its standard sub-extensions.

I don't mean that they have to be followed by a 16-bit instruction. But for, say, R_RISCV_ALIGN, we currently insert c.nop if the padding is 2 mod 4. This case can't arise with 32-bit-only instructions, but it can with 16+32-bit, and with 32+48-bit. How do you insert 2 bytes of padding without c.nop? (You can of course handle 4n+2 bytes for n > 0 if you have a 48-bit NOP, but n = 0 is a special case.)
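The padding problem can be stated as a small search. Below is a minimal Python sketch under three assumptions. The 4-byte nop (addi x0, x0, 0) is always available. The 2-byte c.nop is available only with Zca. A 48-bit NOP is treated as a hypothetical future encoding. plan_nops is a hypothetical name, and it returns NOP sizes in bytes.

```python
def plan_nops(padding: int, has_c_nop: bool, has_nop48: bool) -> list[int]:
    """Split `padding` bytes into NOP sizes, or raise if it cannot be done."""
    if padding < 0 or padding % 2:
        raise ValueError("padding must be a non-negative multiple of 2")
    plan = []
    if padding % 4 == 2:
        # A 2 mod 4 remainder needs one NOP whose size is itself 2 mod 4.
        if has_c_nop:
            plan.append(2)
        elif has_nop48 and padding >= 6:
            plan.append(6)
        else:
            raise ValueError(f"cannot fill {padding} bytes without c.nop")
        padding -= plan[0]
    plan.extend([4] * (padding // 4))  # what remains is a multiple of 4
    return plan
```

Without c.nop, 2 bytes of padding has no solution, which is the n = 0 case above.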

lenary commented 1 week ago

Thanks for clarifying; I had momentarily forgotten the "align with NOPs" requirements. Given that the smallest extension containing c.nop is Zca, it now makes most sense for any extension with instructions whose length is an odd multiple of 16 bits to require Zca as well. That means I don't need to worry about the EF_RISCV_RVC flag.
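The resulting rule could be checked from the Tag_RISCV_arch build attribute. Below is a minimal Python sketch. It assumes the canonical underscore-separated form of the arch string (for example rv32i2p1_zca1p0_xqcibi0p2). Purely for illustration, it treats xqcibi as the only extension with odd-multiple-of-16-bit instructions. The function names and that set are hypothetical.

```python
import re

# Hypothetical, illustrative set of extensions assumed to contain
# instructions whose length is an odd multiple of 16 bits.
ODD_PARCEL_EXTS = {"xqcibi"}


def arch_extensions(arch: str) -> set[str]:
    """Extension names from an underscore-separated arch string, versions stripped."""
    _base, *rest = arch.lower().split("_")
    return {re.sub(r"\d+(p\d+)?$", "", part) for part in rest}


def zca_requirement_met(arch: str) -> bool:
    """False if an odd-parcel extension is present without Zca (C implies Zca)."""
    exts = arch_extensions(arch)
    needs_zca = bool(exts & ODD_PARCEL_EXTS)
    has_zca = "zca" in exts or "c" in exts
    return not needs_zca or has_zca
```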

I also read the x86-64 psABI and now understand a bit more about what's going on there. The relaxable relocations sit midway through an instruction, at the start of its 32-bit displacement field (which I think always comes last in those forms). Even so, their types indicate where the instruction started. It's a lot less clean than just having instruction types, with relevant relocations for each instruction type.
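For comparison, the x86-64 scheme can be sketched in Python. The relocation's r_offset points at the 32-bit displacement, and the type tells the linker how many bytes back the instruction starts. The sketch assumes only the common forms: opcode + ModRM for R_X86_64_GOTPCRELX, plus a REX prefix for R_X86_64_REX_GOTPCRELX. It ignores other prefixes. x86_insn_start is a hypothetical helper.

```python
R_X86_64_GOTPCRELX = 41
R_X86_64_REX_GOTPCRELX = 42


def x86_insn_start(r_offset: int, r_type: int) -> int:
    """Start of the instruction whose disp32 begins at r_offset (common forms only)."""
    if r_type == R_X86_64_GOTPCRELX:
        return r_offset - 2  # opcode, ModRM
    if r_type == R_X86_64_REX_GOTPCRELX:
        return r_offset - 3  # REX, opcode, ModRM
    raise ValueError("not a relaxable GOTPCREL relocation")
```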

kito-cheng commented 1 week ago

I think the conclusion from the earlier discussion is that for instructions longer than 32 bits, linker relaxation will require at least Zca, and I agree with this point. Other than that, here are some additional thoughts I have on the topic:

For EF_RISCV_RVC:

The definition of this ELF flag has become a bit ambiguous after the introduction of Zc* standards. This ambiguity also extends to the meaning of .option rvc/.option norvc, but since we’re discussing ABI here, we’ll set that aside for now.

The current definition is:

> This bit is set when the binary targets the C ABI, which allows instructions to be aligned to 16-bit boundaries (the base RV32 and RV64 ISAs only allow 32-bit instruction alignment). When linking objects that specify EF_RISCV_RVC, the linker is permitted to use RVC instructions such as C.JAL in the linker relaxation process.

However, after introducing Zc*, we might consider changing "permitted to use RVC instructions" to "permitted to use Zca instructions." But we also have an unresolved issue, #393, so we might want to consider removing the linker part in the latter half and let this flag simply represent IALIGN.

For dedicated relocation types for longer instruction length:

I can see the possibility of reusing some relocations for longer instructions in the future—for example, using R_RISCV_32 to handle a 32-bit immediate. However, from a linker relaxation and implementation standpoint, I’d prefer using new relocations instead of reusing existing ones. This could simplify some parts of the linker relaxation process (avoiding instruction scanning) and improve output readability in objdump or readelf. For example, if we had an instruction that could take a 32-bit immediate, with the first 16 bits potentially being an opcode, then the relocation would show up in the middle of the instruction.

For #393:

I still haven’t seen a better solution for this issue; maybe we should push this forward with Nelson's help.

lenary commented 1 week ago

> For dedicated relocation types for longer instruction length:
>
> […] for example, using R_RISCV_32 to handle a 32-bit immediate […]

I did think about this, as some of the 48-bit instructions in Xqci have 32-bit contiguous immediate fields. I discounted it because of big-endian. I don't think there are big-endian implementations yet, but I also don't think we want to use data relocations (which have to be endianness-aware) on instructions (which are always little-endian), or vice versa. I don't think altering the interpretation of a relocation depending on whether a section is executable, or based on the marker symbols (to give two examples), is a viable route forward.
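Here is a minimal Python sketch of the mismatch. A data relocation like R_RISCV_32 writes its value in the object's data byte order, whereas instruction parcels are always little-endian. The 48-bit layout, with a 32-bit immediate in bits 47:16, is hypothetical and chosen only to illustrate. It is not taken from the Xqci specification, and both function names are hypothetical.

```python
def write_data_reloc32(buf: bytearray, off: int, value: int, big_endian: bool) -> None:
    """R_RISCV_32-style data write: follows the object's data endianness."""
    order = "big" if big_endian else "little"
    buf[off:off + 4] = (value & 0xFFFFFFFF).to_bytes(4, order)


def write_insn_imm32(buf: bytearray, insn_start: int, value: int) -> None:
    """Hypothetical instruction relocation: imm32 in bits 47:16 of a 48-bit insn.

    Instructions are little-endian regardless of data endianness.
    """
    buf[insn_start + 2:insn_start + 6] = (value & 0xFFFFFFFF).to_bytes(4, "little")
```

On a little-endian target the two writes agree. On a big-endian target, reusing the data relocation would store the immediate byte-reversed.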

EF_RISCV_RVC: Thanks for pointing out #393 - I will think about this issue a bit more. As you say, we've slightly struggled since C was split into sub-parts. I will comment on that proposal.

kito-cheng commented 1 week ago

> I did think about this, as some of the 48-bit instructions in Xqci have 32-bit contiguous immediate fields. I discounted it because of big-endian. I don't think there are big-endian implementations yet, but I also don't think we want to use data relocations (which have to be endianness-aware) on instructions (which are always little-endian), or vice versa. I don't think altering the interpretation of a relocation depending on whether a section is executable, or based on the marker symbols (to give two examples), is a viable route forward.

Good point on endianness; I wasn't aware of that, but it is definitely a potential issue. BTW, we do have some non-standard big-endian software support, such as Spike and the GNU toolchain.