llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

AMDGPU disassembler has trouble disassembling SMEM and VMEM instructions that use s106-107 #62651

Open Venemo opened 1 year ago

Venemo commented 1 year ago

The disassembler treats an SMEM instruction that uses s106 and/or s107 as invalid, and does the same for VMEM instructions. In fact, these instructions should be accepted.

Venemo commented 1 year ago

CC @Flakebi

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-amdgpu

jayfoad commented 1 year ago

@venemo in the encoding, number 106 is used for vcc_lo and 107 for vcc_hi. You cannot refer to them as "s106" and "s107". I think the disassembler is already correct here. For example see test/MC/Disassembler/AMDGPU/gfx11_dasm_smem.txt which has cases like:

# GFX11: s_load_b32 vcc_hi, s[2:3], s0           ; encoding: [0xc1,0x1a,0x00,0xf4,0x00,0x00,0x00,0x00]

Do you have an example that you think is handled wrongly?
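
The mapping described in this comment can be written down as a small lookup. This is only an illustrative sketch: the function name `sgpr_operand_name` is hypothetical, and the assumption that values 0 to 105 name s0 to s105 holds for the newer targets discussed in this thread, not necessarily for older ones.

```python
def sgpr_operand_name(code: int) -> str:
    """Return the assembly name for a scalar register encoding value.

    Assumption: 0..105 are s0..s105; 106 and 107 are vcc_lo and vcc_hi,
    as stated in the comment above. Other values are out of scope here.
    """
    if 0 <= code <= 105:
        return f"s{code}"
    if code == 106:
        return "vcc_lo"
    if code == 107:
        return "vcc_hi"
    raise ValueError(f"encoding value {code} is not covered by this sketch")
```

With this naming, a single-dword destination encoded as 107 prints as `vcc_hi`, which matches the GFX11 test line quoted above.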

Venemo commented 1 year ago

Hi! Yes, you are right, s106 is equivalent to vcc_lo and s107 is equivalent to vcc_hi. They are encoded as 106 and 107 and should always be available, at least on newer GPUs (such as RDNA), which always allocate VCC. They should be valid in SMEM instructions (either as a destination or as part of the descriptor) and in VMEM instructions (for example, as part of the descriptor).

For example, we sometimes emit instructions like this:
s4: %974:s[104-107] = s_buffer_load_dwordx4 %768:s[60-63], %749:s[40]
which is assembled into:
f4281a1e
We expect the instruction to be disassembled like this:
s_buffer_load_dwordx4 s[104:107], s[60:63], s40

This instruction works fine on RDNA2, but the disassembler thinks it is invalid.

(Edited this comment to also include the expected output from the disassembler.)
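
To see which registers the quoted word actually encodes, the fields can be pulled out by hand. The sketch below is not LLVM's decoder. It assumes `f4281a1e` is the value of the first 32-bit word of the instruction, and it assumes the SMEM field positions SBASE in bits 5:0, SDATA in bits 12:6, OP in bits 25:18, and the encoding tag in bits 31:26; these should be checked against the ISA document for the target. The function name is hypothetical.

```python
def decode_smem_dword0(dword0: int) -> dict:
    """Extract a few SMEM fields from the first instruction word.

    Field positions are an assumption of this sketch (see lead-in).
    """
    return {
        "sbase": dword0 & 0x3F,            # base register pair index
        "sdata": (dword0 >> 6) & 0x7F,     # first destination register
        "op": (dword0 >> 18) & 0xFF,       # opcode number
        "encoding": (dword0 >> 26) & 0x3F, # instruction class tag
    }


fields = decode_smem_dword0(0xF4281A1E)
first_dst = fields["sdata"]       # 104, so a 4-dword result covers 104..107
first_base = fields["sbase"] * 2  # 60, SBASE counts register pairs
```

Under these assumptions the destination starts at 104 and the descriptor at s60, consistent with the expected `s[104:107], s[60:63]` operands, so the upper half of the destination lands on the values 106 and 107.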

marekolsak commented 1 year ago

The GFX11 ISA doc in section 3.3.1.4 says that multi-dword operands can't cross SGPR regions. It also says that s0-107 forms one SGPR region. That means multi-dword operands can't cross from s104 to s108 (ttmp), but they can extend from s104 to s107 (vcc_hi). What Venemo says is correct.

It seems to be an inefficiency of LLVM that vcc isn't used for SGPR destinations beginning at or before s104. The assembler should also be able to decode it correctly, either as s[104:105,vcc_lo:vcc_hi], or as s[104:107].
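
The region rule cited above reduces to a bounds check. The following is a minimal sketch under the reading given in this comment, namely that values 0 through 107 form one region and 108 starts the ttmp registers. The names are hypothetical, and alignment requirements are deliberately ignored.

```python
SGPR_REGION_LAST = 107  # last value of the s0-107 region, per the comment above


def operand_stays_in_sgpr_region(first: int, dwords: int) -> bool:
    """True if a multi-dword operand starting at `first` ends at or before 107.

    Ignores alignment rules; only the region boundary is checked.
    """
    last = first + dwords - 1
    return first >= 0 and last <= SGPR_REGION_LAST
```

For instance, a 4-dword operand starting at 104 ends at 107 and passes, while one starting at 106 would reach 109 and fails.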