llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.78k stars 11.9k forks source link

[RISCV] Wrong register usage for `amocas.q.aqrl` inline assembly #111877

Open apetenchea opened 3 weeks ago

apetenchea commented 3 weeks ago

Problem

Atomic Compare-and-Swap instructions (amocas) are part of the experimental zacas1p0 extension. For more information on amocas instructions, see Chapter 16. "Zacas" Extension for Atomic Compare-and-Swap (CAS) Instructions, Version 1.0.0, from the The RISC-V Instruction Set Manual Volume I.

During code generation from inline assembly, the compiler "self-sabotages" itself by using the wrong regsiters.

Steps to reproduce

Suppose I have a file bug.c. I am trying to use the 128 bit version of amocas from inline assembly, like this:

__int128 amocas_q_aqrl(__int128 *atom, __int128 value)
{
  __int128 ret = 0xcafe;
  __asm__ __volatile__("  amocas.q.aqrl  %1, %2, (%0)"
           : "+r"(atom), "+r"(ret)
           : "r"(value)
           : "memory");
    return ret + value;
}

I am compiling it like this:

clang -menable-experimental-extensions --target=riscv64 -march=rv64imafdc_zicsr_zifencei_zacas1p0 -c bug.c --gcc-toolchain=/usr/riscv64-linux-gnu-

I am getting the following error:

bug.c:4:24: error: register must be even
    4 |   __asm__ __volatile__("  amocas.q.aqrl  %1, %2, (%0)"
      |                        ^
<inline asm>:1:23: note: instantiated into assembly here
    1 |           amocas.q.aqrl  a0, a3, (a2)
      |                              ^
1 error generated.

My setup is the following:

Note that other variants of amocas work fine, for example:

long amocas_d_aqrl(long *atom, long value)
{
  long ret = 0xcafe;
  __asm__ __volatile__("  amocas.d.aqrl  %1, %2, (%0)"
           : "+r"(atom), "+r"(ret)
           : "r"(value)
           : "memory");
  return ret + value;
}

The above compiles. Currently I have had this problem only with amocas.q, but it might be the case that amocas.d causes similar issues on riscv32.

Current workaround

I can get this to compile by tricking the compiler into not using register "a3":

__int128 amocas_q_aqrl(__int128 *atom, __int128 value)
{
  __int128 ret = 0xcafe;
  // must clobber "a3" register due to a compiler bug
  __asm__ __volatile__("  amocas.q.aqrl  %1, %2, (%0)"
           : "+r"(atom), "+r"(ret)
           : "r"(value)
           : "memory", "a3");
    return ret + value;
}

I'm leaving this here in case somebody experiences a similar issue. I would've gladly looked more into it, but I've got no idea where to start.

llvmbot commented 3 weeks ago

@llvm/issue-subscribers-backend-risc-v

Author: Alex Petenchea (apetenchea)

# Problem Atomic Compare-and-Swap instructions (amocas) are part of the experimental _zacas1p0_ extension. For more information on `amocas` instructions, see _Chapter 16. "Zacas" Extension for Atomic Compare-and-Swap (CAS) Instructions, Version 1.0.0_, from the _The RISC-V Instruction Set Manual Volume I_. During code generation from inline assembly, the compiler "self-sabotages" itself by using the wrong regsiters. # Steps to reproduce Suppose I have a file `bug.c`. I am trying to use the 128 bit version of `amocas` from inline assembly, like this: ```c __int128 amocas_q_aqrl(__int128 *atom, __int128 value) { __int128 ret = 0xcafe; __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)" : "+r"(atom), "+r"(ret) : "r"(value) : "memory"); return ret + value; } ``` I am compiling it like this: ``` clang -menable-experimental-extensions --target=riscv64 -march=rv64imafdc_zicsr_zifencei_zacas1p0 -c bug.c --gcc-toolchain=/usr/riscv64-linux-gnu- ``` I am getting the following error: ``` bug.c:4:24: error: register must be even 4 | __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)" | ^ <inline asm>:1:23: note: instantiated into assembly here 1 | amocas.q.aqrl a0, a3, (a2) | ^ 1 error generated. ``` My setup is the following: - Running on Linux 6.7.12-amd64 SMP PREEMPT_DYNAMIC Debian 6.7.12-1 (2024-04-24) x86_64 - For cross compilation, I have installed gcc-riscv64-linux-gnu version 4:14.1.0-2 - I have built llvm and clang from source. At the time of opening this issue, I am on branch `main`, last commit is https://github.com/llvm/llvm-project/commit/2190ffa0f7e874d04fd0f750142135faa5df5d6b - I can also reproduce it with a pre-installed version of clang 17.0.5 Note that other variants of `amocas` work fine, for example: ```c long amocas_d_aqrl(long *atom, long value) { long ret = 0xcafe; __asm__ __volatile__(" amocas.d.aqrl %1, %2, (%0)" : "+r"(atom), "+r"(ret) : "r"(value) : "memory"); return ret + value; } ``` The above compiles. Currently I have had this problem only with `amocas.q`, but it might be the case that `amocas.d` causes similar issues on riscv32. # Current workaround I can get this to compile by tricking the compiler into not using register "a3": ```c __int128 amocas_q_aqrl(__int128 *atom, __int128 value) { __int128 ret = 0xcafe; // must clobber "a3" register due to a compiler bug __asm__ __volatile__(" amocas.q.aqrl %1, %2, (%0)" : "+r"(atom), "+r"(ret) : "r"(value) : "memory", "a3"); return ret + value; } ``` I'm leaving this here in case somebody experiences a similar issue. I would've gladly looked more into it, but I've got no idea where to start.
topperc commented 3 weeks ago

The compiler doesn't read the text of the inline assembly so it doesn't know what instruction is being used. It only looks at the register constraints.

New constraints for inline assembly need to be defined to express this.

topperc commented 3 weeks ago

The compiler doesn't read the text of the inline assembly so it doesn't know what instruction is being used. It only looks at the register constraints.

New constraints for inline assembly need to be defined to express this.

Actually I don't think it's possible to add a constraint for this. The __int128 type is broken down into 2 64-bit values too early.

lenary commented 2 weeks ago

The compiler doesn't read the text of the inline assembly so it doesn't know what instruction is being used. It only looks at the register constraints. New constraints for inline assembly need to be defined to express this.

Actually I don't think it's possible to add a constraint for this. The __int128 type is broken down into 2 64-bit values too early.

I have an interest in supporting even-odd register pair constraints. They've come up here, and they'll come up more as things like Zilsd reach maturity and ratification. While the public Zilsd llvm implementation (by NXP -- https://github.com/llvm/llvm-project/compare/release/18.x...nxp-auto-tools:llvm-project:Zilsd/release/18.1.6 ) doesn't contain this support, evidently even-odd register pairs are becoming something repeated throughout the ISA.

It is high up my list to look at implementing support for these constraints and mapping __(u)int128_t/(u)int64_t onto them (on RV64/RV32 respectively), I don't think I require your help immediately, but I'll certainly be tagging you on reviews, and am happy to be given advice.

lenary commented 1 week ago

I think this will be fixed by https://github.com/llvm/llvm-project/pull/112983 - but it will also require the use of the Pr constraint for the i128 values, rather than the r constraint.

Thank you for the reasonably complex example, as it did help me to find crashes in my implementation.