riscv-non-isa / rvv-intrinsic-doc

https://jira.riscv.org/browse/RVG-153
BSD 3-Clause "New" or "Revised" License
281 stars 89 forks source link

An unified naming scheme for the intrinsics parameter #266

Closed eopXD closed 10 months ago

eopXD commented 1 year ago

This proposal does not functionally change any behavior of the RVV intrinsics. This proposal hopes to adjust the naming scheme of the parameter in the intrinsic functions to avoid ambiguity in the specification.

Introduction

Like Rich has raised questions of the maskedoff parameter passed to the TUMU variants in this week's meeting, and also other issues like #258 has raised confusion that maskedoff parameter is presented in an unmasked intrinsic serving as the passthrough register for tail elements.

Browsing through the list of intrinsic prototypes the specification provides, the specification currently have several different naming-s to the intrinsic function parameters.


Listing a few immediate examples here for the rich naming-s we have for the parameters.

For example, for load intrinsics we have the following, and the store intrinsics with similar naming.

vint32m1_t __riscv_vle32_v_i32m1 (const int32_t *base, size_t vl);
vint32m1_t __riscv_vlse32_v_i32m1 (const int32_t *base, ptrdiff_t bstride, size_t vl);
vint32m1_t __riscv_vluxei32_v_i32m1 (const int32_t *base, vuint32m1_t bindex, size_t vl);
vint32m1_t __riscv_vle32ff_v_i32m1 (const int32_t *base, size_t *new_vl, size_t vl);

Intrinsics that map to computation that already has vd as input operand in its instruction semantics keep their vd naming. On the other hand, intrinsics that do not take vd when policy is not specified as undisturbed, the passthrough operand is named maskedoff. vslideup intrinsics have the destination register named as src and dest.

vint32m1_t __riscv_vadd_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);
vint32m1_t __riscv_vadd_vv_i32m1_tumu (vbool32_t mask, vint32m1_t maskedoff, vint32m1_t op1, vint32m1_t op2, size_t vl);
vint32m1_t __riscv_vmacc_vv_i32m1 (vint32m1_t vd, vint32m1_t vs1, vint32m1_t vs2, size_t vl);
vint32m1_t __riscv_vmacc_vv_i32m1_tumu (vbool32_t mask, vint32m1_t vd, vint32m1_t vs1, vint32m1_t vs2, size_t vl);
vint32m1_t __riscv_vslideup_vx_i32m1 (vint32m1_t dest, vint32m1_t src, size_t offset, size_t vl);
vint32m1_t __riscv_vslideup_vx_i32m1_tumu (vbool32_t mask, vint32m1_t dest, vint32m1_t src, size_t offset, size_t vl);

Intrinsics that maps to instructions that are non-commutable have naming-s like op1, op2. The current specification does not explain clearly on how they map to its assembly level corresponding (vs2 and vs1 / rs1).

vint32m1_t __riscv_vsub_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);

Reduction intrinsics have naming like vector / scalar.

vint32m1_t __riscv_vredsum_vs_i32m1_i32m1 (vint32m1_t vector, vint32m1_t scalar, size_t vl);

Proposal

Adjust the intrinsic parameter names to be identical to what is specified in the V extension specification. Using names like vd, vs2, vs1, rs1, vm.

The benefit of this rename is that users can be more clear how their RVV value in the intrinsics will be mapped to its corresponding assembly form.

The fallback of this rename is that users will need to check out the V extension specification to understand when to be clear of what they are controlling.

However I think that given the intrinsic users should be advanced users that are aware that they are leveraging low level (assembly level) semantic of the RISC-V V extension in the C language level, I think it is worth the tradeoff for the intrinsic specification to depend on the naming of the V extension specification, and to not invent its own naming.

Note

The corner case of this proposed naming scheme is the fault-only first load instructions, where vl will be modified in the instruction. In the intrinsics, we pass a pointer new_vl for the new vl to be returned by the intrinsic function. For this part, I think we can keep it the same.

vint32m1_t __riscv_vle32ff_v_i32m1 (const int32_t *base, size_t *new_vl, size_t vl);
dzaima commented 1 year ago

I think for some things (maskedoff especially, mask, unary ops, cases where the types make clear the meaning) it could make sense. But for non-commutative things I think it'd be unnecessarily confusing, e.g. I believe a subtract would end up as vint32m1_t __riscv_vsub_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl), where the vs2 before vs1 doesn't really make sense and might incorrectly make one think it'd do vs1-vs2. (I guess the assembly in the V spec names/orders them as such to be able to place the register/immediate arg in the rs1 spot, but this doesn't at all affect users of intrinsics)

I don't think intrinsics users will necessarily know assembly (though of course a good portion will), much less have read the spec which is the main place that currently uses vd/vs1/vs2 (in disassembly you'd just see some vop.vv v0, v1, v2, which could be either vop vd, vs2, vs1 or vop vd, vs1, vs2 depending on the op).

I'm possibly not a representative sample of intrinsics users, but from my usage of x86 and, to a lesser extent, ARM NEON intrinsics, I still sometimes don't know the x86 assembly arguments, and have basically no knowledge on ARM NEON assembly (though, to be fair, those two are more complicated than RVV, and if the intrinsics informed you of the mapping, it could be easy to learn).

eopXD commented 1 year ago

I think for some things (maskedoff especially, mask, unary ops, cases where the types make clear the meaning) it could make sense. But for non-commutative things I think it'd be unnecessarily confusing, e.g. I believe a subtract would end up as vint32m1_t __riscv_vsub_vv_i32m1(vint32m1_t vs2, vint32m1_t vs1, size_t vl), where the vs2 before vs1 doesn't really make sense and might incorrectly make one think it'd do vs1-vs2. (I guess the assembly in the V spec names/orders them as such to be able to place the register/immediate arg in the rs1 spot, but this doesn't at all affect users of intrinsics)

Comparing to the existing naming of op1 and op2 (vint32m1_t __riscv_vsub_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);) pointing the parameters conventions to assembly mnemonics is better than nothing.

I don't think intrinsics users will necessarily know assembly (though of course a good portion will), much less have read the spec which is the main place that currently uses vd/vs1/vs2 (in disassembly you'd just see some vop.vv v0, v1, v2, which could be either vop vd, vs2, vs1 or vop vd, vs1, vs2 depending on the op).

If it is safe to say RVV intrinsic users aim to write vector code with that exact semantic of that specific RVV instruction, then I think the user should at least have a sense of what assembly is. The intrinsics specification will point users to the V extension specification for reference, as it already does while explaining the control to the vector programming model.

v-spec also got misaligned parts, some places got vd, vs2, vs1 while other places got vd, vs1, vs2, but I think the intrinsics specification should not introduce other ambiguity and respect whatever is in the V extension.

dzaima commented 1 year ago

Comparing to the existing naming of op1 and op2 (vint32m1_t __riscv_vsub_vv_i32m1 (vint32m1_t op1, vint32m1_t op2, size_t vl);) pointing the parameters conventions to assembly mnemonics is better than nothing.

It could be said that the current op1 and op2 point to the n-th operand, i.e. vsub.vv vd, op1, op2 in disassembly, which is a direct mapping to a thing that everyone can utilize, whereas vs1/vs2 are things only within the spec document, which, after enough time, will be one of many sources on RVV. I'm kind of pushing this primarily because, at the start of having skimmed the spec, for a decent bit of time (until I got clang & qemu working & triple-checked the behavior), I did actually think that vsub.vv vd, vs2, vs1 might do vd = vs1-vs2 and the name ordering was weird specifically to point out that the arguments are in reverse order, and was thinking whether the intrinsics reorder them back or not; it's possible that others might not fall in this pitfall.

that exact semantic of that specific RVV instruction

For some of the more unusual operations you may have to look up stuff, but for integer/float arith & unit-stride load/store (i.e. enough ops for most things) it's very much feasible & reasonable to just guess what they're gonna do.

I think the user should at least have a sense of what assembly is. The intrinsics specification will point users to the V extension specification for reference

Many used to ARM/Intel specifications might just automatically go looking for some other resource. The RVV spec happens to be quite readable, but even then purpose-made tutorials are gonna be even more approachable, and might not ever mention vs2/vs1 (e.g. always using numbered registers or using a/b/c/x/y/z).

respect whatever is in the V extension

That is fair, and it could certainly add to confusion that the intrinsics and the V spec document use different conventions, but I think it's also a fair option to diverge to hide a confusing little detail whose reason doesn't affect intrinsics.

At the core here I guess is the target audience of intrinsics. Someone writing code for x86 & ARM & RVV probably won't dive too deep into any single one of them, and would heavily benefit from the intrinsics being self-contained and easily skimmable & immediately understandable, while someone focused on RVV & RISC-V specifically could benefit from uniformity across conventions.

And then there's the question of who this specific document is for. If that goal is advanced users, some less-advanced-user-friendly thing (incl. my viewer, which I want to be easily usable by anyone who knows C) could also just go with a different naming scheme.

topperc commented 11 months ago

Is the vl argument really VL or is it AVL? Is it required to be <= VLMAX?

nick-knight commented 11 months ago

It serves the same role as "AVL" in the vset{i}vl{i} (assembly) instructions.

eopXD commented 10 months ago

This is recently addressed in #271.