llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.09k stars 11.59k forks source link

[RISCV] Vector saturating truncation not using VNCLIP #73424

Closed boomshroom closed 8 months ago

boomshroom commented 10 months ago

Closely related to #68466 and #58266, but for a different platform.

Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width.

Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen.

Example code with current code gen, next to x86 and AArch64 equivalents.

https://rust.godbolt.org/z/bKdsq8GGM

llvmbot commented 10 months ago

@llvm/issue-subscribers-backend-risc-v

Author: Angelo Bulfone (boomshroom)

Closely related to #68466 and #58266, but for a different platform. Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width. Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen. Example code with current code gen, next to x86 and AArch64 equivalents. https://rust.godbolt.org/z/bKdsq8GGM
sun-jacobi commented 9 months ago

@boomshroom Hi, can I work on this issue?

boomshroom commented 9 months ago

Sure. It probably wouldn't be too hard to copy and paste the logic from one of the several other targets that match this pattern, though the fact that it does seem to be duplicated so many times makes me suspicious that doing so is really the best idea. I personally think adding a target-independent "saturating truncation" node would be better, though I'm less sure how to go about that and it would entail a larger change to the overall infrastructure.

sun-jacobi commented 9 months ago

@boomshroom Can you assign this issue to me?

sun-jacobi commented 9 months ago

The previous pattern matching around the trunc mostly uses vnsrl. I think using vnsrl might be a better choice, since vnclip may bring new complexity due to the vxrm.

boomshroom commented 9 months ago

If the shift is 0, then it should never have to round and thus vxrm shouldn't pose an issue.

Also vnsrl always floors when inexact and wraps when the value is out of range. Saturating when out of range is what vnclip does, like the other instructions on other architectures do. Worth noting is that it only saturates unsigned to unsigned and signed to signed, so a min or max is still required to reduce the range to what's representable in the desired signedness.

I still wonder why such an operation that seems fairly useful, relatively common in ISAs, and is non-trivial to match, doesn't have a more target-independent representation in LLVM. Could save some work matching the patterns before lowering to individual ISAs and reduce the duplication between them.

sun-jacobi commented 8 months ago

I'm planning to add a new opcode ISD::TRUNCSAT to SelectionDAG, and do a target-independent combine for it.