Closed boomshroom closed 8 months ago
@llvm/issue-subscribers-backend-risc-v
Author: Angelo Bulfone (boomshroom)
@boomshroom Hi, can I work on this issue?
Sure. It probably wouldn't be too hard to copy and paste the logic from one of the several other targets that match this pattern, though the fact that it does seem to be duplicated so many times makes me suspicious that doing so is really the best idea. I personally think adding a target-independent "saturating truncation" node would be better, though I'm less sure how to go about that and it would entail a larger change to the overall infrastructure.
@boomshroom Can you assign this issue to me?
The previous pattern matching around the trunc
mostly uses vnsrl
.
I think using vnsrl
might be a better choice, since vnclip
may bring new complexity due to the vxrm
.
If the shift is 0, then it should never have to round and thus vxrm
shouldn't pose an issue.
Also vnsrl
always floors when inexact and wraps when the value is out of range. Saturating when out of range is what vnclip
does, like the other instructions on other architectures do. Worth noting is that it only saturates unsigned to unsigned and signed to signed, so a min
or max
is still required to reduce the range to what's representable in the desired signedness.
I still wonder why such an operation that seems fairly useful, relatively common in ISAs, and is non-trivial to match, doesn't have a more target-independent representation in LLVM. Could save some work matching the patterns before lowering to individual ISAs and reduce the duplication between them.
I'm planning to add a new opcode ISD::TRUNCSAT
to SelectionDAG, and do a target-independent combine for it.
Closely related to #68466 and #58266, but for a different platform.
Much like x86's (V)PACK(U,S)S and ARM's (U,S)QXTN(2), RISC-V's V extension includes VNCLIP(U).W(V,X,I), which performs a right shift and then performs either signed or unsigned saturation to vector of integers with half the width. With a shift value of 0 (whether from an immediate or the register; the V extension specification doesn't specifically talk about this use case), the instruction can be used to perform a saturating truncation from vectors of one integer type, the the type with half the width.
Given the complexity of matching saturating truncation, and the apparent duplication between targets of this functionality, it could be worth considering matching it earlier in the pipeline. Matching a saturating truncation preceded by a shift should be much simpler and can be done with TableGen.
Example code with current code gen, next to x86 and AArch64 equivalents.
https://rust.godbolt.org/z/bKdsq8GGM