Open hiraditya opened 1 year ago
cc: @alexey-bataev @nikolaypanchenko
@llvm/issue-subscribers-backend-risc-v
For reference, here's a version of the function in question, utilizing VL: (written using intrinsics, compiler explorer)
fill_i16: # @fill_i16
vsetvli a3, zero, e16, m2, ta, ma
beqz a2, .LBB0_3
vmv.v.x v8, a1
.LBB0_2: # =>This Inner Loop Header: Depth=1
vsetvli a1, a2, e16, m2, ta, ma
vse16.v v8, (a0)
sub a2, a2, a1
sh1add a0, a1, a0
bnez a2, .LBB0_2
.LBB0_3:
ret
Besides the autovectorizer not knowing how to utilize VL (it just does masking; I believe https://reviews.llvm.org/D99750 is work towards VL usage?), it also emits a scalar path even though the vectorized path can handle any input length by itself.
edit: perhaps this is a slightly better way to do this: the initial broadcast is no longer needlessly always done at VL=VLMAX, but only at the VL needed for the first (and thus all) iterations. However, this adds a dependency on the length, probably slightly increasing latency; tradeoffs.
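To make the control flow of the VL-driven listing above easier to follow, here is a rough scalar model of it in portable C. It is only a sketch: `MODEL_VLMAX`, `model_vsetvli` and `fill_i16_model` are hypothetical names made up for illustration, the VLMAX value is arbitrary, and `model_vsetvli` assumes the simplest behaviour the V spec permits (vl = min(AVL, VLMAX)); real hardware may choose a different vl when AVL is between VLMAX and 2*VLMAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VLMAX for e16/m2; on real hardware it is VLEN * LMUL / SEW. */
enum { MODEL_VLMAX = 16 };

/* Models `vsetvli rd, rs1, e16, m2, ta, ma` under the assumption
 * vl = min(avl, VLMAX). */
static size_t model_vsetvli(size_t avl) {
    return avl < MODEL_VLMAX ? avl : MODEL_VLMAX;
}

/* Scalar model of the VL-driven fill loop: no mask and no scalar tail,
 * because the last iteration simply runs with a shorter vl. */
void fill_i16_model(int16_t *dst, int16_t value, size_t n) {
    while (n != 0) {                       /* beqz / bnez a2 */
        size_t vl = model_vsetvli(n);      /* vsetvli a1, a2, e16, m2 */
        assert(vl > 0);                    /* nonzero AVL gives nonzero vl */
        for (size_t i = 0; i < vl; ++i)    /* vse16.v v8, (a0) */
            dst[i] = value;
        n -= vl;                           /* sub a2, a2, a1 */
        dst += vl;                         /* sh1add a0, a1, a0 */
    }
}
```

With these assumptions, a call such as `fill_i16_model(buf, 7, 37)` runs two iterations with vl = 16 and one with vl = 5, which is why a separate remainder loop is unnecessary.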
@alexey-bataev's patch (https://reviews.llvm.org/D99750) should be sufficient as is for this code; however, for loops that have arithmetic operations the compiler will emit lots of vsetvlis. Follow-up patches will help to clean them up, and eventually we should be able to generate this code:
# %bb.1: # %vector.ph
li a3, 0
vsetvli a4, zero, e16, m1, ta, ma
vmv.v.x v8, a1
.LBB0_2: # %vector.body
# =>This Inner Loop Header: Depth=1
sub a1, a2, a3
vsetvli a1, a1, e16, m1, ta, ma
sh1add a4, a3, a0
add a3, a3, a1
vse16.v v8, (a4)
bne a3, a2, .LBB0_2
.LBB0_3: # %for.cond.cleanup
ret
(code is generated by our compiler with no remainder loop)
I think the memset_pattern line of work should provide a path to generating simple code for this type of loop.
Yeah, that should fix this issue. I believe the patch is here: #97583 ?
Cc: @topperc FYI.
Derived from: https://github.com/llvm/llvm-project/issues/66652
riscv-clang -Os -march=rv64gcv_zba_zbb_zbs
arm-clang -Os -march=armv8-a+sve