Closed johnplatts closed 1 year ago
Hi @johnplatts, thanks for proposing these; the Slide operations sound useful. We could also add Slide1Up/Slide1Down, which might allow better codegen, especially on AVX2/AVX3 without AVX3_DL.
Requiring N < Lanes() seems reasonable (it might be better to rename N to lanes, to avoid confusion with the other N).
For the other two operations, I have mixed feelings: generally it is nice to cover instructions where possible without major performance cliffs across platforms, but these seem a bit exotic/rare. Do you have a use case in mind? Also, we at one point renamed Lo/Hi to Lower/Upper; it would be nice to keep that for consistency.
Added SlideUpLanes, SlideDownLanes, Slide1Up, Slide1Down, SlideUpBlocks, and SlideDownBlocks in pull request #1496.
Thank you for implementing those! Closing.
Here are the additional operations that I would like to add to Highway:
In the case where `Lanes(d) * sizeof(TFromD<D>) > 16` is true (which is possible on AVX2/AVX3/RVV/SVE/WASM_EMU256), the above operations cross block boundaries and operate on the first `Lanes(d)` lanes of each vector.
`SlideUpLanes(d, v, N)` is equivalent to doing a `TwoTableLookupLanes(d, v, Zero(d), idx)` with the following indices if `N < Lanes(d)` is true: `{ (-N) & idx_mask, (-N + 1) & idx_mask, (-N + 2) & idx_mask, ... }` (where `idx_mask` is equal to `2 * Lanes(d) - 1`).
`SlideDownLanes(d, v, N)` is equivalent to doing a `TwoTableLookupLanes(d, v, Zero(d), idx)` with the following indices if `N < Lanes(d)` is true: `{ N, N + 1, N + 2, ... }`.
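
To make the index patterns above concrete, here is a minimal scalar sketch of the two slide operations. It does not use Highway itself: `TwoTableLookupModel`, `SlideUpLanesModel`, and `SlideDownLanesModel` are hypothetical names, a `std::vector<int>` stands in for a vector register, and the lane count is assumed to be a power of two so that masking with `idx_mask` behaves as described.

```cpp
#include <cstddef>
#include <vector>

// Scalar stand-in for a two-table lookup (hypothetical helper, not Highway):
// indices in [0, L) select from a, indices in [L, 2L) select from b.
std::vector<int> TwoTableLookupModel(const std::vector<int>& a,
                                     const std::vector<int>& b,
                                     const std::vector<std::size_t>& idx) {
  const std::size_t L = a.size();
  std::vector<int> out(L);
  for (std::size_t i = 0; i < L; ++i) {
    out[i] = idx[i] < L ? a[idx[i]] : b[idx[i] - L];
  }
  return out;
}

// Models SlideUpLanes(d, v, N) with idx[i] = (-N + i) & idx_mask.
// Assumes v.size() is a power of two and n < v.size().
std::vector<int> SlideUpLanesModel(const std::vector<int>& v, std::size_t n) {
  const std::size_t L = v.size();
  const std::size_t idx_mask = 2 * L - 1;
  std::vector<std::size_t> idx(L);
  for (std::size_t i = 0; i < L; ++i) {
    // Unsigned wraparound followed by the mask gives (-N + i) mod 2L.
    idx[i] = (i - n) & idx_mask;
  }
  return TwoTableLookupModel(v, std::vector<int>(L, 0), idx);
}

// Models SlideDownLanes(d, v, N) with idx[i] = N + i.
// Assumes n < v.size(), so every index stays below 2L.
std::vector<int> SlideDownLanesModel(const std::vector<int>& v, std::size_t n) {
  const std::size_t L = v.size();
  std::vector<std::size_t> idx(L);
  for (std::size_t i = 0; i < L; ++i) {
    idx[i] = n + i;
  }
  return TwoTableLookupModel(v, std::vector<int>(L, 0), idx);
}
```

With a four-lane input `{1, 2, 3, 4}`, sliding up by one lane yields `{0, 1, 2, 3}` and sliding down by one lane yields `{2, 3, 4, 0}`. In this model the masked slide-up index stops producing zeros once `N` exceeds the lane count, because it wraps back into `v`, so the sketch is only meaningful under the `N < Lanes(d)` precondition.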
`CombineFirstNOfLoWithHi(d, hi, lo, N)` is equivalent to `IfThenElse(FirstN(d, N), lo, SlideUpLanes(d, hi, N))`.
`CompressLoAndConcatHi(d, hi, lo, mask)` is equivalent to `CombineFirstNOfLoWithHi(d, hi, Compress(lo, mask), CountTrue(d, mask))` (and `detail::Splice(hi, lo, mask)` on SVE targets).
Should SlideUpLanes and SlideDownLanes have implementation-defined behavior in the case where `N >= Lanes(d)`, or should SlideUpLanes and SlideDownLanes be required to return `Zero(d)` in the case where `N >= Lanes(d)`?
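
The two combining operations proposed above can also be modelled in scalar code, which may help when judging how exotic they are. This is a sketch under assumptions rather than Highway code: the function names with the `Model` suffix are hypothetical, `std::vector<int>` stands in for a vector, `std::vector<bool>` stands in for a mask, and `N` is assumed not to exceed the lane count. Lanes that a real `Compress` leaves implementation-defined are set to zero here, which does not matter because they are never selected.

```cpp
#include <cstddef>
#include <vector>

// Models IfThenElse(FirstN(d, N), lo, SlideUpLanes(d, hi, N)):
// lanes [0, n) come from lo, lanes [n, L) are hi[0], hi[1], ...
// Assumes hi and lo have the same size and n <= lo.size().
std::vector<int> CombineFirstNOfLoWithHiModel(const std::vector<int>& hi,
                                              const std::vector<int>& lo,
                                              std::size_t n) {
  const std::size_t L = lo.size();
  std::vector<int> out(L);
  for (std::size_t i = 0; i < L; ++i) {
    out[i] = i < n ? lo[i] : hi[i - n];
  }
  return out;
}

// Models CombineFirstNOfLoWithHi(d, hi, Compress(lo, mask), CountTrue(d, mask)):
// the selected lanes of lo are packed to the front, followed by the
// leading lanes of hi.
std::vector<int> CompressLoAndConcatHiModel(const std::vector<int>& hi,
                                            const std::vector<int>& lo,
                                            const std::vector<bool>& mask) {
  const std::size_t L = lo.size();
  std::vector<int> compressed(L, 0);  // unselected tail lanes set to zero
  std::size_t count = 0;
  for (std::size_t i = 0; i < L; ++i) {
    if (mask[i]) compressed[count++] = lo[i];
  }
  return CombineFirstNOfLoWithHiModel(hi, compressed, count);
}
```

For `hi = {10, 20, 30, 40}`, `lo = {1, 2, 3, 4}`, and a mask selecting lanes 0 and 2, the compress-and-concatenate model returns `{1, 3, 10, 20}`.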