google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Add SlideUpLanes, SlideDownLanes, CompressLoAndConcatHi, and CombineFirstNOfLoWithHi operations #1442

Closed johnplatts closed 1 year ago

johnplatts commented 1 year ago

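Here are the additional operations that I would like to add to Highway: SlideUpLanes(d, v, N), SlideDownLanes(d, v, N), CombineFirstNOfLoWithHi(d, hi, lo, N), and CompressLoAndConcatHi(d, hi, lo, mask).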

When Lanes(d) * sizeof(TFromD<D>) > 16 (which is possible on AVX2/AVX3/RVV/SVE/WASM_EMU256), the above operations cross block boundaries and operate on the first Lanes(d) lanes of each vector.

If N < Lanes(d), SlideUpLanes(d, v, N) is equivalent to TwoTablesLookupLanes(d, v, Zero(d), idx) with the following indices: { (-N) & idx_mask, (-N + 1) & idx_mask, (-N + 2) & idx_mask, ... }, where idx_mask is equal to 2 * Lanes(d) - 1.

If N < Lanes(d), SlideDownLanes(d, v, N) is equivalent to TwoTablesLookupLanes(d, v, Zero(d), idx) with the following indices: { N, N + 1, N + 2, ... }
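The index patterns above can be checked with a scalar model. The following is only a sketch, not Highway code: std::vector<int32_t> stands in for a vector, the names TwoTablesLookupModel, SlideUpLanesModel, and SlideDownLanesModel are hypothetical, and it assumes the lane count is a power of two so that masking with idx_mask wraps correctly. It also assumes the two-table lookup takes lane idx[i] from the first table when idx[i] < Lanes(d) and from the second table otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar stand-in for a two-table lookup (assumed semantics): indices in
// [0, L) select from `a`, indices in [L, 2L) select from `b`, L = a.size().
std::vector<int32_t> TwoTablesLookupModel(const std::vector<int32_t>& a,
                                          const std::vector<int32_t>& b,
                                          const std::vector<size_t>& idx) {
  const size_t lanes = a.size();
  assert(b.size() == lanes && idx.size() == lanes);
  std::vector<int32_t> out(lanes);
  for (size_t i = 0; i < lanes; ++i) {
    assert(idx[i] < 2 * lanes);
    out[i] = idx[i] < lanes ? a[idx[i]] : b[idx[i] - lanes];
  }
  return out;
}

// Hypothetical model of SlideUpLanes: idx[i] = (i - n) & idx_mask.
// Requires n < lanes and a power-of-two lane count.
std::vector<int32_t> SlideUpLanesModel(const std::vector<int32_t>& v,
                                       size_t n) {
  const size_t lanes = v.size();
  assert(lanes != 0 && (lanes & (lanes - 1)) == 0);
  assert(n < lanes);
  const size_t idx_mask = 2 * lanes - 1;
  std::vector<size_t> idx(lanes);
  for (size_t i = 0; i < lanes; ++i) {
    // Unsigned wraparound followed by the mask yields (i - n) mod 2L, so
    // lanes below n point into the second (all-zero) table.
    idx[i] = (i - n) & idx_mask;
  }
  return TwoTablesLookupModel(v, std::vector<int32_t>(lanes, 0), idx);
}

// Hypothetical model of SlideDownLanes: idx[i] = n + i.
// Indices that reach L or beyond select zeros from the second table.
std::vector<int32_t> SlideDownLanesModel(const std::vector<int32_t>& v,
                                         size_t n) {
  const size_t lanes = v.size();
  assert(n < lanes);
  std::vector<size_t> idx(lanes);
  for (size_t i = 0; i < lanes; ++i) {
    idx[i] = n + i;
  }
  return TwoTablesLookupModel(v, std::vector<int32_t>(lanes, 0), idx);
}
```

With a four-lane input {1, 2, 3, 4} and n = 1, the slide-up model produces {0, 1, 2, 3} and the slide-down model produces {2, 3, 4, 0}.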

CombineFirstNOfLoWithHi(d, hi, lo, N) is equivalent to IfThenElse(FirstN(d, N), lo, SlideUpLanes(d, hi, N)).

CompressLoAndConcatHi(d, hi, lo, mask) is equivalent to CombineFirstNOfLoWithHi(d, hi, Compress(lo, mask), CountTrue(mask)) (and detail::Splice(hi, lo, mask) on SVE targets).
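To make the two combining operations concrete, here is a scalar sketch of the equivalences stated above. It is not Highway code: the Vec alias and the names CombineFirstNOfLoWithHiModel and CompressLoAndConcatHiModel are hypothetical, the mask is modeled as std::vector<bool>, and the model assumes that n equal to the lane count simply returns lo, which the proposal does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;  // scalar stand-in for a SIMD vector

// Hypothetical model of CombineFirstNOfLoWithHi(d, hi, lo, N):
// lanes [0, n) come from lo; lane i >= n holds hi[i - n], i.e. hi slid up
// by n lanes. Assumption: n == lanes is allowed and returns lo unchanged.
Vec CombineFirstNOfLoWithHiModel(const Vec& hi, const Vec& lo, size_t n) {
  const size_t lanes = lo.size();
  assert(hi.size() == lanes && n <= lanes);
  Vec out(lanes);
  for (size_t i = 0; i < lanes; ++i) {
    out[i] = i < n ? lo[i] : hi[i - n];
  }
  return out;
}

// Hypothetical model of CompressLoAndConcatHi(d, hi, lo, mask):
// pack the selected lanes of lo to the front, then fill the remaining
// lanes with the leading lanes of hi.
Vec CompressLoAndConcatHiModel(const Vec& hi, const Vec& lo,
                               const std::vector<bool>& mask) {
  const size_t lanes = lo.size();
  assert(hi.size() == lanes && mask.size() == lanes);
  Vec compressed(lanes, 0);
  size_t count = 0;  // plays the role of CountTrue(mask)
  for (size_t i = 0; i < lanes; ++i) {
    if (mask[i]) compressed[count++] = lo[i];
  }
  // Lanes of `compressed` at or above `count` are never read below.
  return CombineFirstNOfLoWithHiModel(hi, compressed, count);
}
```

For lo = {1, 2, 3, 4}, hi = {5, 6, 7, 8}, and a mask selecting lanes 1 and 3, the compress model yields {2, 4, 5, 6}: the two selected lanes of lo followed by the first two lanes of hi.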

When N >= Lanes(d), should SlideUpLanes and SlideDownLanes have implementation-defined behavior, or should they be required to return Zero(d)?

jan-wassenberg commented 1 year ago

Hi @johnplatts, thanks for proposing these, the Slide operations sound useful. We could also add Slide1Up/Down, which might allow better codegen, especially on AVX2/3 without AVX3_DL.

Requiring N < Lanes() seems reasonable. It might be better to rename N to lanes, to avoid confusion with the other N.

For the other two operations, I have mixed feelings: generally it is nice to cover instructions where possible without major performance cliffs across platforms, but these seem a bit exotic/rare. Do you have a use case in mind? Also, we at one point renamed Lo/Hi to Lower/Upper, so it would be nice to keep that for consistency.

johnplatts commented 1 year ago

Added SlideUpLanes, SlideDownLanes, Slide1Up, Slide1Down, SlideUpBlocks, and SlideDownBlocks in pull request #1496.

jan-wassenberg commented 1 year ago

Thank you for implementing those! Closing.