llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Some 2-vector vector select identity shuffles may be better represented as moves #40778

Open LebedevRI opened 5 years ago

LebedevRI commented 5 years ago
Bugzilla Link: 41433
Version: trunk
OS: Linux
CC: @topperc, @dsprenkels, @RKSimon, @rotateright

Extended Description

Split off from llvm/llvm-project#40774

https://godbolt.org/z/_n1ggH

    #include <immintrin.h>

    void example(__m256i * __restrict__ dest, const __m256i * __restrict__ a) {
        (*dest)[2] = (*a)[2];
        (*dest)[3] = (*a)[3];
    }

Here we never touch the low half of dest; we only replace the high half of dest with the high half of a.

The naive asm could be:

    vmovaps ymm0, ymmword ptr [rdi]
    vblendps ymm0, ymm0, ymmword ptr [rsi], 240 # ymm0 = ymm0[0,1,2,3],mem[4,5,6,7]
    vmovaps ymmword ptr [rdi], ymm0
    vzeroupper

But we can also produce:

    vmovaps xmm0, xmmword ptr [rsi + 16]
    vmovaps xmmword ptr [rdi + 16], xmm0
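As a sanity check on the equivalence of the two sequences, here is a minimal portable C model. It is only a sketch: `v4i64`, `blend_store`, `narrow_store`, and `forms_agree` are hypothetical names, and the struct is a stand-in for `__m256i` so that the code does not need AVX to run.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m256i: four 64-bit lanes (32 bytes). */
typedef struct { int64_t lane[4]; } v4i64;

/* Wide form: load all of dest, blend in the high lanes of a, store all of dest. */
static void blend_store(v4i64 *dest, const v4i64 *a) {
    v4i64 tmp = *dest;            /* models the 256-bit load of dest */
    tmp.lane[2] = a->lane[2];     /* models the blend: high half comes from a */
    tmp.lane[3] = a->lane[3];
    *dest = tmp;                  /* models the 256-bit store */
}

/* Narrow form: copy only the 16 bytes at byte offset 16. */
static void narrow_store(v4i64 *dest, const v4i64 *a) {
    memcpy((char *)dest + 16, (const char *)a + 16, 16);
}

/* Returns 1 if both forms leave dest in the same state for one sample input. */
static int forms_agree(void) {
    v4i64 a  = {{10, 11, 12, 13}};
    v4i64 d1 = {{0, 1, 2, 3}};
    v4i64 d2 = d1;
    blend_store(&d1, &a);
    narrow_store(&d2, &a);
    return memcmp(&d1, &d2, sizeof d1) == 0;
}
```

The model relies on dest and a not aliasing, which is what the `restrict` qualifiers in the original example promise.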

I'm not quite sure what the exact criteria are for when that is profitable, though.

rotateright commented 5 years ago

This should be very good for x86 perf (replacing 2 ymm memops + a ymm shuffle with 2 xmm memops). Probably helps other targets too.

If we solve bug 41429, we have a vector select in the form of a shuffle in IR. Is that close enough to scalar select-store patterns (see bug 39603) to consider it as a generic (DSE? EarlyCSE?) optimization, since it always eliminates ops?

If that's too far of a stretch, then we either have to enhance the SDAG load/store splitting/combining or make a custom pass to do this.
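To make the "select in the form of a shuffle" idea concrete, here is a rough C sketch of one precondition such a transform might check. It is an assumption for illustration, not how any LLVM pass works: `changed_lane_range` and `issue_example_high_half_only` are hypothetical names, and the check says nothing about profitability or about proving that the first shuffle operand was loaded from the store address.

```c
#include <stddef.h>

/*
 * Hypothetical helper. For a two-source shuffle mask of n lanes, indices
 * 0..n-1 pick from the first source (assumed to be the value loaded from
 * the store destination) and n..2n-1 pick from the second source.
 * Reports the contiguous lane range [*first, *last] that actually changes.
 * Returns 0 if the mask is not a lane-wise select, if no lane changes,
 * or if the changed lanes are not contiguous.
 */
static int changed_lane_range(const int *mask, size_t n,
                              size_t *first, size_t *last) {
    int seen = 0, closed = 0;
    for (size_t i = 0; i < n; ++i) {
        int identity   = (mask[i] == (int)i);
        int from_other = (mask[i] == (int)(i + n));
        if (!identity && !from_other)
            return 0;             /* lane moves position: not a select */
        if (from_other) {
            if (closed)
                return 0;         /* second run of changed lanes */
            if (!seen) { *first = i; seen = 1; }
            *last = i;
        } else if (seen) {
            closed = 1;           /* identity lane after a changed run */
        }
    }
    return seen;
}

/* Usage: the blend from this issue, in 64-bit lanes, is mask {0, 1, 6, 7}. */
static int issue_example_high_half_only(void) {
    const int mask[4] = {0, 1, 6, 7};
    size_t first = 0, last = 0;
    return changed_lane_range(mask, 4, &first, &last)
           && first == 2 && last == 3;
}
```

For the mask in this issue, only lanes 2 and 3 change, which corresponds to the single 16-byte move at offset 16.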