Suboptimal codegen for vzip1q of values that were loaded as float32x2 in AArch64

Suboptimal codegen for vzip1q of values that were loaded as float32x2 in AArch64 #58584

Open TellowKrinkle opened 2 years ago

TellowKrinkle commented 2 years ago

Godbolt link: https://gcc.godbolt.org/z/vqTna3cGv

The following C code:

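#include <arm_neon.h>

float32x4_t zip(const float* a, const float* b) {
    // Load two floats into the low half; the high half is explicitly zero.
    float32x4_t va = vcombine_f32(vld1_f32(a), vdup_n_f32(0.0f));
    float32x4_t vb = vcombine_f32(vld1_f32(b), vdup_n_f32(0.0f));
    // Interleave the low halves: {a[0], b[0], a[1], b[1]}.
    return vzip1q_f32(va, vb);
}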

compiles to this:

ldr     d0, [x0]
ldr     d1, [x1]
zip2    v2.2s, v0.2s, v1.2s
zip1    v0.2s, v0.2s, v1.2s
mov     v0.d[1], v2.d[0]
ret

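instead of the shorter sequence below, which is sufficient because an ldr into a d register already zeroes the upper 64 bits of the vector register: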

ldr     d0, [x0]
ldr     d1, [x1]
zip1    v0.4s, v0.4s, v1.4s
ret
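
To illustrate why the two sequences above should be interchangeable, here is a minimal scalar sketch in portable C. It is not LLVM code and does not use NEON intrinsics; the vec4 type and every helper name (load_d, zip1_4s, zip1_2s, zip2_2s, mov_d1_from_d0, zip_current, zip_expected, same_bits) are hypothetical stand-ins that model each instruction on a four-lane array, assuming the usual AArch64 behaviour that a 64-bit vector write zeroes the upper half of the register.

```c
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit vector register of four floats. */
typedef struct { float lane[4]; } vec4;

/* Models `ldr dN, [x]`: loads two floats and zeroes the upper 64 bits. */
vec4 load_d(const float *p) {
    vec4 v = {{p[0], p[1], 0.0f, 0.0f}};
    return v;
}

/* Models `zip1 vD.4s, vN.4s, vM.4s`: interleaves the low halves of n and m. */
vec4 zip1_4s(vec4 n, vec4 m) {
    vec4 r = {{n.lane[0], m.lane[0], n.lane[1], m.lane[1]}};
    return r;
}

/* Models `zip1 vD.2s, vN.2s, vM.2s`: lane 0 of each input, upper half zeroed. */
vec4 zip1_2s(vec4 n, vec4 m) {
    vec4 r = {{n.lane[0], m.lane[0], 0.0f, 0.0f}};
    return r;
}

/* Models `zip2 vD.2s, vN.2s, vM.2s`: lane 1 of each input, upper half zeroed. */
vec4 zip2_2s(vec4 n, vec4 m) {
    vec4 r = {{n.lane[1], m.lane[1], 0.0f, 0.0f}};
    return r;
}

/* Models `mov vD.d[1], vN.d[0]`: copies the low 64 bits of n into the high half of d. */
vec4 mov_d1_from_d0(vec4 d, vec4 n) {
    d.lane[2] = n.lane[0];
    d.lane[3] = n.lane[1];
    return d;
}

/* The five-instruction sequence reported in the issue. */
vec4 zip_current(const float *a, const float *b) {
    vec4 v0 = load_d(a);
    vec4 v1 = load_d(b);
    vec4 v2 = zip2_2s(v0, v1);
    v0 = zip1_2s(v0, v1);
    return mov_d1_from_d0(v0, v2);
}

/* The three-instruction sequence the issue asks for. */
vec4 zip_expected(const float *a, const float *b) {
    vec4 v0 = load_d(a);
    vec4 v1 = load_d(b);
    return zip1_4s(v0, v1);
}

/* Returns 1 when both vectors are bit-identical. */
int same_bits(vec4 x, vec4 y) {
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Under this model, both zip_current and zip_expected produce the lanes {a[0], b[0], a[1], b[1]} for any pair of two-element inputs, so same_bits on their results should return 1. The model only captures lane movement, not instruction latency or throughput.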

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-aarch64

bcl5980 commented 2 years ago

Similar to #54226?