llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.96k stars 11.94k forks source link

Argument limit for avx2_gather is too weak #98436

Open KanRobert opened 3 months ago

KanRobert commented 3 months ago

https://www.godbolt.org/z/9T1eGPWjY

llvm/include/llvm/IR/IntrinsicsX86.td

  def int_x86_avx2_gather_q_q_256 : ClangBuiltin<"__builtin_ia32_gatherq_q256">,
      DefaultAttrsIntrinsic<[llvm_v4i64_ty],
        [llvm_v4i64_ty, llvm_ptr_ty, llvm_v4i64_ty, llvm_v4i64_ty, llvm_i8_ty],
        [IntrReadMem, ImmArg<ArgIndex<4>>]>;

The i8 operand is a scale, and should be 1, 2, 4 or 8. But now the middle end only knows it's 8-bit immediate. So llvm-reduce may assign 0 to the operand and then produce incorrect IR.

llvmbot commented 3 months ago

@llvm/issue-subscribers-backend-x86

Author: Shengchen Kan (KanRobert)

https://www.godbolt.org/z/9T1eGPWjY llvm/include/llvm/IR/IntrinsicsX86.td ``` def int_x86_avx2_gather_q_q_256 : ClangBuiltin<"__builtin_ia32_gatherq_q256">, DefaultAttrsIntrinsic<[llvm_v4i64_ty], [llvm_v4i64_ty, llvm_ptr_ty, llvm_v4i64_ty, llvm_v4i64_ty, llvm_i8_ty], [IntrReadMem, ImmArg<ArgIndex<4>>]>; ``` The i8 operand is a scale, and should be 1, 2, 4 or 8. But now the middle end only knows it's 8-bit immediate. So llvm-reduce may assign 0 to the operand and then produce incorrect IR.
KanRobert commented 3 months ago

CC @FreddyLeaf @RKSimon @phoebewang @topperc @e-kud @goldsteinn

FreddyLeaf commented 3 months ago

I noticed front end can throw error for wrong scale: https://gcc.godbolt.org/z/699YP5TTq

KanRobert commented 3 months ago

I noticed front end can throw error for wrong scale: https://gcc.godbolt.org/z/699YP5TTq

Yes. But I am talking about the middle-end and backend to make llvm-reduce work correctly.

autoquery commented 3 months ago

For target intrinsics, the front end check is enough. I think llvm-reduce should provide some options to control the reduction rule with target intrinsics.

KanRobert commented 3 months ago

For target intrinsics, the front end check is enough. I think llvm-reduce should provide some options to control the reduction rule with target intrinsics.

But that option would definitely degrade the ability of llvm-reduce to reduce the test. Could we have sth like ImmArg<ArgIndex<4>, [1, 2, 4, 8]>?

RKSimon commented 3 months ago

Why don't we just remove the x86_avx2_gather* intrinsics and auto upgrade them to a gep + masked.gather intrinsic?

autoquery commented 3 months ago

For target intrinsics, the front end check is enough. I think llvm-reduce should provide some options to control the reduction rule with target intrinsics.

But that option would definitely degrade the ability of llvm-reduce to reduce the test. Could we have sth like ImmArg<ArgIndex<4>, [1, 2, 4, 8]>?

Still, this is something should be improved in llvm-reduce instead of middle-end/backend.