Corresponding Aarch64 intrinsic for vmulq_u16

llvm / llvm-project

Corresponding Aarch64 intrinsic for vmulq_u16 #60306

Open Darshvino opened 1 year ago

Darshvino commented 1 year ago

Hi LLVM team,

I am trying to find the LLVM AArch64 intrinsic that corresponds to vmulq_u16, but I have not been able to find a matching one. It would be great if anyone could help me find it. Below is the description of vmulq_u16:

uint16x8_t vmulq_u16(uint16x8_t a, uint16x8_t b)

which also can be found here: https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vmulq_u16

Thanks, and I look forward to your reply!

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

davemgreen commented 1 year ago

vmulq_s16 should be available through arm_neon.h: https://godbolt.org/z/h43cPMfTK. If you mean an @llvm.aarch64 intrinsic for it, then you can just use a mul instruction directly.

Darshvino commented 1 year ago

Hi @davemgreen,

Thanks a lot for your reply.

I tried to use this intrinsic: "llvm.aarch64.neon.mul", but I got the error below:

AssertionError: llvm.aarch64.neon.mul is not an LLVM intrinsic

I am not sure whether it is related to the LLVM version or something else.

It would be really helpful if you could help me resolve this error.

Thanks

davemgreen commented 1 year ago

I just mean an LLVM mul instruction:

%3 = mul <8 x i16> %1, %0

For simple operations where the LLVM instruction is equivalent to the NEON intrinsic, we can just use the instruction directly and get the benefit of LLVM being able to optimize it as it would any other mul.
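To make the equivalence concrete, here is a minimal pure-Python model of what a lane-wise `mul <8 x i16>` computes: each lane is multiplied and only the low 16 bits are kept. The helper names (`mul_u16x8`, `mul_s16x8`, `as_signed16`) are made up for this sketch and are not part of LLVM, arm_neon.h, or TVM. It also shows why one IR instruction can serve both the signed and unsigned NEON variants: in two's complement the low 16 bits of the product are the same either way.

```python
LANES = 8        # uint16x8_t / <8 x i16> has eight lanes
MASK = 0xFFFF    # keep only the low 16 bits of each product


def mul_u16x8(a, b):
    """Model of `mul <8 x i16>` on unsigned lanes (wraps modulo 2**16)."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("expected 8 lanes per operand")
    return [(x * y) & MASK for x, y in zip(a, b)]


def as_signed16(v):
    """Reinterpret a raw 16-bit pattern as a signed two's-complement value."""
    return v - 0x10000 if v & 0x8000 else v


def mul_s16x8(a, b):
    """Same multiply with lanes read as signed; returns raw 16-bit patterns."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("expected 8 lanes per operand")
    return [(as_signed16(x) * as_signed16(y)) & MASK for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [1, 2, 3, 300, 65535, 40000, 0, 256]
    b = [1, 2, 3, 300, 2, 2, 9, 256]
    print(mul_u16x8(a, b))  # [1, 4, 9, 24464, 65534, 14464, 0, 0]
    print(mul_u16x8(a, b) == mul_s16x8(a, b))  # True
```

This is only a model of the arithmetic, not generated code; it is meant to show that nothing target-specific is needed to express the operation.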

Darshvino commented 1 year ago

Yeah got it @davemgreen,

But is there something like llvm.aarch64.neon..... for vmulq_u16 that can be used instead of the instruction directly?

I am rather desperate to find this.

Darshvino commented 1 year ago

Hi @davemgreen,

Looking forward to your reply.

Thanks

davemgreen commented 1 year ago

There is no llvm.aarch64.neon equivalent for mul. It shouldn't be needed.

Perhaps taking a step back: what are you trying to do? Are you writing LLVM IR directly (in text form), or generating the instructions through C/C++? Is this for some other frontend? If you are creating instructions with an IRBuilder, you should be able to use CreateMul, for example.
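For the "IR in text form" case, the whole operation fits in a single small function. The sketch below builds that text with plain Python string formatting; the helper `emit_vector_mul_ir` and the function name `mul_u16x8` are hypothetical names chosen for illustration, and the output has not been checked against any particular LLVM version here.

```python
def emit_vector_mul_ir(name="mul_u16x8", lanes=8, bits=16):
    """Return textual LLVM IR for a function that multiplies two vectors.

    `name`, `lanes`, and `bits` are illustrative parameters of this sketch.
    """
    vec = f"<{lanes} x i{bits}>"
    return "\n".join([
        f"define {vec} @{name}({vec} %a, {vec} %b) {{",
        "entry:",
        f"  %r = mul {vec} %a, %b",   # plain IR mul, no target intrinsic
        f"  ret {vec} %r",
        "}",
    ])


if __name__ == "__main__":
    print(emit_vector_mul_ir())
```

With the defaults, the body contains the line `%r = mul <8 x i16> %a, %b`, which has the same form as the instruction quoted earlier in the thread. A frontend that uses an IRBuilder would produce the same instruction through CreateMul rather than through text.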

Darshvino commented 1 year ago

Hi @davemgreen,

Thank you again for your reply.

Actually, I am working with TVM. I am trying to add a custom operator in TVM, which allows us to define intrinsics (via the Tensorize schedule) to use instead of leaving LLVM to generate the assembly code directly. Here is one such example: https://github.com/apache/tvm/blob/f7dfef4cdea3a6ca96af7869e4457a4de0525eab/python/tvm/topi/arm_cpu/tensor_intrin.py#L101. I think it only allows intrinsics to be used, not instructions directly, but I have created an issue in TVM asking whether the instruction can be used directly: https://github.com/apache/tvm/issues/13850