llvm / llvm-project

RVV instructions still generated even when auto-vectorization is totally disabled #111642

Open fanghuaqi opened 1 month ago

fanghuaqi commented 1 month ago

Hello there, I ran into a strange issue when using LLVM/Clang to build C code with the RISC-V Vector extension enabled; see the test code here: https://godbolt.org/z/GreE5bTWo

I expected that passing the compiler options -O2 -march=rv64imafdc_zfh_zvfh_zve64f -mabi=lp64d -fno-vectorize -fno-slp-vectorize would disable auto-vectorization entirely for RISC-V, but I still see many RVV instructions generated, such as vsetivli and vle64.

Is this behavior expected? If so, how can I fully prevent any vector instructions from being generated?

Thanks, Huaqi

topperc commented 1 month ago

I suspect that's an inlined memcpy expansion which also uses RVV.

Try -mno-implicit-float

I can't remember if there is a specific disable for memcpy, but -mno-implicit-float should work.
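A minimal sketch of the kind of code that could produce this, assuming the Godbolt test contains a fixed-size aggregate copy. The type `packet_t` and the function `copy_packet` are hypothetical names for illustration, not taken from the report:

```c
#include <stdint.h>

/* Hypothetical type for illustration; the Godbolt test case may differ. */
typedef struct {
    uint64_t words[8]; /* 64 bytes, a size known at compile time */
} packet_t;

/* A plain aggregate assignment. There is no loop here for the loop or SLP
 * vectorizer to transform; the compiler treats the assignment as a
 * fixed-size memory copy and may expand it inline. */
void copy_packet(packet_t *dst, const packet_t *src) {
    *dst = *src;
}
```

Compiling this with the flags from the original report, once with and once without -mno-implicit-float, and comparing the two assembly listings should show whether the vector instructions come from the copy expansion. Whether a given copy is expanded this way depends on its size and on the compiler version.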

fanghuaqi commented 1 month ago

> I suspect that's an inlined memcpy expansion which also uses RVV.
>
> Try -mno-implicit-float
>
> I can't remember if there is a specific disable for memcpy, but -mno-implicit-float should work.

Thanks @topperc, it works for me. I checked the source code and see that it was introduced in https://github.com/llvm/llvm-project/commit/e938217f8109. Will -mno-implicit-float affect auto-vectorization if I only want to stop memcpy from generating vector instructions, but still keep auto-vectorization enabled?

GCC does something similar, see https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e6269bb69c0734a5af716bfbded3621de6ca351d , where it is controlled by a separate option.

Thanks

lukel97 commented 1 month ago

Are you able to build without the vector extensions, i.e. -march=rv64imafdc_zfh? Or do you have a specific use case where you need it explicitly enabled but just without codegen?

preames commented 1 month ago

In general, vectorization is a particular optimization transform. -mno-implicit-float (as described above) disables various bits of lowering which implicitly use float or vector registers. These are distinct options, and you may need both. In general, targeting vector-enabled hardware without actually using vector is a bit of a weird use case, and to my knowledge, we don't have a master flag for this (other than leaving all vector extensions out of the -march string).
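To make the "optimization transform" side concrete, here is a small sketch of a loop that the loop vectorizer would normally consider. The function names `scale` and `scale_scalar` are hypothetical. The second variant uses Clang's `#pragma clang loop vectorize(disable)` hint, which is a per-loop alternative to passing -fno-vectorize for the whole file; as far as I know it only addresses the loop vectorizer and has no bearing on implicit lowering such as inline copy expansion:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example. With a vector extension in -march and the loop
 * vectorizer enabled, this scalar loop is a candidate for vector code.
 * -fno-vectorize switches that pass off for the whole translation unit. */
void scale(int32_t *out, const int32_t *in, size_t n, int32_t k) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}

/* Same computation, but this one loop asks Clang not to vectorize it.
 * Other compilers may ignore the pragma or warn about it. */
void scale_scalar(int32_t *out, const int32_t *in, size_t n, int32_t k) {
#pragma clang loop vectorize(disable)
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}
```

Both functions compute the same result; only the generated code is expected to differ.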

> will this -mno-implicit-float affect auto-vectorization if I just want to disable memcpy generate vector instructions, but still enable auto-vectorization?

It's not clear what you're trying to say here. Are you looking for a way to experiment with memcpy expansion? Or is this something for production use?

fanghuaqi commented 1 month ago

> Are you able to build without the vector extensions, i.e. -march=rv64imafdc_zfh? Or do you have a specific use case where you need it explicitly enabled but just without codegen?

Hi @lukel97, sometimes I just want to use the RVV intrinsic API to write optimized RVV code, and I don't want any auto-vectorized code to be generated.

Thanks

fanghuaqi commented 1 month ago

Hi @preames, thanks for your explanation.

Here are the use cases from my side:

  1. Write optimized code using the RVV intrinsic API to better match the CPU microarchitecture, with all automatic vector code generation disabled (both auto-vectorization and the memcpy/memset vector optimization).
  2. Use auto-vectorization but disable the memset/memcpy optimization, since very small memset/memcpy calls are sometimes turned into RVV code that is less efficient than the scalar version.

I hope this explains my use cases clearly. Thank you for your help.
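For reference, use case 1 looks roughly like the sketch below. The kernel `add_i32` is a hypothetical example, and it assumes the `__riscv_`-prefixed intrinsic names from the RVV intrinsics specification, which need a reasonably recent toolchain. The scalar branch is only there so the file also builds when no vector extension is enabled:

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical kernel: element-wise sum of two int32 arrays. */
void add_i32(int32_t *out, const int32_t *a, const int32_t *b, size_t n) {
#if defined(__riscv_vector)
    /* Hand-written strip-mined loop: each pass handles vl elements,
     * where vl is chosen by the hardware via vsetvl. */
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m1(n);
        vint32m1_t va = __riscv_vle32_v_i32m1(a, vl);
        vint32m1_t vb = __riscv_vle32_v_i32m1(b, vl);
        vint32m1_t vsum = __riscv_vadd_vv_i32m1(va, vb, vl);
        __riscv_vse32_v_i32m1(out, vsum, vl);
        a += vl;
        b += vl;
        out += vl;
        n -= vl;
    }
#else
    /* Scalar fallback for builds without a vector extension. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

In this setup the vector extension has to stay in -march so that the intrinsics are available, which is why leaving it out is not an option for me. The only vector instructions I want in the output are the ones written explicitly like this.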