numba / llvmlite

A lightweight LLVM python binding for writing JIT compilers
https://llvmlite.pydata.org/
BSD 2-Clause "Simplified" License

Enable `interleave` for Loop Optimization #1054

Open dlee992 opened 1 month ago

dlee992 commented 1 month ago

It looks like llvmlite currently doesn't support enabling interleaving for loop optimization.

Clang supports it through loop pragmas like these:

#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
for(...) {
  ...
}

More details can be found here: https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
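
For reference, at the IR level LLVM carries loop hints as `llvm.loop` metadata attached to the branch of the loop's latch block. Below is a minimal sketch, assuming the metadata names `llvm.loop.vectorize.enable` and `llvm.loop.interleave.count` from the LLVM Language Reference. The helper `loop_hint_metadata` is hypothetical: it only builds IR text and is not an llvmlite or numba API.

```python
def loop_hint_metadata(interleave_count=2, first_id=0):
    """Build LLVM IR text for loop metadata that requests vectorization
    and a specific interleave count.

    Hypothetical helper for illustration; the metadata node numbering
    starts at `first_id` and is chosen by the caller.
    """
    loop_id = first_id          # self-referential loop identifier node
    vec_id = first_id + 1       # vectorize.enable hint node
    il_id = first_id + 2        # interleave.count hint node
    lines = [
        f"!{loop_id} = distinct !{{!{loop_id}, !{vec_id}, !{il_id}}}",
        f'!{vec_id} = !{{!"llvm.loop.vectorize.enable", i1 true}}',
        f'!{il_id} = !{{!"llvm.loop.interleave.count", i32 {interleave_count}}}',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # The loop's back-edge branch would then carry `, !llvm.loop !0`.
    print(loop_hint_metadata(interleave_count=2))
```

Note that this metadata is a per-loop hint. Whether the pass manager allows interleaving at all is a separate setting, and that setting is what the rest of this issue is about.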

I saw in one discussion that it can bring some benefit: https://discourse.llvm.org/t/external-vectorizer-vplan/73634

nearly 2X for matrix multiplication program (on specifying the interleave count as 2)

When debugging my code with this debug setting, as the Numba docs suggest:

import llvmlite.binding as llvm
llvm.set_option("", "--debug-only=loop-vectorize,iv-descriptors")

I did see some logs like:

LV: Loop hints: force=? width=0 interleave=0
LV: Interleaving disabled by the pass manager
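
To compare runs, the hint lines can be pulled out of a saved log. This is a minimal sketch, assuming the debug output was captured into a string (for example by redirecting stderr to a file) and that the lines have exactly the format quoted above. The function names are hypothetical.

```python
import re

# Matches lines such as: LV: Loop hints: force=? width=0 interleave=0
_HINT_RE = re.compile(
    r"LV: Loop hints: force=(\S+) width=(\d+) interleave=(\d+)"
)


def parse_loop_hints(log_text):
    """Return one dict per 'Loop hints' line found in the log text."""
    hints = []
    for match in _HINT_RE.finditer(log_text):
        force, width, interleave = match.groups()
        hints.append(
            {"force": force, "width": int(width), "interleave": int(interleave)}
        )
    return hints


def interleaving_disabled(log_text):
    """True if the log reports that the pass manager disabled interleaving."""
    return "LV: Interleaving disabled by the pass manager" in log_text


if __name__ == "__main__":
    sample = (
        "LV: Loop hints: force=? width=0 interleave=0\n"
        "LV: Interleaving disabled by the pass manager\n"
    )
    print(parse_loop_hints(sample))
    print(interleaving_disabled(sample))
```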

BTW, I sometimes also saw LV: Loop hints: force=? width=0 interleave=1. I couldn't find any place in the numba or llvmlite source that sets it to 1, so I'm not sure where the 1 comes from. I guess LLVM chooses between 0 and 1 internally?

After discussing this with the numba devs, if we agree it's worthwhile, I can contribute a PR for this feature request.

tag @gmarkall

dlee992 commented 1 month ago

Found another LLVM issue related to this: https://github.com/llvm/llvm-project/issues/47011.

This confuses me a bit. If we didn't explicitly disable interleaving, why does the debug message say it's disabled by the pass manager? I need to figure that out. Perhaps it's related to the optimization level. However, I reran my test case with NUMBA_OPT set to max, and the log still shows it as disabled.

sklam commented 1 month ago

Looks like PassManagerBuilder has a lot of new options: https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/Transforms/IPO/PassManagerBuilder.cpp#L150-L177 Update: this is LLVM 10.

sklam commented 1 month ago

Hm... LLVM 14 actually has it on by default: https://github.com/llvm/llvm-project/blob/f28c006a5895fc0e329fe15fead81e37457cb1d1/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp#L194-L223

dlee992 commented 1 month ago

Thanks for digging. That makes this even more interesting to me, since I did use LLVM 14.0.6 to build llvmlite locally, strictly following the llvm14 recipe provided by llvmlite. I suppose I need to provide a simple reproducer to back this up. Will try.