numba / llvmlite

A lightweight LLVM python binding for writing JIT compilers
https://llvmlite.pydata.org/
BSD 2-Clause "Simplified" License

LLVM IR vector type support #211

Open eamartin opened 8 years ago

eamartin commented 8 years ago

Are there any plans to eventually support LLVM vector types?

I've not personally used LLVM vector types, but they seem like a useful abstraction to target SIMD instructions.
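For context, LLVM vector types are first-class SIMD values such as `<4 x float>`, and arithmetic instructions like `fadd` apply lane-wise to them. A minimal sketch of the textual IR one would want llvmlite to be able to emit, built here as a plain string purely for illustration (llvmlite's IR builder is not used, since vector types are exactly what this issue asks for):

```python
def vec_fadd_ir(lanes=4):
    """Return textual LLVM IR for a lane-wise float add on a vector type."""
    vty = f"<{lanes} x float>"
    return "\n".join([
        # A function taking two vectors and returning their lane-wise sum.
        f"define {vty} @vadd({vty} %a, {vty} %b) {{",
        f"  %sum = fadd {vty} %a, %b",
        f"  ret {vty} %sum",
        "}",
    ])

print(vec_fadd_ir())
```

On an AVX2 target, LLVM lowers an 8-lane version of this to a single `vaddps` instruction.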

seibert commented 8 years ago

I don't think we had planned to add these types ourselves, as most of our llvmlite development is being driven by Numba needs. Right now we're relying on the autovectorization passes to convert scalars to vectors for us, which has obvious limitations.

seibert commented 8 years ago

I should say, if someone does want to contribute this to llvmlite, we would be interested.

sklam commented 8 years ago

I was looking at the masked vector intrinsics and thought about what is needed for adding vector types. I will write down some notes here:

For numba:

maedoc commented 7 years ago

What would this look like on the Numba side? The easiest thing I can think of (or at least what I'd like) is that an inner loop of SIMD width (2, 4, 8, etc.) without any fancy control flow could be marked as suitable for vectorization, perhaps similarly to prange?
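A purely hypothetical sketch of what such a marker might look like. Nothing here is real Numba API: `vrange` is an invented stand-in that behaves exactly like `range` in plain Python, while the idea is that a JIT could treat a loop over it as one SIMD operation over that many lanes:

```python
def vrange(n):
    # Hypothetical marker, not a Numba feature: in plain Python it is just
    # range(); a JIT seeing it could lower the marked loop to a single
    # n-lane vector operation instead of n scalar iterations.
    return range(n)

def saxpy8(a, x, y, out):
    # The marked inner loop is the candidate for an 8-lane SIMD body.
    for j in vrange(8):
        out[j] = a * x[j] + y[j]
```

In plain Python this computes the same values a scalar loop would; the marker only communicates intent to a compiler.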

seibert commented 7 years ago

In simple cases, Numba already benefits from LLVM autovectorization passes. We're working with Intel to enable LLVM to use SVML for SIMD vector math functions (when SVML is available) in the autovectorizer. Explicitly doing SIMD vector operations at the Python level is likely to be pretty clunky, so we're mostly interested in making sure the autovectorizer in LLVM can do as much as possible. (And this doesn't require the introduction of SIMD vector intrinsics in llvmlite.)

maedoc commented 7 years ago

I asked on this issue because autovectorization currently seems to work well from Clang but not from Numba, e.g.

import numba
from numba import float32

@numba.jit
def loop(a, b, c, out):
    # out is a 2-D float32 array of shape (1000, 8)
    rec_b = float32(b / 10.0)
    rec_c = float32(c * b / 42.0)
    for i in range(1000):
        for j in range(8):
            out[i, j] = a + i * rec_b + i * rec_c

vs

void loop(float a, float b, float c, float *out)
{
  float rec_b = b / 10.0;
  float rec_c = c * b / 42.0;
  for (int i=0; i<1000; i++)
    for (int j=0; j<8; j++)
      out[i*8 + j] = a + i * rec_b + i * rec_c;
}

In the former, the optimized IR uses regular scalar floats, while in the latter the IR shows work being done on <8 x float> values (compiling with the -march=core-avx2 flag). That is why I jumped on this issue: I was guessing that the difference actually lies in the Clang frontend rather than in the LLVM autovectorization passes, so if these operations could be expressed by Numba, vectorization could be guaranteed instead of hoped for.
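For reference, a plain-Python version of the computation both snippets above perform (no Numba, shapes as in the example). One thing worth noting: the loop body never reads j, so every iteration of the inner loop stores the same value, and a vectorizer can in principle collapse it into a single 8-lane broadcast store:

```python
def loop_ref(a, b, c):
    """Scalar reference for the Numba/C loops above: out[i][j] = a + i*rec_b + i*rec_c."""
    rec_b = b / 10.0
    rec_c = c * b / 42.0
    out = [[0.0] * 8 for _ in range(1000)]
    for i in range(1000):
        for j in range(8):
            # The value depends only on i, not j: all 8 lanes are identical.
            out[i][j] = a + i * rec_b + i * rec_c
    return out
```

This makes the example a best case for the vectorizer, which is why the gap between the Clang and Numba IR is surprising.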

seibert commented 7 years ago

We have found in the past that subtle differences in the LLVM IR can result in LLVM optimization passes working or not. Since the Clang developers know these tricks, we frequently will inspect the LLVM IR output from Clang to learn about undocumented or underdocumented features.

Can you open a Numba issue with the example you listed above? We should see if we can copy what Clang is doing.