Fix InitListExpr for OpenCL vectors

seven-mile commented 1 month ago

Current CIRGen may emit %vi4res = cir.vec.create(..., %vi2a, %vi2b) for the source OpenCL code vi4 vi4res = (vi4)(vi2a, vi2b), and end up with "inserting elements typed vi2 into a vector typed vi4 in LLVM IR".

The corresponding implementation from OG CodeGen is here. It uses shuffle operations to extend two vectors and merge the effective elements into the final result.
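
To make the widen-then-merge idea concrete, here is a minimal sketch in plain C that models the shuffles on int arrays. It is not the OG CodeGen code: `shuffle`, `UNDEF`, and `concat_widen_merge` are hypothetical names, the undefined lanes are filled with 0 only so the model is deterministic, and the masks are assumptions chosen to match the description above.

```c
#include <assert.h>

#define UNDEF (-1) /* hypothetical mask entry meaning "don't care" lane */

/* Model of a two-input shuffle: the lanes of a and b (each n wide) are
   numbered 0..2n-1, and out[i] takes the lane selected by mask[i].
   UNDEF entries are filled with 0 here (an assumption of this model). */
static void shuffle(const int *a, const int *b, int n,
                    const int *mask, int m, int *out) {
  for (int i = 0; i < m; ++i) {
    int k = mask[i];
    if (k == UNDEF) {
      out[i] = 0;
      continue;
    }
    assert(k >= 0 && k < 2 * n);
    out[i] = (k < n) ? a[k] : b[k - n];
  }
}

/* Hypothetical helper: build a 4-lane vector from two 2-lane vectors by
   widening each input to 4 lanes, then merging the effective lanes. */
static void concat_widen_merge(const int a[2], const int b[2], int res[4]) {
  const int widen[4] = {0, 1, UNDEF, UNDEF}; /* keep lanes 0,1; pad the rest */
  const int merge[4] = {0, 1, 4, 5};         /* lanes 0,1 of each widened input */
  int wa[4], wb[4];
  shuffle(a, a, 2, widen, 4, wa);
  shuffle(b, b, 2, widen, 4, wb);
  shuffle(wa, wb, 4, merge, 4, res);
}
```

With a = {1, 2} and b = {3, 4}, `concat_widen_merge` leaves {1, 2, 3, 4} in res; the padded lanes of the widened vectors are never selected by the merge mask.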

We can do this either in CIRGen or in Lowering (i.e. keep the cir.vec.create(%vi2a, %vi2b) in CIR rather than emitting shuffles immediately). I still prefer CIRGen.

Related to PR #613. Suggested test case:


typedef int vi4 __attribute__((ext_vector_type(4)));
typedef int vi2 __attribute__((ext_vector_type(2)));

__kernel void func(void) {
  vi2 a = {1, 2}, b = {3, 4};

  vi4 res = (vi4)(a, b);

  a = (vi2){ res.xy };
}

bcardosolopes commented 1 month ago

Current CIRGen may emit %vi4res = cir.vec.create(..., %vi2a, %vi2b) for the source OpenCL code vi4 vi4res = (vi4)(vi2a, vi2b), and end up with "inserting elements typed vi2 into a vector typed vi4 in LLVM IR".

I don't remember offhand. Does this seem like something done by design (i.e. we already have test cases for this), or is it something we forgot to verify?

Looking at VecCreateOp::verify, my impression is that this isn't supported. Didn't you get verification errors?

The corresponding implementation from OG CodeGen is here. It uses shuffle operations to extend two vectors and merge the effective elements into the final result.

We can do this either in CIRGen or in Lowering (i.e. keep the cir.vec.create(%vi2a, %vi2b) in CIR rather than emitting shuffles immediately). I still prefer CIRGen.

Whatever we decide to do in CIRGen, we need to make sure that the corresponding LLVM lowering matches what OG codegen does (in this case, a series of shuffles). However, if we can do better in CIRGen and map the semantics more clearly, we should. If we emit shuffles in CIRGen, we potentially make it harder to recover the original information, because we need to look into the shuffle and recognize that it's just joining two smaller vectors.

I'd prefer to avoid shuffles this early, but if it's something we are already doing, then it wouldn't be inconsistent (and we can improve later by adding other ops). I'd also be fine with improving cir.vec.create to support the "building from smaller vectors" scenario. Another option would be to introduce operations for extending the number of lanes and use their results to build the vectors, but I'm not sure how well that feeds into cir.vec.create later.

@dkolsen-pgi, any suggestions on what might work better here?

dkolsen-pgi commented 1 month ago

GNU vectors do not support concatenating two vectors with the syntax:

vi4 res = (vi4)(a, b);

So I haven't implemented that in CIR.

I think this is best implemented with cir.vec.shuffle rather than cir.vec.create. Concatenating two vectors is one of the things that shufflevector is designed to do.

bcardosolopes commented 1 month ago

Works for me. A concat op would be cool too, but perhaps we could wait until we actually have a pass that would benefit from saving some compile time by not having to look at the mask to reconstruct the concat.

Note you'd still need an operation to extend these vectors before passing them to a shuffle as input. We could probably use some form of cast for that.

dkolsen-pgi commented 1 month ago

Note you'd still need an operation to extend these vectors before passing them to a shuffle as input.

That's not necessary. The result vector can have a different size than the two input vectors.
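
A small plain C model of that point, assuming shufflevector-like semantics where the result has one lane per mask entry (as LLVM's shufflevector does). The names `shuffle`, `concat2`, and `low_half` are hypothetical and only illustrate the lane selection; this is not CIR or CIRGen code.

```c
#include <assert.h>

/* Model of a two-input shuffle: both inputs are n lanes wide, and the
   result width is the mask length m, independent of n. */
static void shuffle(const int *a, const int *b, int n,
                    const int *mask, int m, int *out) {
  for (int i = 0; i < m; ++i) {
    int k = mask[i];
    assert(k >= 0 && k < 2 * n);
    out[i] = (k < n) ? a[k] : b[k - n];
  }
}

/* Two 2-lane inputs, 4-lane result: a single shuffle concatenates them. */
static void concat2(const int a[2], const int b[2], int res[4]) {
  const int mask[4] = {0, 1, 2, 3};
  shuffle(a, b, 2, mask, 4, res);
}

/* One 4-lane input, 2-lane result: select the low half of v. */
static void low_half(const int v[4], int res[2]) {
  const int mask[2] = {0, 1};
  shuffle(v, v, 4, mask, 2, res);
}
```

With a = {1, 2} and b = {3, 4}, `concat2` yields {1, 2, 3, 4} without any widening step, and `low_half` of that result yields {1, 2}, roughly what `res.xy` selects in the suggested test case.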

llvm / clangir

Fix InitListExpr for OpenCL vectors #615