tensorflow / mlir

"Multi-Level Intermediate Representation" Compiler Infrastructure
1.74k stars 260 forks source link

Vector lowering error (--vector-lower-to-llvm-dialect) #189

Open tetuante opened 5 years ago

tetuante commented 5 years ago

Hi. I have found a couple of problems when trying to lower from Vector to LLVM using --vector-lower-to-llvm. Linalg and Affine dialects are also used. I attach two example input files where I'm experiencing the problems

First (vector_loop_error.mlir in the attachments) I use transfer_read inside a loop nest (loop.for operations). When using --vector-lower-to-llvm I get the following error:

vector_loop_error.mlir:14:5: error: 'loop.for' op operand #0 must be index, but got '!llvm.i64' loop.for %arg3 = %c0 to %9 step %c8 {

which I don't get if I remove the "transfer_read" and try to lower it with --linalg-convert-to-llvm

The second error (outerprod_error.mlir in the attachments) happens when using vector.outerproduct. In this case. when using --vector-lower-to-llvm-dialect it completley crashes (core dumped) after an assertion:

mlir-opt: /home/...../MLIR/llvm-project/llvm/projects/mlir/include/mlir/IR/Types.h:277: U mlir::Type::cast() const [with U = mlir::LLVM::LLVMType]: Assertion `isa()' failed.

I'm rather new with MLIR so I may be doing something wrong, but I tried different ways and cannot avoid those two errors.

Thanks a lot, Nacho

vector-errors.zip

nicolasvasilache commented 5 years ago

Hello @tetuante

The first case fails because --vector-lower-to-llvm does not know about affine, loops or linalg. Since this is a partial conversion, ops at those level remain unconverted but end up having operands that are converted and the IR is invalid (hence the error message). Generally the higher-level dialects include patterns for multiple dialects to ensure A+B+C -> Z works.

So if you use linalg ops you should use the linalg lowering pass.

However in this particular case, you are using vector transfer ops. Work on these predate the conversion infrastructure and have not been updated accordingly. A few things need to be updated before your examples pass.

I started looking at those and will give an update by tomorrow eve. However I am going away for 1 week, so hopefully things are resolved by tomorrow eve.

Thanks!

nicolasvasilache commented 5 years ago

Hello @tetuante

I have a local version of the following test passing locally:

// Test that we can lower all the way to LLVM without crashing, don't check results here.
// RUN: mlir-opt %s -affine-lower-vector-transfers -convert-linalg-to-llvm -o=/dev/null 2>&1

#map0 = (d0) -> (d0 + 8)
#map1 = (d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)
#map2 = (d0) -> (d0 + 16)

func @transfer_1(%arg0: index, %arg1: index) {
  %c0 = constant 0 : index
  %c8 = constant 8 : index
  %c1 = constant 1 : index
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %9 = dim %0, 0 : memref<?x?xf32>
  %12 = dim %0, 1 : memref<?x?xf32>
  loop.for %arg3 = %c0 to %9 step %c8 {
    loop.for %arg4 = %c0 to %12 step %c8 {
        %21 = affine.apply #map0(%arg3)
        %22 = affine.apply #map0(%arg4)
        %23 = linalg.subview %0[%arg3, %21, %c1, %arg4, %22, %c1] : memref<?x?xf32>
        %vC = vector.transfer_read %23[%arg3, %arg4] {permutation_map = (d0, d1)->(d0,d1)} : memref<?x?xf32, #map1>, vector<8x8xf32>
    }
  }
  return
}

func @matmul(%arg0: index, %arg1: index, %arg2: index) {
  %c16 = constant 16 : index
  %c1 = constant 1 : index
  %c0 = constant 0 : index
  %c8 = constant 8 : index
  %cst = constant 0.000000e+00 : f32
  %cst_0 = constant 1.000000e+00 : f32
  %cst_1 = constant 2.000000e+00 : f32
  %0 = alloc(%arg0, %arg1) : memref<?x?xf32>
  %1 = alloc(%arg1, %arg2) : memref<?x?xf32>

  %3 = dim %0, 0 : memref<?x?xf32>
  %4 = dim %0, 1 : memref<?x?xf32>
  %9 = dim %0, 0 : memref<?x?xf32>
  %10 = dim %0, 1 : memref<?x?xf32>
  %11 = dim %1, 0 : memref<?x?xf32>
  %12 = dim %1, 1 : memref<?x?xf32>
  loop.for %arg3 = %c0 to %9 step %c8 {
    loop.for %arg4 = %c0 to %12 step %c8 {
      loop.for %arg5 = %c0 to %10 step %c16 {
        %15 = affine.apply #map0(%arg3)
        %16 = affine.apply #map2(%arg5)
        %17 = linalg.subview %0[%arg3, %15, %c1, %arg5, %16, %c1] : memref<?x?xf32>
        %18 = affine.apply #map2(%arg5)
        %19 = affine.apply #map0(%arg4)
        %20 = linalg.subview %1[%arg5, %18, %c1, %arg4, %19, %c1] : memref<?x?xf32>
        loop.for %arg6 = %c0 to %c16 step %c1   {
          %idx1 = addi %arg5, %arg6 : index
          %vA = vector.transfer_read %17[%arg3, %idx1] {permutation_map = (d0, d1)->(d1)} : memref<?x?xf32, #map1>, vector<8xf32>
          %vB = vector.transfer_read %20[%idx1, %arg4] {permutation_map = (d0, d1)->(d0)} : memref<?x?xf32, #map1>, vector<8xf32>
          %vc = vector.outerproduct %vA,%vB : vector<8xf32>, vector<8xf32>
        }
      }
    }
  }
  return
}

However it currently needs the following change to AffineOps.cpp, that breaks other things and requires a longer discussion:

lib/Dialect/AffineOps/AffineOps.cpp
247,250c247,250
<   // // Verify that the operands are valid dimension and symbol identifiers.
<   // if (failed(verifyDimAndSymbolIdentifiers(*this, getOperands(),
<   //                                          map.getNumDims())))
<   //   return failure();
---
>   // Verify that the operands are valid dimension and symbol identifiers.
>   if (failed(verifyDimAndSymbolIdentifiers(*this, getOperands(),
>                                            map.getNumDims())))
>     return failure();
263,265c263,264
<   return true;
<   // return llvm::all_of(getOperands(),
<   //                     [](Value *op) { return mlir::isValidDim(op); });
---
>   return llvm::all_of(getOperands(),
>                       [](Value *op) { return mlir::isValidDim(op); });
271,273c270,271
<   return true;
<   // return llvm::all_of(getOperands(),
<   //                     [](Value *op) { return mlir::isValidSymbol(op); });
---
>   return llvm::all_of(getOperands(),
>                       [](Value *op) { return mlir::isValidSymbol(op); });

By applying the patch locally you should be able to use the mlir-opt command in the test above.

tetuante commented 5 years ago

Thank you very much for all the hints and sorry for my delaying in coming back to the issue. Indeed, --affine-lower-vector-transfers works for me in this context (with the patch)

Without the patch, there is still an issue when the memref (a subview in my example) is in a loop nest: if the vector_transfer uses a memref allocated outside the loop nest, everything works. But, if the allocation happens somewhere inside the loopnest (or if I use a subview to get another memref), there is an error similar to:

outerprod_error.mlir:35:10: error: 'affine.apply' op operand cannot be used as a dimension id %vA = vector.transfer_read %17[%arg3, %idx1] {permutation_map = (d0, d1)->(d1)} : memref<?x?xf32, #map1>, vector<8xf32>

(in this error, %17 is the output of a "subview" used inside the loop nest).

I am debugging it to try to get some hints on it

Thanks!