Failed to compile the optimized MLIR file on an ARM CPU

lethean1 commented 2 years ago

The original Affine MLIR is as follows:

module  {
  func @laplace(%arg0: memref<72x18x16xf64>, %arg1: memref<72x18x16xf64>) attributes {llvm.emit_c_interface} {
    %0 = memref.alloc() : memref<72x18x16xf64>
    affine.for %arg2 = 1 to 71 {
      affine.for %arg3 = 1 to 17 {
        affine.for %arg4 = 0 to 16 {
          %1 = affine.load %arg0[%arg2 - 1, %arg3, %arg4] : memref<72x18x16xf64>
          %2 = affine.load %arg0[%arg2 + 1, %arg3, %arg4] : memref<72x18x16xf64>
          %3 = affine.load %arg0[%arg2, %arg3 + 1, %arg4] : memref<72x18x16xf64>
          %4 = affine.load %arg0[%arg2, %arg3 - 1, %arg4] : memref<72x18x16xf64>
          %5 = affine.load %arg0[%arg2, %arg3, %arg4] : memref<72x18x16xf64>
          %6 = arith.addf %1, %2 : f64
          %7 = arith.addf %3, %4 : f64
          %8 = arith.addf %6, %7 : f64
          %cst = arith.constant -4.000000e+00 : f64
          %9 = arith.mulf %5, %cst : f64
          %10 = arith.addf %9, %8 : f64
          affine.store %10, %arg1[%arg2, %arg3, %arg4] : memref<72x18x16xf64>
        }
      }
    }
    return
  }
}
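For checking results later, here is a minimal pure-Python reference of what @laplace computes, written from the loop nest above. It is a 5-point stencil over the first two dimensions, applied independently at each index of the last one. The name `laplace_ref` is made up for illustration. The unused `%0 = memref.alloc()` is omitted. Unlike the MLIR, which leaves the boundary of %arg1 untouched, this sketch returns a fresh array with the boundary set to 0.0.

```python
def laplace_ref(inp):
    """Hypothetical reference for @laplace; inp is a nested list of shape 72x18x16."""
    ni, nj, nk = len(inp), len(inp[0]), len(inp[0][0])
    # Fresh output; boundary entries stay 0.0 (the MLIR simply never writes them).
    out = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]
    for i in range(1, ni - 1):          # affine.for %arg2 = 1 to 71
        for j in range(1, nj - 1):      # affine.for %arg3 = 1 to 17
            for k in range(nk):         # affine.for %arg4 = 0 to 16
                a = inp[i - 1][j][k] + inp[i + 1][j][k]   # %6 = %1 + %2
                b = inp[i][j + 1][k] + inp[i][j - 1][k]   # %7 = %3 + %4
                # %10 = %5 * -4.0 + (%6 + %7)
                out[i][j][k] = inp[i][j][k] * -4.0 + (a + b)
    return out
```

Comparing this against the output of the compiled, optimized function on the same input is one way to confirm the transformation kept the semantics.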

Then I ran it through your polyhedral optimization and got this:

#map0 = affine_map<(d0) -> (1, d0 * 32)>
#map1 = affine_map<(d0) -> (71, d0 * 32 + 32)>
module  {
  func private @S0(%arg0: memref<72x18x16xf64>, %arg1: index, %arg2: index, %arg3: index, %arg4: memref<72x18x16xf64>) attributes {scop.stmt} {
    %cst = arith.constant -4.000000e+00 : f64
    %0 = affine.load %arg4[symbol(%arg1), symbol(%arg2), symbol(%arg3)] : memref<72x18x16xf64>
    %1 = arith.mulf %0, %cst : f64
    %2 = affine.load %arg4[symbol(%arg1) - 1, symbol(%arg2), symbol(%arg3)] : memref<72x18x16xf64>
    %3 = affine.load %arg4[symbol(%arg1) + 1, symbol(%arg2), symbol(%arg3)] : memref<72x18x16xf64>
    %4 = arith.addf %2, %3 : f64
    %5 = affine.load %arg4[symbol(%arg1), symbol(%arg2) + 1, symbol(%arg3)] : memref<72x18x16xf64>
    %6 = affine.load %arg4[symbol(%arg1), symbol(%arg2) - 1, symbol(%arg3)] : memref<72x18x16xf64>
    %7 = arith.addf %5, %6 : f64
    %8 = arith.addf %4, %7 : f64
    %9 = arith.addf %1, %8 : f64
    affine.store %9, %arg0[symbol(%arg1), symbol(%arg2), symbol(%arg3)] : memref<72x18x16xf64>
    return
  }
  func @laplace(%arg0: memref<72x18x16xf64>, %arg1: memref<72x18x16xf64>) attributes {llvm.emit_c_interface} {
    affine.for %arg2 = 0 to 3 {
      affine.for %arg3 = max #map0(%arg2) to min #map1(%arg2) {
        affine.for %arg4 = 1 to 17 {
          affine.for %arg5 = 0 to 16 {
            call @S0(%arg1, %arg3, %arg4, %arg5, %arg0) : (memref<72x18x16xf64>, index, index, index, memref<72x18x16xf64>) -> ()
          }
        }
      }
    }
    return
  }
}
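As far as I can read #map0 and #map1, the optimizer only tiled the outermost `1 to 71` loop with a tile size of 32 and left the other two loops unchanged. The sketch below (hypothetical helper `tiled_rows`) replays just the bounds of the tiled nest to show that it visits the same rows, 1 through 70, in the same order as the original. This suggests the loop transformation itself is not the problem.

```python
def tiled_rows(lb=1, ub=71, tile=32):
    """Replay the row indices visited by the tiled loop nest (defaults match #map0/#map1)."""
    rows = []
    # Tile loop: for the defaults this is affine.for %arg2 = 0 to 3.
    for t in range((ub + tile - 1) // tile):
        start = max(lb, t * tile)          # max #map0(%arg2) = max(1, 32 * t)
        stop = min(ub, t * tile + tile)    # min #map1(%arg2) = min(71, 32 * t + 32)
        rows.extend(range(start, stop))    # point loop over %arg3
    return rows
```

With the defaults, the three tiles cover [1, 32), [32, 64) and [64, 71).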

Then I lowered this file to LLVM IR and compiled it:

mlir-opt --lower-affine --convert-scf-to-std --convert-arith-to-llvm --convert-memref-to-llvm --convert-std-to-llvm='emit-c-wrappers=1' --reconcile-unrealized-casts affine_opt.mlir > laplace_lowered.mlir
mlir-translate --mlir-to-llvmir laplace_lowered.mlir > laplace.ll
llc -O3 laplace.ll -o laplace.s

and got this error:

laplace.s:96:5: error: expected label or encodable integer pc offset
        bl      S0

Any help getting this to compile correctly would be appreciated.

lethean1 commented 2 years ago

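I think I know why. It could be because the function name "S0" conflicts with a register name on ARM: on AArch64, s0 is a SIMD/floating-point register name, so `bl S0` is not parsed as a branch to a label. I suggest changing the prefix of the generated function names.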
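A quick way to see the clash: S0 is spelled like an A64 SIMD/FP register name, and, as the `bl S0` error suggests, the assembler matches register names case-insensitively. The Python sketch below checks a symbol against a partial, hand-written list of AArch64 register names. Both the list and the helper name `collides_with_aarch64_register` are assumptions for illustration, not taken from LLVM.

```python
import re

# Partial, hand-written list of AArch64 register names (not exhaustive):
# general-purpose x0-x30 / w0-w30, SIMD&FP b/h/s/d/q/v0-v31, and a few special names.
_GPR = re.compile(r"[xw]([0-9]|[12][0-9]|30)")
_FPR = re.compile(r"[bhsdqv]([0-9]|[12][0-9]|3[01])")
_SPECIAL = {"sp", "wsp", "xzr", "wzr", "lr", "fp"}


def collides_with_aarch64_register(symbol):
    """Return True if symbol would read as a register name (compared case-insensitively)."""
    name = symbol.lower()
    return name in _SPECIAL or bool(_GPR.fullmatch(name) or _FPR.fullmatch(name))
```

Under this check every statement name from S0 to S31 collides, so renumbering alone would not help. A different prefix is needed.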

kumasento commented 2 years ago

Hi there, thanks for reporting this, and I'm glad you've found the cause.

If you need to change the prefix, the code at

https://github.com/kumasento/polymer/blob/e87c27c36b3d346612e505a1b5d7939e6b6aeb41/lib/Transforms/ExtractScopStmt.cc#L275-L279

has an extra parameter that lets you configure it.

You could expose this as a CLI argument if you prefer.
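If rebuilding Polymer with a different prefix isn't convenient, a stopgap is to rename the extracted statement functions in the optimized MLIR before lowering it. The Python sketch below does this with a regular expression. It assumes the statements are the only symbols named `S` followed by digits. Both the helper `rename_scop_stmts` and the `__scop_` prefix are hypothetical choices, not Polymer's own.

```python
import re

# Hypothetical prefix; any name that is not an AArch64 register name should work.
DEFAULT_PREFIX = "__scop_"


def rename_scop_stmts(mlir_text, prefix=DEFAULT_PREFIX):
    """Rename symbols of the form @S<digits> (e.g. @S0) to @<prefix>S<digits>.

    The negative lookahead stops longer names such as @S0.x or @S0abc from matching.
    """
    return re.sub(r"@S(\d+)(?![\w$.])",
                  lambda m: f"@{prefix}S{m.group(1)}",
                  mlir_text)
```

Since the regex rewrites the `func private @S0` definition and every `call @S0` site together, the module should stay consistent. The assembly should then branch to `__scop_S0` instead of `S0`.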

Best,

kumasento / polymer

failed to compile the optimized mlir file on ARM CPU #123