nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

Enable initial executable linking + mlir-air bump #395

Closed nirvedhmeshram closed 1 month ago

nirvedhmeshram commented 1 month ago

Adds the pass that links multiple HAL executables together to the pass pipeline. The resulting linked executable can have several entry points. At artifact creation time, we create an artifact for each entry point, one at a time, by passing the corresponding aie.device op to the aie2xclbin tool. In a later PR we will merge the xclbins when possible. The executable schema is updated to allow the flexibility needed for shared xclbins / lx6 instruction streams. We also add logic to use the assigned ordinals of entry points rather than assuming ascending order, as noted in https://github.com/iree-org/iree/pull/15905
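To illustrate the ordinal point, here is a minimal Python sketch, not the code in this PR. It assumes each entry point carries a compiler-assigned ordinal; the `EntryPoint` class, the `order_by_ordinal` helper and the dispatch names are hypothetical. The idea is to index by the assigned ordinal instead of relying on the order in which entry points appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPoint:
    name: str     # hypothetical export name
    ordinal: int  # ordinal assigned by the compiler (assumed to be dense, 0..n-1)


def order_by_ordinal(entry_points):
    """Return entry point names placed at the index given by their ordinal."""
    slots = [None] * len(entry_points)
    for ep in entry_points:
        if not 0 <= ep.ordinal < len(slots):
            raise ValueError(f"ordinal {ep.ordinal} out of range for {ep.name}")
        if slots[ep.ordinal] is not None:
            raise ValueError(f"duplicate ordinal {ep.ordinal}")
        slots[ep.ordinal] = ep.name
    return slots


# Entry points in the order they were encountered, which need not be ascending.
linked = [
    EntryPoint("two_mm_dispatch_1", ordinal=1),
    EntryPoint("two_mm_dispatch_0", ordinal=0),
]
names_by_ordinal = order_by_ordinal(linked)
print(names_by_ordinal)
```

Assuming encounter order here would attach the artifact for dispatch 1 to ordinal 0; indexing by the assigned ordinal avoids that, and the range and duplicate checks catch a malformed assignment early.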

An mlir-air bump is needed for this to work e2e.

Progress towards: https://github.com/nod-ai/iree-amd-aie/issues/380

newling commented 1 month ago

Not sure what happened in CI, looks like an xclbin was created but then the cp failed because it couldn't see an xclbin

nirvedhmeshram commented 1 month ago

I don't have much context, but the code generally LGTM. Is this working currently? It would be good to add a test.

Yes, I wanted to add an e2e test but couldn't find an easy way to do so with the current CI infrastructure. Here is an example matmul sequence I tested locally:

!A_TYPE = tensor<32x32xf32>
!B_TYPE = tensor<32x32xf32>
!C_TYPE = tensor<32x32xf32>
!D_TYPE = tensor<32x16xf32>
func.func @two_mm(%lhs : !A_TYPE,
    %rhs : !B_TYPE, %rhs_2 : !D_TYPE) -> !D_TYPE {
  %empty = tensor.empty() : !C_TYPE
  %empty_2 = tensor.empty() : !D_TYPE
  %cst = arith.constant 0.0 : f32
  %fill = linalg.fill ins(%cst : f32) outs(%empty : !C_TYPE) -> !C_TYPE
  %fill_2 = linalg.fill ins(%cst : f32) outs(%empty_2 : !D_TYPE) -> !D_TYPE
  %2 = linalg.matmul ins(%lhs, %rhs : !A_TYPE, !B_TYPE)
      outs(%fill : !C_TYPE) -> !C_TYPE
  %3 = linalg.matmul ins(%2, %rhs_2 : !A_TYPE, !D_TYPE)
      outs(%fill_2 : !D_TYPE) -> !D_TYPE
  return %3 : !D_TYPE
}

The main blocker is that the signing process needs to account for the new directory structure produced by the linking optimization. I am not too inclined to fix it because once we update the driver we won't have to sign anything, so maybe we can add a test after that? In the meantime I have opened this issue: https://github.com/nod-ai/iree-amd-aie/issues/397

nirvedhmeshram commented 1 month ago

Not sure what happened in CI, looks like an xclbin was created but then the cp failed because it couldn't see an xclbin

It was because the directory structure / file names are different with the link optimization. I changed that for the case where there is only one dispatch. Let's see if that helps.
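For reference, one layout-independent way a CI step could locate the generated files is to search the artifact directory recursively instead of hard-coding a path. This is a rough Python sketch, not the actual CI script; the `find_xclbins` helper and the directory and file names below are hypothetical and only stand in for the single-dispatch and linked layouts.

```python
import tempfile
from pathlib import Path


def find_xclbins(artifact_dir):
    """Recursively collect every *.xclbin file under artifact_dir."""
    return list(Path(artifact_dir).rglob("*.xclbin"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical single-dispatch layout: one xclbin directly in a directory.
    (root / "single").mkdir()
    (root / "single" / "module.xclbin").touch()
    # Hypothetical linked layout: one subdirectory per entry point.
    for i in range(2):
        entry_dir = root / "linked" / f"entry_{i}"
        entry_dir.mkdir(parents=True)
        (entry_dir / "kernel.xclbin").touch()

    # Sort the relative paths so the result does not depend on traversal order.
    found = sorted(p.relative_to(root).as_posix() for p in find_xclbins(root))

print(found)
```

A copy or signing step written against `find_xclbins` would not need to change when the number of dispatches changes, at the cost of being less strict about where files are expected to be.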

nirvedhmeshram commented 1 month ago

@newling could you PTAL at the new CI side changes?

nirvedhmeshram commented 1 month ago

If you remove build_tools/ci/rm_xclbin.sh can you please update the test cpu_comparison/run_test.sh too?

I already did? Is something missing in it?