Abhishek-Varma closed this pull request 1 month ago.
Could you add an e2e test?
This is the first PR in a series of PRs that need to go in before I can add an e2e test for "Matmul + truncf" with vectorization enabled for elementwise ops.
As discussed offline with @jtuyls, I have pushed most of the required changes in this same PR.
I will be raising a separate PR for flattening arith.truncf, and once that goes in I'll add an e2e test to this PR.
I will rename the PR title and description accordingly, so I'm marking this as a draft.
Here's the short-shaped Matmul + truncf e2e IR log that this PR currently enables (the bigger shapes need to be addressed incrementally; their e2e IR, in case someone wants to take a look, is here), and the numerics were verified locally.
But for the e2e test via cpu_comparisons/run.py, I get the following for the CPU run itself, let alone AIE:
iree/runtime/src/iree/tooling/numpy_io.c:419:
UNIMPLEMENTED; unsupported data encoding; outputting results; processing function outputs;
`sync func @matmul_truncf(%input0: tensor<32x32xbf16>, %input1: tensor<32x32xbf16>) -> (%output0: tensor<32x32xbf16>)`
The above is not a verification issue at the MLIR level; otherwise it would have bailed out earlier.
I checked the input MLIR file that's getting generated locally, but I see no issue with it:
// input 32x32xbf16
// input 32x32xbf16
func.func @matmul_truncf(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xbf16>
{
%cst = arith.constant 0.0 : f32
%0 = tensor.empty() : tensor<32x32xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<32x32xf32>) -> tensor<32x32xf32>
%2 = linalg.matmul ins(%arg0, %arg1 : tensor<32x32xbf16>, tensor<32x32xbf16>)
outs(%1: tensor<32x32xf32>) -> tensor<32x32xf32>
%3 = arith.truncf %2 : tensor<32x32xf32> to tensor<32x32xbf16>
return %3: tensor<32x32xbf16>
}
Am I missing some other template to include?
No, this is all correct. I think the issue is that iree-run-module doesn't support writing bfloat16 values. I'll investigate further.
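In the meantime, one possible workaround at the IR level (a sketch only, assuming the test harness can compare f32 outputs; the `@matmul_truncf_f32_out` name is hypothetical) would be to re-widen the bf16 result to f32 before returning, so the tooling never has to encode a bf16 tensor:

```mlir
// Untested sketch: same matmul + truncf body as above, but the function
// returns f32 by re-widening the bf16 result with arith.extf.
func.func @matmul_truncf_f32_out(%arg0: tensor<32x32xbf16>, %arg1: tensor<32x32xbf16>) -> tensor<32x32xf32>
{
  %cst = arith.constant 0.0 : f32
  %0 = tensor.empty() : tensor<32x32xf32>
  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<32x32xf32>) -> tensor<32x32xf32>
  %2 = linalg.matmul ins(%arg0, %arg1 : tensor<32x32xbf16>, tensor<32x32xbf16>)
                     outs(%1 : tensor<32x32xf32>) -> tensor<32x32xf32>
  // Keep the bf16 truncation so the kernel under test is unchanged...
  %3 = arith.truncf %2 : tensor<32x32xf32> to tensor<32x32xbf16>
  // ...then widen back to f32 purely for output encoding.
  %4 = arith.extf %3 : tensor<32x32xbf16> to tensor<32x32xf32>
  return %4 : tensor<32x32xf32>
}
```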
-- This commit includes `arith.truncf`, `vector.transfer_read` and `vector.transfer_write` ops into the `amdaie.core` op.
-- This is required to make "Matmul + truncf" work with vectorization enabled for the `arith.truncf` op.

Signed-off-by: Abhishek Varma abhvarma@amd.com
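For reference, a minimal sketch (illustrative 32x32 shapes, hypothetical function name) of the vectorized truncf pattern these three ops form, which now needs to be allowed inside the `amdaie.core` region:

```mlir
// Sketch only: the kind of IR produced once arith.truncf is vectorized.
func.func @truncf_vectorized(%src: memref<32x32xf32>, %dst: memref<32x32xbf16>) {
  %c0 = arith.constant 0 : index
  %pad = arith.constant 0.0 : f32
  // Load the f32 accumulator tile as a vector.
  %v = vector.transfer_read %src[%c0, %c0], %pad : memref<32x32xf32>, vector<32x32xf32>
  // Elementwise truncation over the whole vector.
  %t = arith.truncf %v : vector<32x32xf32> to vector<32x32xbf16>
  // Store the bf16 tile back.
  vector.transfer_write %t, %dst[%c0, %c0] : vector<32x32xbf16>, memref<32x32xbf16>
  return
}
```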