Open ElEHsiang opened 6 months ago
Hi @dcaballe, would you be interested in taking a look at this?
(Sorry, catching up after break)
I think the problem could be related to the fact that we are reducing the outer dimension here:
```mlir
%7 = vector.contract {indexing_maps = [#map, #map, #map1], iterator_types = ["reduction", "parallel"], kind = #vector.kind<add>} %6, %arg1 : vector<1x32xf32>, vector<1x32xf32> into vector<32xf32>
```
and the outer-product strategy expects a very specific "matmul-like" contraction. If the contraction doesn't align with that, it will convert it to that "matmul-like" form by changing the layout of the inputs, and we may end up vectorizing a dimension that we shouldn't (see the scalar code).
I would need to think a bit more about this but perhaps we should use a different strategy only when the contraction op is not suitable to be represented with an outer product.
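As a sanity check on the semantics, here is a small NumPy sketch (shapes taken from the contract above; the accumulator operand and the concrete values are assumptions for illustration, since `vector.contract` folds an accumulator into the result) showing that with a reduction dimension of extent 1 the whole contraction collapses to one multiply-add per lane:

```python
import numpy as np

# Shapes from the contract in the snippet above: lhs/rhs are
# vector<1x32xf32>, the result is vector<32xf32>, and the outer
# dimension (extent 1) is the "reduction" iterator.
lhs = np.linspace(0.0, 1.0, 32, dtype=np.float32).reshape(1, 32)
rhs = np.full((1, 32), 2.0, dtype=np.float32)
acc = np.ones(32, dtype=np.float32)  # assumed accumulator operand

# Reference semantics: elementwise multiply, reduce the outer
# ("reduction") dimension, then accumulate.
ref = (lhs * rhs).sum(axis=0) + acc

# Because the reduced dimension has extent 1, this is exactly one
# fused multiply-add per lane -- no lane shuffling is needed.
one_fmadd = lhs[0] * rhs[0] + acc

assert np.allclose(ref, one_fmadd)
```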
What happened?
I benchmarked layernorm based on `tests/e2e/regression/layernorm.mlir`, but changed the input dimension to 1D. The codegen has a lot of `vslidedown` + `vfmv` + `fmadd`, which hurts performance. It is caused by the lowering policy for `vector.contract` in `LLVMCPUVectorLoweringPass`. If I change the `VectorContractLowering` from `OuterProduct` to `ParallelArith`, the codegen can simply use `vfmadd`, and the speed is about 250% faster. I am concerned that changing the policy directly will impact other test cases; any suggestions for optimizing this?
The mlir I used.
The codegen has a lot of this pattern. This is caused by the lowering of `vector.contract`.

mlir after `LLVMCPUVectorLoweringPass` with `VectorContractLowering::OuterProduct`:

mlir after `LLVMCPUVectorLoweringPass` with `VectorContractLowering::ParallelArith`:
Steps to reproduce your issue
iree-compile command
What component(s) does this issue relate to?
MLIR
Version information
commit: 5cd1510a78e08ca16b8df2e3241a4c2d777ed653
Additional context
No response