llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[mlir][Aarch64] Improve i8mm instruction sequence for `vector.contract` #90416

Open dcaballe opened 2 months ago

dcaballe commented 2 months ago

The i8mm lowering for some `vector.contract` ops is currently functionally correct. However, performance-wise there is room for improvement. Looking at the generated asm for an `mmt4d` with 2x2x8 innermost tile sizes, we get:

    1470: 6e180483      mov     v3.d[1], v4.d[0]
    1474: 4e006204      tbl     v4.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v0.16b
    1478: 4e84a462      smmla   v2.4s, v3.16b, v4.16b
    147c: 6e024041      ext     v1.16b, v2.16b, v2.16b, #0x8

What catches my attention is the `mov` instruction, especially the lane indexing from `1` to `0`, as well as the `tbl` and `ext` instructions. This may not seem like a big deal, but the problem gets much worse with larger tile sizes, where we observed long sequences of `mov` and `ext` instructions throughout the generated code.
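For anyone not fluent in NEON, here is a rough scalar model of what the `mov` (lane insert) and `ext` forms in the listing do to a 128-bit register. This is only a sketch for reasoning about the data movement: `Vec16`, `insLowIntoHigh` and `extModel` are hypothetical names made up for this comment, not LLVM or ACLE APIs.

```cpp
#include <array>
#include <cstdint>

// A 128-bit NEON register modeled as 16 bytes, byte 0 being the lowest.
using Vec16 = std::array<uint8_t, 16>;

// Model of `mov vd.d[1], vn.d[0]`: copy the low 64 bits of vn into the
// high 64 bits of vd, leaving the low half of vd untouched.
Vec16 insLowIntoHigh(Vec16 vd, const Vec16 &vn) {
  for (int i = 0; i < 8; ++i)
    vd[8 + i] = vn[i];
  return vd;
}

// Model of `ext vd.16b, vn.16b, vm.16b, #imm`: take 16 consecutive bytes
// starting at byte `imm` of the 32-byte concatenation vn (low) : vm (high).
Vec16 extModel(const Vec16 &vn, const Vec16 &vm, int imm) {
  Vec16 out{};
  for (int i = 0; i < 16; ++i) {
    int src = i + imm;
    out[i] = src < 16 ? vn[src] : vm[src - 16];
  }
  return out;
}
```

With both sources equal and `#0x8`, as in the listing, `extModel` simply swaps the two 64-bit halves, so the `ext` after `smmla` looks like it is only there to bring the second row of the 2x2 result into the low half of a register.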

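We should investigate what is going on and try to fix the problem. My suspicion is that this [zero initialization and insertion](https://github.com/llvm/llvm-project/blob/aafed3408e7269c42f974189198a47eb6dd2fc84/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp#L178-L185) for `vecmat` cases might be behind some of these instructions. We should check whether using `llvm.undef` instead fixes part of the problem.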
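To back up the `llvm.undef` idea, here is a small scalar model of `smmla` showing that the padding row should not influence the row we actually keep. It is a sketch under assumptions: I am assuming the `vecmat` case pads a 1x8 LHS to a 2x8 tile and only uses row 0 of the 2x2 result. `smmlaModel` and `vecmatRow` are hypothetical helper names, not code from the pattern.

```cpp
#include <array>
#include <cstdint>

using Tile8 = std::array<std::array<int8_t, 8>, 2>;  // 2x8 int8 tile
using Acc = std::array<std::array<int32_t, 2>, 2>;   // 2x2 int32 accumulator

// Scalar model of SMMLA: acc[i][j] += sum_k lhs[i][k] * rhs[j][k],
// i.e. a 2x8 by 8x2 product where rhs is stored transposed as 2x8.
Acc smmlaModel(Acc acc, const Tile8 &lhs, const Tile8 &rhs) {
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 8; ++k)
        acc[i][j] += int32_t(lhs[i][k]) * int32_t(rhs[j][k]);
  return acc;
}

// Assumed vecmat handling: row 0 of the LHS tile holds the real vector,
// row 1 is filled with `padding`. Only row 0 of the result is returned.
std::array<int32_t, 2> vecmatRow(const std::array<int8_t, 8> &vec,
                                 const Tile8 &rhs, int8_t padding) {
  Tile8 lhs{};
  lhs[0] = vec;
  lhs[1].fill(padding);
  Acc out = smmlaModel(Acc{}, lhs, rhs);
  return out[0];
}
```

Since `out[0]` only ever reads `lhs[0]`, `vecmatRow` returns the same values for any `padding`. That is why an undefined padding row looks semantically safe here; whether it actually removes the extra `mov`/`ext` instructions still has to be measured.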

llvmbot commented 2 months ago

@llvm/issue-subscribers-mlir

Author: Diego Caballero (dcaballe)
