In older versions of Keras, dense layers are implemented with tensordot. For tensors with rank > 2 it does a reshape and uses MatMul, i.e. [a,b,c] -> reshape to [a*b,c] -> MatMul -> [a*b,d] -> reshape to [a,b,d] -> bias -> activation. We use BMM (BatchMatMul) instead so that we do not need the reshape ops, and it is also easier to handle other post-ops in the future. I think the current Keras 3.0 also uses BMM.
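For illustration, here is a small sketch (shapes are made up, not taken from any real model) of the two lowerings described above: the tensordot / reshape + MatMul path used by older Keras dense layers, and the direct matmul path that dispatches to BatchMatMulV2 for rank-3 inputs:

```python
import tensorflow as tf

# Illustrative shapes: a = batch, b = sequence length, c = in features, d = out features.
a, b, c, d = 2, 16, 64, 128
x = tf.random.normal([a, b, c])
w = tf.random.normal([c, d])

# Older Keras dense path: tensordot, which internally performs
# [a,b,c] -> reshape [a*b,c] -> MatMul -> [a*b,d] -> reshape [a,b,d].
y_tensordot = tf.tensordot(x, w, axes=[[2], [0]])

# The same computation written out explicitly with reshapes around a plain MatMul.
y_reshape = tf.reshape(tf.matmul(tf.reshape(x, [a * b, c]), w), [a, b, d])

# BMM path: with a rank-3 left operand, tf.matmul dispatches to BatchMatMulV2
# (the weight's empty batch dimensions broadcast), so no reshape ops are needed.
y_bmm = tf.matmul(x, w)

tf.debugging.assert_near(y_tensordot, y_reshape, atol=1e-4)
tf.debugging.assert_near(y_reshape, y_bmm, atol=1e-4)
```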
I am running the Hugging Face OPT 350M model with ITEX 2.14.
In the intermediate (FFN) block, when the Relu/Gelu fusion is triggered, I noticed that the resulting op type is FusedBatchMatMul instead of FusedMatMul (with Gelu Approximate/Exact/Relu as the post-op).
Input Graph:
Resultant Graph:
For reference, here is where this graph-level fusion takes place: https://github.com/intel/intel-extension-for-tensorflow/blob/d8fe3daa49f81767c1dd783325c330a145d945bd/itex/core/graph/remapper/remapper.cc#L912
Is there a specific reason why FusedBatchMatMul is chosen over FusedMatMul? A minimal sketch of the pattern I am describing is below.
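The following is only an illustrative stand-in for the OPT intermediate block, not the actual model code: the shapes, names, and use of tf.nn.gelu are assumptions for the sketch, and whether ITEX fuses this exact toy graph depends on the build and configuration. It is meant to show why the remapper sees BatchMatMul rather than MatMul in this block:

```python
import tensorflow as tf

# Hypothetical stand-in for the OPT intermediate (FFN) block:
# matmul + bias + GELU on a rank-3 activation [batch, seq_len, hidden].
hidden, intermediate = 1024, 4096

@tf.function
def intermediate_block(x, kernel, bias):
    # With a rank-3 input, tf.matmul lowers to BatchMatMul(V2), so the pattern
    # reaching the remapper is BatchMatMul -> BiasAdd -> Gelu,
    # not MatMul -> BiasAdd -> Gelu.
    y = tf.nn.bias_add(tf.matmul(x, kernel), bias)
    return tf.nn.gelu(y, approximate=True)

x = tf.random.normal([8, 128, hidden])
kernel = tf.random.normal([hidden, intermediate])
bias = tf.zeros([intermediate])
out = intermediate_block(x, kernel, bias)
print(out.shape)  # (8, 128, 4096)
```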