nvukobratTT opened 3 weeks ago
We get this many hslices after the decompose optimized graph pass. Before that pass, there is only one hslice, which is part of the initial graph:
@nvukobratTT Do we want to remove hslice altogether, or do we just want to investigate why we get so many hslices after the decompose optimized graph pass?
We can remove hslice altogether; there's no need to investigate further why we're getting more of them after the decompose optimized stage.
In sum, the goal is to remove H&V stack/slice ops from the FFE. For this issue, we can focus on removing them for the Llama 3B model only :))
This effort is currently blocked by a metal issue: https://github.com/tenstorrent/tt-metal/issues/13005#issuecomment-2370314090
Currently, transpose in Forge is decomposed into a bunch of ops (2 x pad_tile, vstack, vslice, 2 x narrow). This is a remnant of pybuda, and pad_tile, vslice, and vstack are not supported in TTIR (because they don't exist as concepts in ttnn, which is expected...). Until we get transpose support from metal, the Llama effort is blocked.
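To illustrate why the two pad_tile and two narrow ops bracket the decomposition, here is a hypothetical pure-Python sketch. It is not Forge's code: it assumes a 32x32 tile, and it stands in a plain transpose for the vstack/vslice steps. It only shows that padding both dims to tile multiples and narrowing back afterwards is a round trip that a native transpose op would not need.

```python
# Hypothetical sketch, NOT Forge's implementation.
# Assumption: a 32x32 tile; vstack/vslice are replaced by a plain transpose.
TILE = 32


def pad_to_tile(rows, tile=TILE):
    """Zero-pad a 2-D list-of-lists so both dims are multiples of `tile` (stand-in for 2 x pad_tile)."""
    r = len(rows)
    c = len(rows[0]) if rows else 0
    pr = -(-r // tile) * tile  # round rows up to a tile multiple
    pc = -(-c // tile) * tile  # round cols up to a tile multiple
    padded = [row + [0] * (pc - c) for row in rows]
    padded += [[0] * pc for _ in range(pr - r)]
    return padded


def transpose(rows):
    """Plain 2-D transpose (stand-in for the vstack/vslice middle of the decomposition)."""
    return [list(col) for col in zip(*rows)]


def narrow(rows, n_rows, n_cols):
    """Keep the top-left n_rows x n_cols block (stand-in for 2 x narrow)."""
    return [row[:n_cols] for row in rows[:n_rows]]


def transpose_via_padding(rows):
    """Pad to tiles, transpose, then narrow back to the transposed logical shape."""
    r, c = len(rows), len(rows[0])
    return narrow(transpose(pad_to_tile(rows)), c, r)
```

For any non-empty matrix, `transpose_via_padding(x)` matches `transpose(x)`; the padding and narrowing only exist to satisfy tile alignment along the way.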
Occurring in self-attention, mostly around stack/slice op decompositions: