tenstorrent / tt-forge-fe

The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their performance and efficiency.
https://docs.tenstorrent.com/tt-forge-fe/
Apache License 2.0

[Core] Remove stack/slice ops from the model #273

Open nvukobratTT opened 3 weeks ago

nvukobratTT commented 3 weeks ago

Occurring in self-attention, mostly around stack/slice op decompositions:

[image]

dgolubovicTT commented 2 weeks ago

We get this many hslices after the decompose optimized graph pass. Before that, there is only one hslice, which is part of the initial graph:

[image]

@nvukobratTT Do we want to remove hslice altogether, or do we just want to investigate why we get so many hslices after the decompose optimized graph pass?

nvukobratTT commented 2 weeks ago

We can remove hslice altogether; there's no need to investigate further why we're getting more of them after the decompose optimized graph stage.

In sum, the goal is to remove H&V stack/slice ops from the FFE. For this issue, we can focus on removing them for the Llama 3B model only :))
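
For reference, a minimal NumPy sketch of why these ops should be removable: it assumes hslice/hstack keep their PyBuda meaning (split the last dim into `factor` chunks and move them onto Z, and the inverse), in which case each one is just a reshape plus a transpose. The function names, the 4D `[W, Z, R, C]` layout, and the slice ordering for Z > 1 are assumptions for illustration, not the FFE implementation.

```python
import numpy as np

def hslice(x, factor):
    """Assumed semantics: [W, Z, R, C] -> [W, Z*factor, R, C//factor]."""
    w, z, r, c = x.shape
    assert c % factor == 0, "C must be divisible by factor"
    # Split columns into `factor` chunks, then move the chunk axis next to Z.
    y = x.reshape(w, z, r, factor, c // factor).transpose(0, 1, 3, 2, 4)
    return y.reshape(w, z * factor, r, c // factor)

def hstack(x, factor):
    """Assumed inverse of hslice: [W, Z, R, C] -> [W, Z//factor, R, C*factor]."""
    w, z, r, c = x.shape
    assert z % factor == 0, "Z must be divisible by factor"
    # Pull `factor` slices out of Z and lay them side by side along columns.
    y = x.reshape(w, z // factor, factor, r, c).transpose(0, 1, 3, 2, 4)
    return y.reshape(w, z // factor, r, c * factor)

# Self-attention style usage: split a hidden dim of 8 into 4 heads of size 2.
x = np.arange(24).reshape(1, 1, 3, 8)
heads = hslice(x, 4)
assert heads.shape == (1, 4, 3, 2)
assert np.array_equal(hstack(heads, 4), x)  # round trip restores the input
```

Under that assumption, the same rewrite applies to vslice/vstack with the row dim in place of the column dim.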

dgolubovicTT commented 1 week ago

This effort is currently blocked by a metal issue: https://github.com/tenstorrent/tt-metal/issues/13005#issuecomment-2370314090

Currently, transpose in Forge is decomposed into a bunch of ops (2 x pad_tile, vstack, vslice, 2 x narrow). This is a leftover from PyBuda, and pad_tile, vslice and vstack are not supported in TTIR (because they don't exist as concepts in ttnn, which is expected). Until we get support from metal for transpose, the Llama effort is blocked.