Closed ayerofieiev-tt closed 1 week ago
@TT-BrianLiu , is this work you could pick up here?
Yea, I can take a look
I have scoped out the work required for moving these ops (see the updated description). I will likely start with tt_eager/tt_dnn/op_library/transformer_tms/transformer_tms.hpp, then take it op by op for the ones in tt_eager/tt_dnn/op_library/nlp_tms/nlp_tms.hpp. At the end, tt_eager/tt_dnn/op_library/transformer_tms/transformer_tms.hpp will still exist, since it contains other custom model ops that are completely unrelated, but tt_eager/tt_dnn/op_library/nlp_tms/nlp_tms.hpp should be completely gone.
I plan to move everything into ttnn::experimental::transformer
tt_eager --> ttnn per op plan
We propose the following order to break this work down into smaller pieces:
Replacing usage in C++
Each such usage should be replaced with its ttnn analog, for example repeat --> ttnn::repeat. This should be done for each operation. Missing operations should be added to ttnn.
Replacing usage in Python
For every unary op, look for the following entries in Tests/Sweeps, Demos, Models, Examples:
ttl.tensor.repeat
tt_lib.tensor.repeat
ttnn.primary.tensor.repeat
and replace them with ttnn.repeat.
⚠️ tt_lib operations might sometimes have a slightly different interface
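The Python replacement step could be partly automated. Below is a minimal sketch, not an existing tool in the repo: `LEGACY_PREFIXES`, `build_pattern` and `migrate_source` are hypothetical names introduced here for illustration. It only renames the qualified call; since tt_lib ops can have a slightly different interface, the arguments at each rewritten call site still need a manual check.

```python
import re

# Legacy module prefixes named in the plan; all three forms map to ttnn.<op>.
LEGACY_PREFIXES = ("ttl.tensor", "tt_lib.tensor", "ttnn.primary.tensor")


def build_pattern(op_name: str) -> re.Pattern:
    """Match any legacy-qualified reference to op_name, e.g. ttl.tensor.repeat."""
    prefixes = "|".join(re.escape(p) for p in LEGACY_PREFIXES)
    # (?<![\w.]) avoids matching inside a longer dotted name;
    # \b after the op name avoids matching e.g. repeat_interleave.
    return re.compile(rf"(?<![\w.])(?:{prefixes})\.{re.escape(op_name)}\b")


def migrate_source(source: str, op_name: str) -> tuple[str, int]:
    """Return the rewritten source and the number of replacements made."""
    return build_pattern(op_name).subn(f"ttnn.{op_name}", source)
```

Running `migrate_source(text, "repeat")` over each file and reviewing the resulting diff keeps the rename mechanical while leaving signature differences to the reviewer.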
Testing
For the best coverage, I recommend running these workflows. If any of them fail, check whether the same failure occurs on main:
Scope of Work for Transformers
There are two files to move, which together contain two sets of ops: variations of split qkv heads and variations of concat heads.
tt_eager/tt_dnn/op_library/transformer_tms/transformer_tms.hpp
- operations.primary.transformers.split_query_key_value_and_split_heads
  used in: (tt::operations::primary::transformers::split_query_key_value_and_split_heads)
- operations.primary.transformers.concatenate_heads
  used in: (tt::operations::primary::transformers::concatenate_heads)
tt_eager/tt_dnn/op_library/nlp_tms/nlp_tms.hpp
- tensor.create_qkv_heads
  used in: (tt_metal::create_qkv_heads)
- tensor.create_qkv_heads_from_separate_tensors
  used in:
- tensor.nlp_create_qkv_heads_falcon7b
  used in: (tt_metal::nlp_create_qkv_heads_falcon_7b)
- tensor.nlp_create_qkv_heads_decode
  used in:
- tensor.nlp_create_qkv_heads
  used in: (tt_metal::nlp_create_qkv_heads)
- tensor.nlp_concat_heads
  used in:
- tensor.nlp_concat_heads_decode
  used in:
- tensor.nlp_kv_cache_load_slice
  used in (this is a new op (?) but I will move it for completeness):
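A "used in" inventory like the one above could be gathered with a small script. The following is a rough sketch under assumptions: `find_usages` and the file-suffix filter are hypothetical, and only the Python-facing op names from the scope list are searched. Each name is matched as a whole identifier so that, for example, tensor.nlp_concat_heads does not also count hits for tensor.nlp_concat_heads_decode.

```python
import re
from pathlib import Path

# Python-facing op names taken from the scope list.
LEGACY_OPS = [
    "operations.primary.transformers.split_query_key_value_and_split_heads",
    "operations.primary.transformers.concatenate_heads",
    "tensor.create_qkv_heads",
    "tensor.create_qkv_heads_from_separate_tensors",
    "tensor.nlp_create_qkv_heads_falcon7b",
    "tensor.nlp_create_qkv_heads_decode",
    "tensor.nlp_create_qkv_heads",
    "tensor.nlp_concat_heads",
    "tensor.nlp_concat_heads_decode",
    "tensor.nlp_kv_cache_load_slice",
]


def find_usages(root, ops=LEGACY_OPS, suffixes=(".py",)):
    """Map each op name to the sorted list of files under root that reference it."""
    root = Path(root)
    # Whole-identifier match: no word character directly before or after the name.
    patterns = {op: re.compile(rf"(?<!\w){re.escape(op)}(?!\w)") for op in ops}
    usages = {op: [] for op in ops}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for op, pattern in patterns.items():
            if pattern.search(text):
                usages[op].append(path.relative_to(root).as_posix())
    return usages
```

Ops whose list comes back empty are candidates to move without touching any model code.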