tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0
339 stars 33 forks source link

Merge ttnn with tt_eager #9322

Open eyonland opened 1 month ago

eyonland commented 1 month ago

Goal

Streamline further development of our OP library.

Background

Operations are split between tt_eager and ttnn. Many ops in ttnn call their tt_eager implementation. Some ops like unary and binary duplicate lots of code leading to 2 different pathways in C++ / Python. This not only confuses new contributors, but increases time required to make changes. For example, as we add support for queue_id and output_tensor - we have to add them to ops in both libs.

We want to merge the two, moving code from tt_eager to ttnn, consolidating OPs in one place.

Guiding principles

Milestones

Example instead of 1000 words

See how files are consolidated in ttnn, instead of being split between tt_eager and ttnn. This is the core of the change this effort drives.

From šŸ˜«

tt_eager
ā””ā”€ā”€ tt_dnn
    ā””ā”€ā”€ op_library
        ā”œā”€ā”€ ...
        ā””ā”€ā”€ eltwise_binary
            ā”œā”€ā”€ kernels
            ā”‚   ā”œā”€ā”€ compute
            ā”‚   ā”‚   ā””ā”€ā”€ eltwise_binary.cpp
            ā”‚   ā””ā”€ā”€ dataflow
            ā”‚       ā””ā”€ā”€ reader_binary_interleaved_start_id.cpp
            ā”œā”€ā”€ multi_core
            ā”‚   ā””ā”€ā”€ eltwise_binary_op_multi_core.cpp
            ā”œā”€ā”€ eltwise_binary_op.cpp
            ā””ā”€ā”€ eltwise_binary_op.hpp
ttnn
ā””ā”€ā”€ cpp
    ā”œā”€ā”€ pybind11
    ā”‚   ā””ā”€ā”€ operations
    ā”‚       ā”œā”€ā”€ ...
    ā”‚       ā””ā”€ā”€ binary.hpp
    ā””ā”€ā”€ ttnn
        ā”œā”€ā”€ op_library
        ā”‚   ā””ā”€ā”€ ...
        ā”‚   ā””ā”€ā”€ binary
        ā”‚       ā”œā”€ā”€ binary_op.hpp
        ā”‚       ā””ā”€ā”€ binary_op.cpp
        ā””ā”€ā”€ operations
            ā”œā”€ā”€ ...
            ā””ā”€ā”€ binary.hpp

to šŸ˜Ž

ttnn
ā””ā”€ā”€ cpp
    ā””ā”€ā”€ ttnn
        ā””ā”€ā”€ operations
            ā””ā”€ā”€ ...
            ā””ā”€ā”€ eltwise
                ā””ā”€ā”€ binary
                    ā”œā”€ā”€ device
                    ā”‚   ā”œā”€ā”€ kernels
                    ā”‚   ā”‚   ā”œā”€ā”€ compute
                    ā”‚   ā”‚   ā”‚   ā””ā”€ā”€ eltwise_binary.cpp
                    ā”‚   ā”‚   ā””ā”€ā”€ dataflow
                    ā”‚   ā”‚       ā””ā”€ā”€ reader_binary_interleaved_start_id.cpp
                    ā”‚   ā”œā”€ā”€ binary_op.cpp
                    ā”‚   ā”œā”€ā”€ binary_op.hpp
                    ā”‚   ā”œā”€ā”€ binary_program_factory.cpp
                    ā”‚   ā””ā”€ā”€ binary_program_factory.hpp
                    ā”œā”€ā”€ binary.cpp
                    ā”œā”€ā”€ binary.hpp
                    ā”œā”€ā”€ binary_pybind.cpp
                    ā””ā”€ā”€ binary_pybind.hpp

All files related to an Operation are in its own folder. Binary op gets its own folder and owner.

Host and Device "programs" are split. Host program can become a "gold" for comparison in tests.

Scope (tbd)

might want to prioritize migration of these:

ttnn/ttnn/api.rst

ayerofieiev-tt commented 1 month ago

Considerations

During the discussion about the structure we considered:

1. A) Group bindings vs B) Group everything related to a given operation

2. A) Related operations split into own files vs B) Related operations in the same file

SeanNijjar commented 3 weeks ago

Question in terms of pipeline setup.

We've currently got tests invoked in run_tt_eager.py and I'm assuming we'll keep the test invocations in there but update the test file paths (as they are migrated) and finish the migration process by simply renaming run_tt_eager.py

yan-zaretskiy commented 2 weeks ago

As I am porting my first op, I realize that I take issue with the <op>_program_factory.hpp naming convention. We don't write a factory here. A factory has a very specific meaning, which is an object that creates other objects. Here we create a file that defines programs themselves. So it should be renamed to <op>_programs.hpp to reflect what is actually inside.