tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

TTNN Op Interface: L1 usage #303

Open rjakovljevicTT opened 1 month ago

rjakovljevicTT commented 1 month ago

What

Provide a TTNN API that returns L1 usage (the sizes of L1 buffers and their start addresses) to the TT-MLIR optimizer. The goal of this issue is to do an end-to-end PoC and conclude what the proper solution would be. Based on that conclusion, we will open a new issue to define the full scope of work.
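For illustration only, here is a minimal sketch of the kind of data such an API could hand to the optimizer. The names (L1BufferUsage, OpL1Usage, queryOpL1Usage) are assumptions, not the actual TTNN interface:

```cpp
// Hypothetical sketch only -- names and structure are illustrative, not the
// real TTNN API. It models the information the optimizer would consume:
// per-buffer sizes and start addresses in L1.
#include <cstdint>
#include <string>
#include <vector>

struct L1BufferUsage {
  std::string name;            // e.g. "circular_buffer_0", "output_tensor"
  std::uint64_t startAddress;  // start address of the buffer in L1
  std::uint64_t sizeBytes;     // size of the buffer in bytes
};

struct OpL1Usage {
  std::vector<L1BufferUsage> buffers;  // all L1 buffers the op allocates
  std::uint64_t totalBytesPerCore() const {
    std::uint64_t total = 0;
    for (const auto &b : buffers)
      total += b.sizeBytes;
    return total;
  }
};

// The TT-MLIR optimizer would call something like this per candidate op
// configuration to decide whether the op's buffers fit in L1.
// OpL1Usage queryOpL1Usage(const OpConfig &config);  // hypothetical
```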

PoC

Initially, on the example of an MNIST NN and all ops that are part of MNIST, do the following tasks:

The above tasks should be done iteratively until the result works well for both the TT-MLIR optimizer and TTNN.

derdeljanTT commented 1 week ago

For this sprint, we have agreed to implement support for the L1 usage interface for the MNIST ops: eltwise binary op, eltwise unary op (relu), softmax (basic implementation, no mask or scaling), and two implementations of matmul (1D and 2D multicast reuse). L1 interfaces for these ops are implemented and merged into the internal branch in the tt-metal repo (feature/mlir-interface).

Currently the interface can answer two questions: how many bytes per core will be allocated for circular buffers, and how many bytes per core will be allocated for tensor buffers in L1.

TBD: expose this interface through the MLIR interface lib so it can be used by the optimizer.
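A hedged sketch of how those two per-core queries might look to a caller. The function names, the EltwiseBinaryOpParams type, and the byte formulas are assumptions for illustration only, not the interface merged to feature/mlir-interface:

```cpp
// Illustrative only: hypothetical signatures for the two per-core questions
// described above (circular-buffer bytes and tensor-buffer bytes per core).
#include <cstddef>
#include <cstdint>

// Assumed placeholder for whatever op/layout description the interface takes.
struct EltwiseBinaryOpParams {
  std::uint32_t tilesPerCore = 0;
  std::uint32_t bytesPerTile = 2048;  // e.g. a bfloat16 32x32 tile
};

// Bytes per core reserved for circular buffers (toy model: double-buffered
// input CBs for two operands plus one double-buffered output CB).
std::size_t circularBufferBytesPerCore(const EltwiseBinaryOpParams &p) {
  return 3 * 2 * static_cast<std::size_t>(p.bytesPerTile);
}

// Bytes per core allocated for L1 tensor buffers (op inputs/outputs kept in L1).
std::size_t tensorBufferBytesPerCore(const EltwiseBinaryOpParams &p) {
  return static_cast<std::size_t>(p.tilesPerCore) * p.bytesPerTile;
}

// The optimizer could sum both numbers per candidate layout and compare the
// total against the per-core L1 budget.
```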

mbezuljTT commented 4 days ago

@derdeljanTT has added a switch, TTNN_MLIR_INTERFACE_USE_GRAPH_CAPTURE=1, that changes the L1 interface to use graph capture.

To use it on a machine without silicon (or, even if you have silicon, to get faster results), use the following switches: TT_NO_FIRMWARE=1 TT_METAL_MOCKUP_EN=1.

This gets us to roughly 1 ms per query.
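For example (an assumption-laden sketch, not a documented workflow), a host-side test could set these switches programmatically before exercising the interface; queryOpL1Usage is the same hypothetical function from the first sketch above:

```cpp
// Illustrative only: enable the graph-capture / mockup path via environment
// variables before making L1-usage queries. The variable names come from the
// comments above; the query call itself is hypothetical.
// setenv() is POSIX, declared in <cstdlib> / <stdlib.h>.
#include <cstdlib>

int main() {
  // Route the L1 interface through graph capture instead of real dispatch.
  setenv("TTNN_MLIR_INTERFACE_USE_GRAPH_CAPTURE", "1", /*overwrite=*/1);
  // Skip firmware and use the mockup device so no silicon is required.
  setenv("TT_NO_FIRMWARE", "1", 1);
  setenv("TT_METAL_MOCKUP_EN", "1", 1);

  // ... construct op configs and call the (hypothetical) L1-usage queries,
  // e.g. queryOpL1Usage(config), at roughly 1 ms per query ...
  return 0;
}
```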