tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

TTNN Op Interface: L1 usage #303

Open rjakovljevicTT opened 1 month ago

rjakovljevicTT commented 1 month ago

What

Provide a TTNN API that returns L1 usage (the sizes of L1 buffers and their start addresses) to the TT-MLIR optimizer. The goal of this issue is to do an end-to-end PoC and conclude what the proper solution would be. Based on that conclusion, we will open a new issue to define the full scope of work.
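For illustration only, here is a minimal sketch of the kind of data such an API could hand to the optimizer. The names (L1BufferUsage, OpL1Usage, queryOpL1Usage) are assumptions, not the actual TTNN interface:

```cpp
// Hypothetical sketch only -- names and structure are illustrative, not the
// real TTNN API. It models the information the optimizer would consume:
// per-buffer sizes and start addresses in L1.
#include <cstdint>
#include <string>
#include <vector>

struct L1BufferUsage {
  std::string name;            // e.g. "circular_buffer_0", "output_tensor"
  std::uint64_t startAddress;  // start address of the buffer in L1
  std::uint64_t sizeBytes;     // size of the buffer in bytes
};

struct OpL1Usage {
  std::vector<L1BufferUsage> buffers;  // all L1 buffers the op allocates
  std::uint64_t totalBytesPerCore() const {
    std::uint64_t total = 0;
    for (const auto &b : buffers)
      total += b.sizeBytes;
    return total;
  }
};

// The TT-MLIR optimizer would call something like this per candidate op
// configuration to decide whether the op's buffers fit in L1.
// OpL1Usage queryOpL1Usage(const OpConfig &config);  // hypothetical
```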

PoC

Initially, on the example of an MNIST NN and all ops that are part of MNIST, do the following tasks:

The above tasks should be done iteratively until the result works well for both the TT-MLIR optimizer and TTNN.

derdeljanTT commented 1 week ago

For this sprint, we have agreed to implement support for the L1 usage interface for the MNIST ops: eltwise binary op, eltwise unary op (relu), softmax (basic implementation, no mask or scaling), and two implementations of matmul (1D and 2D multicast reuse). L1 interfaces for these ops are implemented and merged into the internal branch in the tt-metal repo (feature/mlir-interface).

Currently the interface can answer two questions: how many bytes per core will be allocated for circular buffers, and how many bytes per core will be allocated for tensor buffers in L1.

TBD: expose this interface through the MLIR interface lib so it can be used by the optimizer.
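A hedged sketch of how those two per-core queries might look to a caller. The function names, the EltwiseBinaryOpParams type, and the byte formulas are assumptions for illustration only, not the interface merged to feature/mlir-interface:

```cpp
// Illustrative only: hypothetical signatures for the two per-core questions
// described above (circular-buffer bytes and tensor-buffer bytes per core).
#include <cstddef>
#include <cstdint>

// Assumed placeholder for whatever op/layout description the interface takes.
struct EltwiseBinaryOpParams {
  std::uint32_t tilesPerCore = 0;
  std::uint32_t bytesPerTile = 2048;  // e.g. a bfloat16 32x32 tile
};

// Bytes per core reserved for circular buffers (toy model: double-buffered
// input CBs for two operands plus one double-buffered output CB).
std::size_t circularBufferBytesPerCore(const EltwiseBinaryOpParams &p) {
  return 3 * 2 * static_cast<std::size_t>(p.bytesPerTile);
}

// Bytes per core allocated for L1 tensor buffers (op inputs/outputs kept in L1).
std::size_t tensorBufferBytesPerCore(const EltwiseBinaryOpParams &p) {
  return static_cast<std::size_t>(p.tilesPerCore) * p.bytesPerTile;
}

// The optimizer could sum both numbers per candidate layout and compare the
// total against the per-core L1 budget.
```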

mbezuljTT commented 4 days ago

@derdeljanTT has added a switch, TTNN_MLIR_INTERFACE_USE_GRAPH_CAPTURE=1, that changes the L1 interface to use graph capture.

To use it on a machine without silicon (or, even if you have silicon, to get faster results), use the following switches: TT_NO_FIRMWARE=1 TT_METAL_MOCKUP_EN=1.

This gets us to roughly 1 ms per query.
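For example (an assumption-laden sketch, not a documented workflow), a host-side test could set these switches programmatically before exercising the interface; queryOpL1Usage is the same hypothetical function from the first sketch above:

```cpp
// Illustrative only: enable the graph-capture / mockup path via environment
// variables before making L1-usage queries. The variable names come from the
// comments above; the query call itself is hypothetical.
// setenv() is POSIX, declared in <cstdlib> / <stdlib.h>.
#include <cstdlib>

int main() {
  // Route the L1 interface through graph capture instead of real dispatch.
  setenv("TTNN_MLIR_INTERFACE_USE_GRAPH_CAPTURE", "1", /*overwrite=*/1);
  // Skip firmware and use the mockup device so no silicon is required.
  setenv("TT_NO_FIRMWARE", "1", 1);
  setenv("TT_METAL_MOCKUP_EN", "1", 1);

  // ... construct op configs and call the (hypothetical) L1-usage queries,
  // e.g. queryOpL1Usage(config), at roughly 1 ms per query ...
  return 0;
}
```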