Open svuckovicTT opened 4 hours ago
Paging @nobradovictt for the optimizer, @nsmithtt for experience with tt-metal
Any thoughts on this, guys? For generality, we could try copying the tensor for each consumer, but it seems like a tricky problem even then in terms of memory usage (e.g. what if multiple consumers require different layout properties, but both need their inputs to be in L1, sharded?).
Today, in the TTIR -> TTNN conversion path, we don't handle scenarios where a producer op has multiple consumers that expect different layouts (tile vs row_major). The same may hold for other layout properties (sharded vs interleaved, device vs cpu, etc.).
Example: https://github.com/tenstorrent/tt-mlir/pull/863#discussion_r1793744974
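To make the multi-consumer case concrete, here is a rough Python sketch of one possible approach: walk the graph, find consumers whose required input layout differs from what the producer emits, and plan one conversion per distinct (producer, layout) pair so that consumers wanting the same layout share a single copy. This is only an illustration, not tt-mlir code; `Op`, `output_layout`, `input_layouts`, and `plan_layout_conversions` are all hypothetical names, and the op names in the example graph are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical toy op; not a tt-mlir or TTNN type."""
    name: str
    output_layout: str  # layout this op produces, e.g. "tile" or "row_major"
    # Maps producer name -> layout this op requires for that input.
    input_layouts: dict = field(default_factory=dict)


def plan_layout_conversions(ops):
    """Group mismatched consumers by (producer, required layout).

    Each key in the result stands for one conversion that would have to be
    inserted after the producer; the value lists the consumers sharing it.
    """
    by_name = {op.name: op for op in ops}
    plan = defaultdict(list)
    for consumer in ops:
        for producer_name, required in consumer.input_layouts.items():
            if by_name[producer_name].output_layout != required:
                plan[(producer_name, required)].append(consumer.name)
    return dict(plan)


# One producer ("matmul", tiled) feeding three consumers with mixed needs.
ops = [
    Op("matmul", "tile"),
    Op("softmax", "tile", {"matmul": "tile"}),
    Op("embedding", "row_major", {"matmul": "row_major"}),
    Op("reshape", "row_major", {"matmul": "row_major"}),
]

plan = plan_layout_conversions(ops)
for (producer, layout), consumers in plan.items():
    print(f"convert {producer} output to {layout} for {consumers}")
```

Sharing one conversion per distinct layout keeps the number of extra tensors down to the number of distinct layouts rather than the number of consumers, but it does not address the memory concern raised above: every converted copy still has to live somewhere, which is the hard part if all of them need to be in L1, sharded.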