[MLIR] API Request: virtualized core coords

tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.

Apache License 2.0

[MLIR] API Request: virtualized core coords #11009

Open nsmithtt opened 3 months ago

nsmithtt commented 3 months ago

To get at least some portability of serialized traces (#10702), it would be great to have the metal runtime supply virtualized core coordinates that get loaded at device runtime. This would enable a trace captured on a harvested n300 to be replayed on another n300 that has the same logical core grid but a different set of harvested rows.

One way this could work would be to have a special reserved section of local memory which holds a mapping that the get_noc_address dataflow API could index under the hood. For example, this would translate logical core coord [3, 0] to physical coord 4-1.
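
As a rough illustration of the mapping idea above, here is a minimal host-side sketch, not the tt-metal implementation. The names `CoordTable`, `build_table` and `logical_to_physical` are hypothetical, and the worker column/row lists and the harvesting bitmask format are assumptions chosen so that logical [3, 0] lands on physical 4-1 when nothing is harvested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CoreCoord {
    std::uint8_t x;
    std::uint8_t y;
};

// Hypothetical per-device table: index = logical coordinate, value = physical coordinate.
struct CoordTable {
    std::vector<std::uint8_t> phys_x;  // logical x -> physical x
    std::vector<std::uint8_t> phys_y;  // logical y -> physical y
};

// Build the table from the physical worker columns and rows of a device.
// Assumption: bit r of harvested_rows is set when physical row r is harvested,
// and harvested rows are simply skipped when assigning logical row indices.
CoordTable build_table(const std::vector<std::uint8_t>& worker_cols,
                       const std::vector<std::uint8_t>& worker_rows,
                       std::uint32_t harvested_rows) {
    CoordTable table;
    table.phys_x = worker_cols;
    for (std::uint8_t row : worker_rows) {
        if ((harvested_rows >> row) & 1u) {
            continue;  // harvested row: no logical index points at it
        }
        table.phys_y.push_back(row);
    }
    return table;
}

// The lookup a dataflow API could perform under the hood before forming a NOC address.
CoreCoord logical_to_physical(const CoordTable& table, CoreCoord logical) {
    assert(logical.x < table.phys_x.size() && logical.y < table.phys_y.size());
    return CoreCoord{table.phys_x[logical.x], table.phys_y[logical.y]};
}
```

With worker columns {1, 2, 3, 4, 5} and worker rows {1, 2, 3, 4}, logical [3, 0] maps to physical 4-1 when no row is harvested, and to 4-2 on a device where physical row 1 is harvested. The logical coordinate recorded in a trace stays the same in both cases; only the per-device table differs.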

pgkeller commented 3 months ago

This is the 2nd request for this feature in 24 hours. @sankarmanoj-tt noted that translation currently happens on the host, and for convolution mcasts this info is passed through runtime args.

This shouldn't be hard to do. The table will be loaded/built w/ the firmware and accessible from the kernel. The implementation will vary across GS/WH/BH, and there will be some (minor) performance penalty for the lookup.

Seems this issue should be tracked on the metal runtime board?

Is there a priority/timeframe for this? We currently have no resources available.

nsmithtt commented 3 months ago

This isn't super urgent, just filed the issue for tracking. Our main use case is to use this in tandem with https://github.com/tenstorrent/tt-metal/issues/10702, i.e. to make traces more portable. I think we will become more interested in this 2-3 months down the road. Added to the metal runtime board, marked as P2 for us; @sankarmanoj-tt might have different prioritization.

davorchap commented 2 months ago

Virtualizing traces/portability is a great feature.

We can also consider using "virtual NOC coords" (i.e. translated coords), since we have HW support for these. The kernel wouldn't have to do logical -> physical translation at run-time. Passing virtual coords as compile-time args can also lead to more compile-time optimizations of the kernel.

Is there any reason/preference for using run-time lookup vs. virtual NOC coords (these were used in BUDA)?

davorchap commented 2 months ago

We also need to take DRAM and ETH into account, in addition to the Tensix mesh, as well as potential harvesting of those in BH.

nsmithtt commented 2 months ago

> Virtualizing traces/portability is a great feature.
>
> We can also consider using "virtual NOC coords" (i.e. translated coords), since we have HW support for these. The kernel wouldn't have to do logical -> physical translation at run-time. Passing virtual coords as compile-time args can also lead to more compile-time optimizations of the kernel.

If we have HW support for translating the core coords, that would be great. So I could write get_noc_addr(0, 0, l1_offset); and under the hood the HW would translate this to physical coord 1-2?

> Is there any reason/preference for using run-time lookup vs. virtual NOC coords (these were used in BUDA)?

I think the preference for runtime lookup was just for trace portability, since the RISC-V binaries are already compiled and embedded in the trace. If we compile/trace on one n300, then we want to be able to move it to another n300 with different harvested rows and have the same trace run.

pgkeller commented 3 weeks ago

Just had a discussion on this: the plan at the moment is for metal to move to virtual (translated) coordinates and not expose physical coordinates through the API. Dispatch would use virtual coordinates. If a program is compiled differently across devices for any reason (imagine passing the device id as a compile-time argument), the trace could still be uniform but the kernel binary load would be different (and runtime would have to use a max_size at dispatch time to share the trace).

@tt-asaigal @cfjchu
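
One possible reading of the max_size remark, sketched under assumptions: if the same program compiles to binaries of different sizes on different devices, a shared trace could reserve one slot sized for the largest of them. `shared_binary_slot_size` and its `alignment` parameter are hypothetical and not part of the metal runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Size of a binary slot that fits the kernel binary of every device sharing the trace.
// Assumption: sizes are in bytes and the slot is rounded up to `alignment` bytes
// (pass 0 to skip rounding).
std::size_t shared_binary_slot_size(const std::vector<std::size_t>& per_device_sizes,
                                    std::size_t alignment) {
    std::size_t max_size = 0;
    for (std::size_t size : per_device_sizes) {
        max_size = std::max(max_size, size);
    }
    if (alignment == 0) {
        return max_size;
    }
    // Round up to the next multiple of alignment.
    return (max_size + alignment - 1) / alignment * alignment;
}
```

Each device would then load its own binary into a slot of this common size, so the dispatch commands recorded in the trace can stay identical across devices.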

nsmithtt commented 3 weeks ago

@pgkeller, are we going to use the HW feature "NOC coordinate translation"?

pgkeller commented 3 weeks ago

yes