metal 1.0: revamp CoreCoord and logical/physical coordinate systems

pgkeller commented 2 months ago

Change CoreCoord logical/physical as follows:

Coordinates explicitly specify a core type
TBD: if explicit coord type is a part of the type or an enum contained by the type (probably the latter)
Each core type contains its own logical coordinates in a grid (0..n,0..m)
Active/Idle ethernet, for example, are separate coordinate types each with their own runtime-determined grid
(idle ethernet may not be exposed through the API)
Host API provides translation from logical space to "device" space, device space may be virtual or physical coordinates
Device API can translate from device space to physical coordinates within the API if needed
Open: should device coordinates be typed such that, eg, DPRINT converts to physical (or even logical) coordinates?
Expose a device side API to convert from logical space to device space, either an add for virtual or a table lookup for physical
Conversion table is provided by firmware. The table will be loaded to the device rather than compiled into the firmware (as it is presently), breaking all builds' dependence on harvesting

Pending: performance testing w/ physical/logical coords for kernels highly dependent on DRAM traffic

See also #5783

@abhullar-tt @aliuTT @tt-asaigal @yan-zaretskiy @yugaoTT @davorchap @pgkeller

pgkeller commented 2 months ago

Something to consider, code such as this: uint32_t CreateSemaphore( Program &program, const std::variant<CoreRange, CoreRangeSet> &core_spec, uint32_t initial_value, CoreType core_type) { with a CoreCoord containing an enum-type can mix and match types through the API. This may be ok for some APIs and not for others, we don't want to introduce a high burden of API validation

abhullar-tt commented 2 months ago

we currently denote dram channels as 0..n and then map to a single CoreCoord, but we may want to uplift this to expose multiple coordinates for a given dram channel

yugaoTT commented 2 months ago

this might be a tricky thing to test out, but I think in some cases the virtual coord might not work for dram sharded matmul (or other dram read critical ops). If we use virtual coords, then we will always have the mapping in the image. If the harvested rows are row 5 and 7, then the mapped cores (W2, W3, W10, W11, W8, W7) will be shifted down (second image). Then we will have multiple cores in one column, while dram bank is not next to it, and causing congestions.

pgkeller commented 2 months ago

@yugaoTT ok, took me a bit to process that, but I see the problem. And so you are saying that to get optimal placement in this scenario the host needs to understand the physical coordinates?

pgkeller commented 2 months ago

I had a thought last night, here's a proposal: 1) Use strongly typed coordinates for the API, eg, EthCoreCoord, TensixCoreCoord, etc (names up for debate) 2) APIs are templated for the different coordinate types 3) Internally set an enum in the base CoreCoord w/ the type info (either at creation time or when the API is called) 4) Internally use generic coordinates w/ type info

yugaoTT commented 2 months ago

@pgkeller yeah that's right.

pgkeller commented 2 months ago

@pgkeller yeah that's right.

I'm thinking we expose physical coords behind a "detail::" API. In other words, it isn't a first class API but provided as a semi-supported way of getting better perf, particularly since we don't think it'll be needed for BH and beyond

abhullar-tt commented 2 months ago

I had a thought last night, here's a proposal:

Use strongly typed coordinates for the API, eg, EthCoreCoord, TensixCoreCoord, etc (names up for debate)

APIs are templated for the different coordinate types

Internally set an enum in the base CoreCoord w/ the type info (either at creation time or when the API is called)

Internally use generic coordinates w/ type info

This makes sense to me, it doesn't seem like we would want a logical view of heterogeneous core types. I don't think we would support buffers being interleaved across different core types and wouldn't create kernels across different core types in one api

tenstorrent / tt-metal

metal 1.0: revamp CoreCoord and logical/physical coordinate systems #11624