tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Add finer-grained control over MeshDevices mapped onto the same MMIO chip #12955

Open cfjchu opened 3 hours ago

cfjchu commented 3 hours ago

Current Context

Suppose we have an 8x4 Galaxy grid and a user requests a 1x4 mesh:

A 2x4 block of devices is opened, because we currently always open the devices along both tunnels of an MMIO chip:

[39,38,37,40]
[34,35,36,41]

Only [39,38,37,40] is assigned back to the user. When the user closes its assigned 1x4 mesh, we close all eight devices in the opened 2x4 block.
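To make the current behavior concrete, here is a minimal bookkeeping model of it, assuming the device IDs listed above are the two tunnel rows of one MMIO chip. `CurrentMeshBackend`, `open_mesh`, and `close_mesh` are hypothetical names used only for illustration; they are not TT-NN or TT-Metalium APIs, and the model ignores everything except which devices are open.

```python
# Hypothetical model of the current open/assign/close behavior; class and
# method names are illustrative only, not part of TT-NN / TT-Metalium.

# Devices reachable along the two tunnels of one MMIO chip (IDs from the issue).
TUNNEL_ROWS = [
    [39, 38, 37, 40],
    [34, 35, 36, 41],
]


class CurrentMeshBackend:
    def __init__(self):
        self.open_devices = set()

    def open_mesh(self, rows, cols, row_offset=0, col_offset=0):
        # Both tunnels are always opened, whatever shape was requested.
        for tunnel in TUNNEL_ROWS:
            self.open_devices.update(tunnel)
        # Only the requested rows x cols sub-grid is assigned to the user.
        return [
            TUNNEL_ROWS[r][col_offset:col_offset + cols]
            for r in range(row_offset, row_offset + rows)
        ]

    def close_mesh(self, assigned):
        # `assigned` is ignored: closing any mesh closes every opened device.
        self.open_devices.clear()


backend = CurrentMeshBackend()
mesh = backend.open_mesh(1, 4)
print(mesh)                        # [[39, 38, 37, 40]]
print(len(backend.open_devices))   # 8
backend.close_mesh(mesh)
print(len(backend.open_devices))   # 0
```

The key point the model captures is that open and close operate on the whole 2x4 block, while assignment operates on a sub-grid of it.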

Unsupported

Suppose we have an 8x4 Galaxy grid and a user requests two 1x4 grids of devices:

    mesh_device_0 = MeshDevice(MeshShape(1,4))               -> assigned: [39,38,37,40]
    mesh_device_1 = MeshDevice(MeshShape(1,4), offset=(1,0)) -> assigned: [34,35,36,41]

When we create mesh_device_0, we open the full 2x4 block but assign only 1x4 of it. When we create mesh_device_1, we do not open any new devices; we only assign devices that are already open. The problems arise on close: because closing a mesh closes the whole 2x4 block, the current semantics do not allow us to close [39,38,37,40] while keeping [34,35,36,41] alive.
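One possible shape for finer-grained control is sketched below as pure bookkeeping, under the assumption that a device may be closed as soon as no live mesh has it assigned. `FineGrainedMeshBackend` and its methods are hypothetical, and `row_offset=1, col_offset=0` stands in for the `offset=(1,0)` used above; this is not a proposed TT-NN API, just an illustration of the desired close semantics.

```python
# Hypothetical sketch of finer-grained closing; names are illustrative and not
# part of TT-NN. Assumption: a device may be closed once no live mesh uses it.

TUNNEL_ROWS = [
    [39, 38, 37, 40],
    [34, 35, 36, 41],
]


class FineGrainedMeshBackend:
    def __init__(self):
        self.open_devices = set()
        self.assignments = {}  # mesh id -> set of device ids assigned to it
        self._next_id = 0

    def open_mesh(self, rows, cols, row_offset=0, col_offset=0):
        # Open both tunnels if needed; devices already open are not reopened.
        newly_opened = {d for t in TUNNEL_ROWS for d in t} - self.open_devices
        self.open_devices |= newly_opened
        assigned = [
            TUNNEL_ROWS[r][col_offset:col_offset + cols]
            for r in range(row_offset, row_offset + rows)
        ]
        mesh_id = self._next_id
        self._next_id += 1
        self.assignments[mesh_id] = {d for row in assigned for d in row}
        return mesh_id, assigned, newly_opened

    def close_mesh(self, mesh_id):
        # Drop this mesh, then close only devices no remaining mesh still uses.
        del self.assignments[mesh_id]
        still_used = set().union(*self.assignments.values())
        self.open_devices &= still_used


backend = FineGrainedMeshBackend()
id0, mesh0, opened0 = backend.open_mesh(1, 4)                # opens all 8
id1, mesh1, opened1 = backend.open_mesh(1, 4, row_offset=1)  # opens nothing new
print(len(opened0), len(opened1))      # 8 0
backend.close_mesh(id0)
print(sorted(backend.open_devices))    # [34, 35, 36, 41]
backend.close_mesh(id1)
print(sorted(backend.open_devices))    # []
```

With a single 1x4 mesh, this rule still closes all eight devices on close, matching today's behavior; it only differs when another mesh is still using part of the block. Whether the runtime can actually tear down one tunnel while the other stays up is not addressed by this sketch.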


cfjchu commented 3 hours ago

fyi @davorchap