tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

MNIST Model E2E Multicore #76

Open nsmithtt opened 4 months ago

nsmithtt commented 4 months ago

Here is the pytest we need to run:

```python
import torch
import torch.nn as nn

# compile_torch is PyBuda's torch.compile backend
# (exact import path may vary by PyBuda version)
from pybuda.torch_compile import compile_torch


class MNIST(nn.Module):
    def __init__(self, input_size=784, output_size=10, hidden_size=256):
        super(MNIST, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.l1(x)
        x = self.relu(x)
        x = self.l2(x)
        # softmax needs an explicit dim; the implicit-dim form is deprecated
        return nn.functional.softmax(x, dim=-1)


def test_mnist():
    batch = 1
    input_img = torch.randn(batch, 28 * 28)
    mnist = MNIST()
    mnist.to('tt')
    pybuda_mod = torch.compile(mnist, backend=compile_torch, dynamic=False)
    result = pybuda_mod(input_img.to("tt"))
    print("result", result)
```
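For reference, the same model can be exercised on CPU without the `tt` backend to sanity-check shapes and the softmax output (a minimal sketch using only standard PyTorch; the `tt` device and PyBuda compile path from the test above are not involved here):

```python
import torch
import torch.nn as nn


class MNIST(nn.Module):
    def __init__(self, input_size=784, output_size=10, hidden_size=256):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.relu(self.l1(x))
        # explicit dim so softmax normalizes over the class dimension
        return nn.functional.softmax(self.l2(x), dim=-1)


model = MNIST()
with torch.no_grad():
    result = model(torch.randn(1, 28 * 28))
# result has shape (1, 10) and each row sums to 1
```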

Unique Ops:

Milestone 1 Goals:

odjuricicTT commented 2 months ago

In order to run this on multicore, #450 needs to be solved.

odjuricicTT commented 3 weeks ago

Update: we currently run MNIST width-sharded across multiple cores by using layout overrides.
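To illustrate what width sharding means here (a conceptual sketch only, not the tt-mlir override mechanism itself): the last (width) dimension of an activation is split into equal shards, one per core, so e.g. the 256-wide hidden activation above lands as 32-wide shards on 8 cores.

```python
import torch


def width_shard(t, num_cores):
    # Split the last (width) dimension into num_cores equal shards;
    # each core owns one contiguous slice of columns.
    return torch.chunk(t, num_cores, dim=-1)


# hidden activation of the MNIST model above: batch 1, width 256
hidden = torch.randn(1, 256)
shards = width_shard(hidden, 8)  # 8 shards, each of shape (1, 32)
```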

What's left: