We want a testing harness in Python; we want to be able to write:
import torch
from ttmlir.dialects import ttir

# What the test harness could look like
class Add:
    def golden(self, a, b):
        return a + b

    def build(self, a, b):
        return ttir.add(a, b)
class TTIRBuilder:
    def create_ttir_tensor(self, torch_tensor):
        ...

    def create_ttir_add(self, lhs, rhs):
        ...
# What a test definition could look like
def test_add(builder):
    torch.manual_seed(0)
    in0 = torch.randn(64, 128)
    torch.manual_seed(1)
    in1 = torch.randn(64, 128)
    golden = in0 + in1
    ttir_tensor0 = builder.create_ttir_tensor(in0)
    ttir_tensor1 = builder.create_ttir_tensor(in1)
    out = builder.create_ttir_add(ttir_tensor0, ttir_tensor1)
    builder.finish(input_seeds=[0, 1], golden_outputs=[golden])
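To make the intended flow concrete, here is a minimal, hypothetical sketch of the golden-bookkeeping half of the builder. It does not emit any MLIR; it only shows how a builder could record inputs and goldens so that finish() can later embed them in the flatbuffer. The class and attribute names here are assumptions, not the real TTIRBuilder API.

```python
import torch

class GoldenRecordingBuilder:
    """Sketch: tracks torch tensors and goldens; no MLIR is emitted."""

    def __init__(self):
        self.inputs = []          # torch tensors handed to the builder
        self.input_seeds = []     # seeds used to generate the inputs
        self.golden_outputs = []  # expected outputs to embed

    def create_ttir_tensor(self, torch_tensor):
        # The real builder would create an MLIR tensor value; here we
        # just track the torch tensor and return it as the "handle".
        self.inputs.append(torch_tensor)
        return torch_tensor

    def create_ttir_add(self, lhs, rhs):
        # The real builder would emit a ttir.add op; we model its
        # semantics with torch so the sketch is runnable.
        return lhs + rhs

    def finish(self, input_seeds, golden_outputs):
        # The real builder would serialize seeds/goldens into the
        # flatbuffer's debug info at this point.
        self.input_seeds = list(input_seeds)
        self.golden_outputs = list(golden_outputs)

# Drive it exactly the way test_add above would:
builder = GoldenRecordingBuilder()
torch.manual_seed(0)
in0 = torch.randn(64, 128)
torch.manual_seed(1)
in1 = torch.randn(64, 128)
golden = in0 + in1
t0 = builder.create_ttir_tensor(in0)
t1 = builder.create_ttir_tensor(in1)
out = builder.create_ttir_add(t0, t1)
builder.finish(input_seeds=[0, 1], golden_outputs=[golden])
assert torch.equal(out, golden)
```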
This would build an MLIR graph in TTIR, lower it to the ttmetal dialect, and then serialize it to a flatbuffer. It will also embed the golden information directly in the flatbuffer.
TTRT will then be able to pop open the embedded golden info, regenerate the same inputs using the same seeds, and compare the embedded golden output against the run from the device. Sync with Taps regarding this golden support in TTRT, which doesn't exist yet.
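The ttrt-side check described above can be sketched as follows. Everything here is an assumption about the eventual contract, not existing ttrt code: `regen_input`, `check_golden`, and the `allclose` tolerance are all hypothetical names.

```python
import torch

def regen_input(seed, shape):
    # Regenerate a deterministic input from its embedded seed, the same
    # way the test harness originally generated it.
    torch.manual_seed(seed)
    return torch.randn(*shape)

def check_golden(seeds, shapes, golden, run_on_device, atol=1e-5):
    # Hypothetical contract: ttrt regenerates the inputs from the
    # embedded seeds, runs the program on device, and compares the
    # result against the embedded golden output.
    inputs = [regen_input(s, shp) for s, shp in zip(seeds, shapes)]
    device_out = run_on_device(*inputs)
    return torch.allclose(device_out, golden, atol=atol)

# Simulate the device with the same eltwise add, for illustration only.
golden = regen_input(0, (64, 128)) + regen_input(1, (64, 128))
ok = check_golden([0, 1], [(64, 128), (64, 128)], golden,
                  run_on_device=lambda a, b: a + b)
assert ok
```

A real device run would replace the lambda with a call into the ttrt runtime; the comparison tolerance would likely need tuning per data type.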
Reference:
- test/python/tensor_layout.py: this test already demonstrates creating MLIR from Python.
- test/python/simple_kernel.py: a super prototype of writing a kernel in Python and translating it to MLIR. This test is fairly deprecated at this point and we'll almost certainly want to remove/change it, but it does serve as a good reference for building an MLIR graph in Python.
In the short term we should just bake all golden information directly into the flatbuffer in the debug_info.fbs area. We can have a contract with ttrt that knows to look for golden info in that area and, if it exists, automatically does golden comparison / populates input data from there.
debug_info.fbs:

table GoldenTensorDataBytes {
  data: [uint8];
};

table GoldenTensorDataSeed {
  seed: uint64;
};

table GoldenTensorDataURL {
  url: string; // local path or URL
};

union GoldenTensorData {
  GoldenTensorDataBytes,
  GoldenTensorDataSeed, // placeholder for the future: store just the seed for randomly generated inputs instead of the full tensor data inline
  GoldenTensorDataURL,  // placeholder for the future: point to real weights / inputs
};

table GoldenTensor {
  ref: TensorRef; // reference to the tensor in the program that this golden corresponds to; shape info can be inferred from here too
  data: GoldenTensorData;
};

table GoldenInfo {
  golden_tensors: [GoldenTensor];
};

table DebugInfo {
  ...
  golden_info: GoldenInfo;
}
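To illustrate the shape of the data this schema carries, here is a plain-Python model of the tables above. This is purely illustrative: real producer/consumer code would use the flatbuffers-generated classes for these tables, not hand-written dataclasses, and `ref` is a string stand-in for the real TensorRef.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class GoldenTensorDataBytes:
    data: bytes  # full tensor data stored inline

@dataclass
class GoldenTensorDataSeed:
    seed: int    # future: regenerate the input from this seed

@dataclass
class GoldenTensorDataURL:
    url: str     # future: local path or URL to real weights / inputs

# Mirrors the GoldenTensorData union.
GoldenTensorData = Union[GoldenTensorDataBytes,
                         GoldenTensorDataSeed,
                         GoldenTensorDataURL]

@dataclass
class GoldenTensor:
    ref: str     # stand-in for TensorRef; identifies the program tensor
    data: GoldenTensorData

@dataclass
class GoldenInfo:
    golden_tensors: List[GoldenTensor] = field(default_factory=list)

# Short-term plan: inline bytes; seed/URL variants are placeholders.
info = GoldenInfo([GoldenTensor(ref="out0",
                                data=GoldenTensorDataBytes(b"\x00" * 16))])
assert isinstance(info.golden_tensors[0].data, GoldenTensorDataBytes)
```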