tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

Refactoring in TTMetal dialect #578

Closed rpavlovicTT closed 2 months ago

rpavlovicTT commented 2 months ago

This commit refactors:

  1. Dialect conversion from TTKernel to EmitC - fixes #317

  2. Serialization of TTMetal IR to flatbuffer binary - fixes #316

  1. Implement dialect conversion from TTKernel to EmitC

The TTKernel dialect, which can be found nested in TTMetal ops, can now be converted via the 'convert-ttkernel-to-emitc' pass. The pass is registered as a func::FuncOp pass, so the kernel must be put inside a function before conversion. When serializing TTMetal IR to binary, we call this conversion for every region of a TTMetal dispatch op.
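The per-region flow described here can be pictured with a toy model. This is only a sketch and does not use MLIR: `OpaqueCall`, `Region`, `DispatchOp`, `emit_kernel` and `emit_all_kernels` are hypothetical names invented for illustration, and the real pass operates on MLIR operations rather than strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins, not MLIR types: a region is a flat list of opaque calls.
struct OpaqueCall {
  std::string callee;
  std::vector<std::string> args;
};
using Region = std::vector<OpaqueCall>;
struct DispatchOp {
  std::vector<Region> regions;
};

// Wrap one region in a function body and print each call as a C++ statement.
std::string emit_kernel(const Region& region) {
  std::string out = "void kernel_main() {\n";
  for (const OpaqueCall& call : region) {
    out += "  " + call.callee + "(";
    for (std::size_t i = 0; i < call.args.size(); ++i) {
      if (i != 0) out += ", ";
      out += call.args[i];
    }
    out += ");\n";
  }
  out += "}\n";
  return out;
}

// Produce one kernel source string per region of the dispatch op.
std::vector<std::string> emit_all_kernels(const DispatchOp& op) {
  std::vector<std::string> sources;
  for (const Region& region : op.regions) {
    sources.push_back(emit_kernel(region));
  }
  return sources;
}
```

The point of the model is the shape of the loop: each region is handled independently and yields its own kernel source.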

A FileCheck unit test (ttkernel.mlir) is added. Its output looks like:

module {
  func.func @ttkernel_noc() {
    %0 = "emitc.constant"() <{value = 262432 : i32}> : () -> i32
    %1 = "emitc.constant"() <{value = 262208 : i32}> : () -> i32
    %2 = "emitc.constant"() <{value = 32 : i32}> : () -> i32
    %3 = "emitc.constant"() <{value = 262400 : i32}> : () -> i32
    %4 = "emitc.constant"() <{value = 0 : i32}> : () -> i32
    %5 = "emitc.constant"() <{value = 262144 : i32}> : () -> i32
    %6 = emitc.call_opaque "get_noc_addr"(%4, %4, %5) : (i32, i32, i32) -> i64
    emitc.call_opaque "noc_async_read"(%6, %3, %2) : (i64, i32, i32) -> ()
    %7 = emitc.call_opaque "get_noc_addr"(%4, %4, %1) : (i32, i32, i32) -> i64
    emitc.call_opaque "noc_async_read"(%7, %0, %2) : (i64, i32, i32) -> ()
    emitc.call_opaque "noc_async_read_barrier"() : () -> ()
    return
  }
  func.func @ttkernel_tensix() {
    %0 = "emitc.variable"() <{value = #emitc.opaque<"::tt::CB::c_in0">}> : () -> !emitc.opaque<"::tt::CB">
    %1 = "emitc.variable"() <{value = #emitc.opaque<"::tt::CB::c_out0">}> : () -> !emitc.opaque<"::tt::CB">
    %2 = "emitc.constant"() <{value = 4 : i32}> : () -> i32
    emitc.call_opaque "untilize_init"(%0, %1) : (!emitc.opaque<"::tt::CB">, !emitc.opaque<"::tt::CB">) -> ()
    emitc.call_opaque "untilize_block"(%0, %2, %1) : (!emitc.opaque<"::tt::CB">, i32, !emitc.opaque<"::tt::CB">) -> ()
    emitc.call_opaque "cb_pop_front"(%0, %2) : (!emitc.opaque<"::tt::CB">, i32) -> ()
    emitc.call_opaque "cb_push_back"(%1, %2) : (!emitc.opaque<"::tt::CB">, i32) -> ()
    emitc.call_opaque "untilize_block"(%0, %2, %1) : (!emitc.opaque<"::tt::CB">, i32, !emitc.opaque<"::tt::CB">) -> ()
    emitc.call_opaque "cb_pop_front"(%0, %2) : (!emitc.opaque<"::tt::CB">, i32) -> ()
    emitc.call_opaque "cb_push_back"(%1, %2) : (!emitc.opaque<"::tt::CB">, i32) -> ()
    return
  }
}
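For readers less used to EmitC, the `@ttkernel_noc` function above can be read as the following C++ call sequence. This is a host-side sketch only: the three NOC functions are hypothetical stand-ins that just record their calls, the address packing in `get_noc_addr` is made up, and `ttkernel_noc_sketch` and `call_log` are names invented for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the device NOC API; they only record calls
// so the sequence can be inspected on a host machine.
static std::vector<std::string> call_log;

std::int64_t get_noc_addr(std::int32_t x, std::int32_t y, std::int32_t addr) {
  call_log.push_back("get_noc_addr");
  // Illustrative packing only; the real address encoding is hardware-specific.
  return (static_cast<std::int64_t>(x) << 48) |
         (static_cast<std::int64_t>(y) << 32) |
         static_cast<std::int64_t>(addr);
}

void noc_async_read(std::int64_t src, std::int32_t dst, std::int32_t size) {
  (void)src;
  (void)dst;
  (void)size;
  call_log.push_back("noc_async_read");
}

void noc_async_read_barrier() { call_log.push_back("noc_async_read_barrier"); }

// Mirrors the op order of @ttkernel_noc: one C++ variable per SSA value.
void ttkernel_noc_sketch() {
  std::int32_t v0 = 262432;
  std::int32_t v1 = 262208;
  std::int32_t v2 = 32;
  std::int32_t v3 = 262400;
  std::int32_t v4 = 0;
  std::int32_t v5 = 262144;
  std::int64_t v6 = get_noc_addr(v4, v4, v5);
  noc_async_read(v6, v3, v2);
  std::int64_t v7 = get_noc_addr(v4, v4, v1);
  noc_async_read(v7, v0, v2);
  noc_async_read_barrier();
}
```

Each `emitc.constant` becomes a local variable and each `emitc.call_opaque` becomes a plain function call, which is why the converted IR maps to C++ almost line for line.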
  2. Translate TTMetal to flatbuffer

Serialization to flatbuffer binary is now a proper translation pass that can be run with:

ttmlir-translate --ttmetal-to-flatbuffer ttmetal.mlir

Example run:

./build/bin/ttmlir-opt --ttir-load-system-desc="path=ttrt-artifacts/system_desc.ttsys" --ttir-implicit-device --ttir-allocate --convert-ttir-to-ttmetal test/ttmlir/Silicon/TTMetal/to_layout.mlir | ./build/bin/ttmlir-translate --ttmetal-to-flatbuffer

One of the dispatch op's kernels, when translated to C++:

#include <cstdint>
#include "compute_kernel_api/common.h"
#include "compute_kernel_api/tilize.h"
#include "compute_kernel_api/untilize.h"
#include "compute_kernel_api/eltwise_binary.h"
namespace NAMESPACE {
void kernel_main() {
  ::tt::CB v1 = ::tt::CB::c_in0;
  ::tt::CB v2 = ::tt::CB::c_out0;
  int32_t v3 = 4;
  tilize_init(v1, v3, v2);
  tilize_block(v1, v3, v2);
  cb_pop_front(v1, v3);
  cb_push_back(v2, v3);
  tilize_block(v1, v3, v2);
  cb_pop_front(v1, v3);
  cb_push_back(v2, v3);
  return;
}

void MAIN { kernel_main(); }
}
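The `void MAIN { kernel_main(); }` line is only valid C++ if `MAIN` is a preprocessor macro that expands to a function name together with its parameter list, and `NAMESPACE` likewise expands to a namespace name. The sketch below reproduces that pattern on a host machine. The macro definitions (`demo_kernel`, `demo_main()`), the `tt::CB` enum values and the logging stubs are all assumptions made for illustration, not the definitions from the real compute kernel headers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed macro definitions; the real ones come from the kernel headers.
#define NAMESPACE demo_kernel
#define MAIN demo_main()

namespace tt {
// Arbitrary enumerator values, chosen only so the sketch compiles.
enum class CB : std::uint8_t { c_in0 = 0, c_out0 = 1 };
}  // namespace tt

// Hypothetical stubs that record which API function was called.
static std::vector<std::string> kernel_log;
void tilize_init(::tt::CB, std::int32_t, ::tt::CB) { kernel_log.push_back("tilize_init"); }
void tilize_block(::tt::CB, std::int32_t, ::tt::CB) { kernel_log.push_back("tilize_block"); }
void cb_pop_front(::tt::CB, std::int32_t) { kernel_log.push_back("cb_pop_front"); }
void cb_push_back(::tt::CB, std::int32_t) { kernel_log.push_back("cb_push_back"); }

namespace NAMESPACE {
void kernel_main() {
  ::tt::CB v1 = ::tt::CB::c_in0;
  ::tt::CB v2 = ::tt::CB::c_out0;
  std::int32_t v3 = 4;
  tilize_init(v1, v3, v2);
  tilize_block(v1, v3, v2);
  cb_pop_front(v1, v3);
  cb_push_back(v2, v3);
  tilize_block(v1, v3, v2);
  cb_pop_front(v1, v3);
  cb_push_back(v2, v3);
}

// With the macros above this expands to: void demo_main() { kernel_main(); }
void MAIN { kernel_main(); }
}  // namespace NAMESPACE
```

With these assumed macros, a host program would invoke the kernel as `demo_kernel::demo_main()`.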
nsmithtt commented 2 months ago

FYI @ddilbazTT

@rpavlovicTT, Defne is currently refactoring the kernel desc interface to flatbuffer. If it's not too much effort to git mv the translate-to-flatbuffer file, it'd probably make her rebase much smoother.