tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

implement eltwise_max direct to metal (#533) #1335

Open vroubtsovTT opened 6 days ago

vroubtsovTT commented 6 days ago
  1. TTIR_MaximumOp now extends TTIR_GenericElementwiseBinaryOp in TTIROps.td, and the necessary follow-on modifications were made throughout the metal pipeline.
  2. convertComputeBinaryOp() in TTIRToTTMetal.cpp now builds all "generic" binary ops, such as add and maximum, via commonComputeBinaryOp<>, which is parameterized with the type of a tile op. Several local variables (inCB0, inCB1, etc.) became redundant in the new code structure and were moved inside the mlir::isa else-branch.
  3. simple_max.mlir is a new test for this change, intentionally kept distinct from simple_eltwise.mlir.
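
For readers unfamiliar with the pattern in item 2, here is a minimal, self-contained sketch of one shared builder parameterized on a tile-op type. It is not the code from TTIRToTTMetal.cpp: AddTilesOp, MaxTilesOp, BinaryKind, buildBinaryTileOp and convertBinary are hypothetical stand-ins, the "ops" only produce strings, and the argument lists of the kernel calls are simplified to the two circular-buffer names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for tile-level ops. Each one only records the names
// of the kernel calls it would lower to (argument lists are simplified).
struct AddTilesOp {
  static constexpr const char *kInit = "add_tiles_init";
  static constexpr const char *kCompute = "add_tiles";
};
struct MaxTilesOp {
  static constexpr const char *kInit = "max_tiles_init";
  static constexpr const char *kCompute = "max_tiles";
};

// Hypothetical stand-in for the kind of high-level binary op being converted.
enum class BinaryKind { Add, Maximum };

// Shared builder: the only thing that varies between ops is TileOp.
template <typename TileOp>
std::vector<std::string> buildBinaryTileOp(const std::string &inCB0,
                                           const std::string &inCB1) {
  assert(!inCB0.empty() && !inCB1.empty() && "both inputs must be named");
  const std::string args = "(" + inCB0 + ", " + inCB1 + ")";
  return {std::string(TileOp::kInit) + args,
          std::string(TileOp::kCompute) + args};
}

// Dispatcher: picks the tile op type, then defers to the shared builder.
std::vector<std::string> convertBinary(BinaryKind kind) {
  const std::string inCB0 = "cb0";
  const std::string inCB1 = "cb1";
  switch (kind) {
  case BinaryKind::Add:
    return buildBinaryTileOp<AddTilesOp>(inCB0, inCB1);
  case BinaryKind::Maximum:
    return buildBinaryTileOp<MaxTilesOp>(inCB0, inCB1);
  }
  return {}; // Unreachable for valid enum values.
}
```

In this toy version, adding another generic binary op means adding one small struct and one case label; the body of the builder stays untouched.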
nsmithtt commented 6 days ago

CI appears to be failing with:

/tmp/ttmlir_maximum_%5_tensix__0_0-0_0.cpp: In function 'void chlkc_pack::kernel_main()':
/tmp/ttmlir_maximum_%5_tensix__0_0-0_0.cpp:24:3: error: 'max_tiles_init' was not declared in this scope
   24 |   max_tiles_init(v2, v4);
      |   ^~~~~~~~~~~~~~
/tmp/ttmlir_maximum_%5_tensix__0_0-0_0.cpp:42:7: error: 'max_tiles' was not declared in this scope
   42 |       max_tiles(v2, v4, v19, v19, v20);
      |       ^~~~~~~~~
                 Always | FATAL    | trisc2 build failed

I think we are missing a header after all :p Namely "compute_kernel_api.h"; see: https://github.com/tenstorrent/tt-mlir/blob/843411790ba20fc5e534d563e66bdb4e4445d70e/lib/Conversion/TTKernelToEmitC/TTKernelToEmitC.cpp#L465
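
As a rough illustration of this failure mode (a sketch only, not the logic in TTKernelToEmitC.cpp), the snippet below renders a list of headers into #include lines and checks whether a given header is present. emitIncludes and includesHeader are hypothetical helpers; the header names are the ones that appear in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: renders one #include line per header, in order.
std::string emitIncludes(const std::vector<std::string> &headers) {
  std::string out;
  for (const std::string &header : headers) {
    assert(!header.empty() && "header name must not be empty");
    out += "#include \"" + header + "\"\n";
  }
  return out;
}

// Hypothetical check: true if `header` is among the emitted include lines.
bool includesHeader(const std::string &emitted, const std::string &header) {
  return emitted.find("#include \"" + header + "\"") != std::string::npos;
}
```

A generated kernel that calls a function declared in a header missing from that list fails to compile with a "was not declared in this scope" error like the one in the log above.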

nsmithtt commented 6 days ago

Aside from the missing header, the change looks perfect :)

vroubtsovTT commented 6 days ago

Patched TTKernelToEmitC.cpp to add the compute_kernel_api.h include. For the test file, the emitted code now looks like:

#include "llk_defs.h"
#include "compute_kernel_api/common.h"
#include "compute_kernel_api/tilize.h"
#include "compute_kernel_api/untilize.h"
#include "compute_kernel_api/eltwise_binary.h"
#include "compute_kernel_api.h"
#include "compute_kernel_api/tile_move_copy.h"
#include "compute_kernel_api/eltwise_unary/eltwise_unary.h"
#include "compute_kernel_api/eltwise_unary/exp.h"
#include "compute_kernel_api/eltwise_unary/sfpu_split_includes.h"
#include "compute_kernel_api/eltwise_unary/recip.h"
#define REDUCE_OP PoolType::SUM
#define REDUCE_DIM ReduceDim::REDUCE_COL
#include "compute_kernel_api/reduce.h"
namespace NAMESPACE {
void kernel_main() {
...
      max_tiles(v2, v4, v19, v19, v20);