[x] Reduce op templates
The reduce ops in the API kernel tt_metal/include/compute_kernel_api/reduce.h take reduce_op and dim as function parameters, but these are completely ignored, and the REDUCE_OP and REDUCE_DIM defines are used instead.
The init functions still have this problem, and they are called in a lot of test files. The defines have been templated instead (for users that need to call multiple reduce ops in the same kernel), and the remaining item is to remove the PoolType reduce_op, ReduceDim dim parameters from all the test files.
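To illustrate the templated approach, here is a minimal standalone sketch. It is not the code in reduce.h: example_reduce, combine, the enum values and the fixed 2x3 input are hypothetical stand-ins, and only the type names PoolType and ReduceDim come from the parameters mentioned above. The point is that once the op and the dim are template parameters, there are no function arguments left that can be silently ignored.

```cpp
#include <algorithm>

// Stand-in enums modeled on the parameter types named above.
// The enumerator names are assumptions, not the real tt-metal values.
enum class PoolType { SUM, MAX };
enum class ReduceDim { ROW, COL };

constexpr int ROWS = 2;
constexpr int COLS = 3;

// Combine two values according to the compile-time reduce op.
template <PoolType reduce_op>
inline float combine(float a, float b) {
    return reduce_op == PoolType::MAX ? std::max(a, b) : a + b;
}

// Hypothetical reduce: op and dim are template parameters only.
// ROW collapses each row into one value (ROWS outputs);
// COL collapses each column into one value (COLS outputs).
template <PoolType reduce_op, ReduceDim dim>
inline void example_reduce(const float (&in)[ROWS][COLS], float* out) {
    if (dim == ReduceDim::ROW) {
        for (int r = 0; r < ROWS; r++) {
            float acc = in[r][0];
            for (int c = 1; c < COLS; c++) acc = combine<reduce_op>(acc, in[r][c]);
            out[r] = acc;
        }
    } else {
        for (int c = 0; c < COLS; c++) {
            float acc = in[0][c];
            for (int r = 1; r < ROWS; r++) acc = combine<reduce_op>(acc, in[r][c]);
            out[c] = acc;
        }
    }
}
```

A kernel that needs two different reduce ops would instantiate example_reduce twice with different template arguments, which a single pair of REDUCE_OP / REDUCE_DIM defines cannot express.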
[ ] Remove unused includes in ckernel sfpu
This include:
#include "noc_nonblocking_api.h"
was included in ckernel_sfpu.h back when that header included all sfpu functions, because the dropout function needed it. Now that the sfpu/ckernel_sfpu_*.h functions are all separated, the include should be removed from every file that doesn't need it. For example, in sfpu/ckernel_sfpu_abs.h:
```cpp
#pragma once

#include "ckernel.h"
#include "ckernel_defs.h"
#include "noc_nonblocking_api.h"  // unused in this file

using namespace sfpi;

namespace ckernel {
namespace sfpu {

template <bool APPROXIMATION_MODE, int ITERATIONS = 8>
inline void calculate_abs()
{
    // SFPU microcode
    for (int d = 0; d < ITERATIONS; d++)
    {
        vFloat v = dst_reg[0];
        dst_reg[0] = sfpi::abs(v);
        dst_reg++;
    }
}

}  // namespace sfpu
}  // namespace ckernel
```
The include is unused.
In addition, it should also be removed from ckernel_sfpu_dropout.h. The functionality that dropout needs from that header should either be pulled into the tt-llk-* submodules, or another way should be found. Ckernel files should not include files that live in the higher-level APIs. This can also clean up the includes in the jit_build.
Clean up items for compute kernels
The approximate flag for sfpu kernels is handled inconsistently: some kernels take the flag as a runtime parameter, while other sfpu kernels use the compile-time APPROX flag that is generated in the defines.
Is there a reason we have these variations? Is it required to change the approximate flag at runtime?
I think all the sfpu functions should use the APPROX compile-time flag if possible. If users need multiple ops with a mix of approx modes, then approx should be templated for those ops instead. Moreover, the current setup is prone to bugs: the functions that take approx as a runtime variable also have APPROX=false generated in the compute kernel defines, while the flag is set to true for the reader/writer kernels. This might cause problems while debugging.
Added the template fix and removed the hard-coded init approx here: #8004
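As a rough sketch of what "templating approx" could look like, the standalone example below uses a hypothetical example_exp function, not a real sfpu kernel, and an arbitrary second-order polynomial as the "approximate" path. It only shows the pattern: the mode is a template parameter, so one kernel can mix modes by instantiating both versions and no runtime flag is needed.

```cpp
#include <cmath>

// Hypothetical op, not part of the tt-metal API. The approximation mode
// is a template parameter, so it is fixed per instantiation at compile time.
template <bool APPROXIMATION_MODE>
inline float example_exp(float x) {
    // The condition is a compile-time constant, so the compiler can drop
    // the branch that is not selected.
    if (APPROXIMATION_MODE) {
        // Illustrative cheap path: second-order Taylor expansion around 0.
        return 1.0f + x + 0.5f * x * x;
    }
    return std::exp(x);
}

// A kernel that needs both modes instantiates each one explicitly.
inline float mixed_kernel(float x) {
    return example_exp<true>(x) + example_exp<false>(x);
}
```

With this shape, a single global APPROX define is only needed as a default template argument, if at all, and cannot disagree with what an individual call actually uses.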
For the reduce ops, the first part of the cleanup has been merged here: https://github.com/tenstorrent/tt-metal/pull/7585 (the init functions remain).