tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

LLK functions inline not always work #11265

Open yugaoTT opened 2 months ago

yugaoTT commented 2 months ago

Describe the bug: When testing GroupNorm, some shapes cause a stack overflow on trisc0. This is probably because functions marked inline are not always actually inlined, which causes the function call depth to blow up.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'yugao/gn_stack_overflow' and run TT_METAL_WATCHER=5 pytest tests/ttnn/unit_tests/operations/test_group_norm.py
  2. To avoid the overflow, apply the git diff below to the LLK third-party code:

    
    diff --git a/common/inc/ckernel.h b/common/inc/ckernel.h
    index 37adc62..a30ac47 100644
    --- a/common/inc/ckernel.h
    +++ b/common/inc/ckernel.h
    @@ -356,7 +356,7 @@ inline void cfg_rmw_gpr(uint32_t cfg_addr32, uint32_t cfg_shamt, uint32_t cfg_ma
    }
    
    template <uint CfgAddr32, uint Shamt, uint Mask>
    -inline void cfg_reg_rmw_tensix(uint32_t val)
    +inline __attribute__((always_inline)) void cfg_reg_rmw_tensix(uint32_t val)
    {
     uint32_t wrdata = val<<Shamt;
     uint8_t mask_b0 = Mask & 0xff;
    diff --git a/common/inc/cmath_common.h b/common/inc/cmath_common.h
    index 7470385..60a5ea3 100644
    --- a/common/inc/cmath_common.h
    +++ b/common/inc/cmath_common.h
    @@ -228,12 +228,12 @@ inline constexpr bool is_32bit_input(const std::uint32_t src_format, const std::
            ((output_df == (uint)DataFormat::Int32) || (output_df == (uint)DataFormat::Float32));
    }

    -inline constexpr int get_math_num_fidelity_phases(const int math_fidelity_desc)
    +ALWI constexpr int get_math_num_fidelity_phases(const int math_fidelity_desc)
    {
        return (math_fidelity_desc & 0x7);
    }
    
    -inline constexpr int get_math_fidelity_increment(const int math_fidelity_desc)
    +ALWI constexpr int get_math_fidelity_increment(const int math_fidelity_desc)
    {
        return ((math_fidelity_desc >> 3) & 0x1) + 1;
    }
    diff --git a/common/inc/cunpack_common.h b/common/inc/cunpack_common.h
    index 66a1f5d..b82cbcf 100644
    --- a/common/inc/cunpack_common.h
    +++ b/common/inc/cunpack_common.h
    @@ -361,7 +361,7 @@ namespace ckernel::unpacker
    }
    
    template <std::uint32_t UNP_SEL = p_setadc::UNP_AB>

Expected behavior: Changing these functions to always inline makes the test pass.
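For readers unfamiliar with the distinction the diff relies on, here is a minimal sketch of a plain `inline` hint versus a forced-inline attribute on GCC/Clang. The macro name `ALWI_SKETCH` and the `_hint`/`_forced` function names are hypothetical and chosen for illustration only; the sketch assumes that `ALWI` in the diff expands to something like `inline __attribute__((always_inline))`, which the issue itself does not show.

```cpp
#include <cstdint>

// Assumption: a forced-inline specifier similar to what ALWI is presumed to be.
// ALWI_SKETCH is a hypothetical local name, not the repository's macro.
#define ALWI_SKETCH inline __attribute__((always_inline))

// Plain `inline` is only a hint: the compiler may still emit a real call,
// which costs a stack frame per nesting level on a small-stack core.
inline constexpr int num_fidelity_phases_hint(const int math_fidelity_desc) {
    return (math_fidelity_desc & 0x7);
}

// With always_inline, GCC/Clang inline the body at each call site
// (or report an error if they cannot), so no extra frame is created.
ALWI_SKETCH constexpr int num_fidelity_phases_forced(const int math_fidelity_desc) {
    return (math_fidelity_desc & 0x7);
}

ALWI_SKETCH constexpr int fidelity_increment_forced(const int math_fidelity_desc) {
    return ((math_fidelity_desc >> 3) & 0x1) + 1;
}

// The attribute changes code generation only, never the computed value.
static_assert(num_fidelity_phases_hint(0x0B) == num_fidelity_phases_forced(0x0B),
              "hint and forced variants must agree");
```

When hunting for other functions that were silently left out of line, GCC's `-Winline` warning can help: it reports functions declared `inline` that the compiler declined to inline.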

yugaoTT commented 2 months ago

fyi @tt-dma