tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0
395 stars 48 forks source link

Sfpu kernels clean up #5424

Open rtawfik01 opened 6 months ago

rtawfik01 commented 6 months ago

Currently ckernel sfpu functions are defined in 3 areas:

  1. tt_metal/hw/ckernels/<device>/metal/llk_api/llk_sfpu/metal_ckernel_sfpu.h
  2. tt_metal/third_party/tt_llk_<device>/common/inc/ckernel_sfpu.h
  3. tt_metal/hw/ckernels/<device>/metal/llk_api/llk_sfpu/ckernel_sfpu_*.h

During the LLK uplift, any sfpu function that was created for metal repo, and not in the original ckernel_sfpu.h, was moved to metal_ckernel_sfpu.h. Unfortunately, some duplicate functions that caused failures when updated, were also kept in metal_ckernel_sfpu.h.

In addition, lots of sfpu functions were isolated to their own ckernel files here tt_metal/hw/ckernels/<device>/metal/llk_api/llk_sfpu/ckernel_sfpu_*.h so there is a chance of duplication and functions not being updated to what is in ckernel_sfpu.h

In order to clean up the sfpu kernels, we have to:

  1. Find all sfpu functions in the llk_api layer, that also exist in tt_metal/third_party/tt_llk_<device>/common/inc/ckernel_sfpu.h, find why they would fail if updated to use the functions in ckernel_sfpu.h. If it fails because an extra feature or instruction was added, see if those changes can be propagated back to common/inc/ckernel_sfpu.h

  2. For any new sfpu functions in metal_ckernel_sfpu.h, that don't exist in ckernel_sfpu.h, check if they should be also added to llk_lib, or if they should be isolated into their own header files: tt_metal/hw/ckernels/<device>/metal/llk_api/llk_sfpu/ckernel_sfpu_*.h. Goal should be to delete metal_ckernel_sfpu.h since it was a temporary solution.

@acejkov @ttmtrajkovic

Update by July 17, 2024:

metal_ckernel_sfpu.h has been deleted and made into individual ckernel functions, but there are still sfpu remaining with duplications.

Functions that are completed:

Remaining functions:

rtawfik01 commented 6 months ago

I have reviewed the grayskull sfpu ops, and here are the ops that have duplicate names (might not necessarily have exact duplicate functionality):

image

@acejkov @ttmtrajkovic fyi

rtawfik01 commented 6 months ago

I propose that in tt-llk-<device> repos, we break up the ckernel_sfpu.h file into separate files per function similar to tt-metal repo (for example ckernel_exp.h, ckernel_recip.h, etc), and then we still keep ckernel_sfpu.h file, which will have all the includes for all these separate files.

This method will keep back-compatibility with buda, but also will allow the tt-metal duplicate functions to point to the separate function headers, so it will not affect compilation performance for tt-metal. It will also allow flexibility for buda to include separate functions as well in the future.

So for the table I posted above for example. The function calculate_reciprocal, which exists in tt_metal/hw/ckernels/grayskull/metal/llk_api/llk_sfpu/ckernel_sfpu_recip.h, should just call the function _calculate_reciprocal_ in a separated out file in tt-llk-gs, such as tt_metal/third_party/tt_llk_grayskull/common/inc/sfpu/ckernel_sfpu_recip.h. Since both the calculate_reciprocal functions are duplicated in the metal repo.

@acejkov @ttmtrajkovic let me know what you think, or if you have alternate solutions to remove the above duplication, without impacting compilation perf

rdjogoTT commented 5 months ago

Progress:

To be done:

rtawfik01 commented 4 months ago

Sfpu exponential is cleaned for GS here: https://github.com/tenstorrent/tt-metal/pull/8014 and whb0 here: https://github.com/tenstorrent/tt-metal/pull/7693

amahmudTT commented 2 months ago

Delegated the reciprocal calls to the submodules for GS (https://github.com/tenstorrent/tt-metal/pull/10105) WHB0 (https://github.com/tenstorrent/tt-metal/pull/10103)

amahmudTT commented 2 months ago

Delegated the sqrt calls to the submodules for GS (https://github.com/tenstorrent/tt-metal/pull/10183) WHB0 (https://github.com/tenstorrent/tt-metal/pull/10185)

ncvetkovicTT commented 4 weeks ago

Calls for GELU and GELU' are cleaned up within the following PRs: GS - #11193 WH - #11150 BH - #11154