tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[MCW] Systematic Perf Optimisation/Utilisation analysis of every Conv/MM #11880

Open punithsekar opened 2 months ago

punithsekar commented 2 months ago

@punithsekar, please update this ticket's description (not a comment) with a brief summary of what's done and what is pending.

punithsekar commented 2 months ago

Please find the commit for a Linear configuration with different core_grids.

The perf sheets are:

segformer_configuration_grid8x2.csv, segformer_configuration_grid8x4.csv, segformer_configuration_grid8x8.csv

The graph of utilization percentage for the different core_grid values is:

Image
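One way to compare the sheets above across core grids is to average the utilization per op type in each CSV. The following is a minimal sketch, not the script used for the graph: the column names `op_name` and `utilization_pct` are assumptions and need to be swapped for whatever the perf sheets actually contain, and the values in the usage section are placeholders, not measurements.

```python
import csv
import io
from statistics import mean


def mean_utilization(csv_text, op_column="op_name", util_column="utilization_pct"):
    """Return the average utilization per op type found in CSV text.

    The column names are hypothetical defaults; pass the real header
    names of the perf sheet when calling this.
    """
    per_op = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_op.setdefault(row[op_column], []).append(float(row[util_column]))
    return {op: mean(values) for op, values in per_op.items()}


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real perf numbers.
    sample = "op_name,utilization_pct\nMM,10\nMM,30\nConv,20\n"
    for op, avg in mean_utilization(sample).items():
        print(f"{op}: {avg:.1f}%")
```

To use it on a real sheet, read the file contents (for example with `open(path).read()`) and pass them as `csv_text`, once per core grid, then compare the resulting averages.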

punithsekar commented 2 months ago

The MMs in the whole pipeline are sharded. This is available in the branch punith/segformer_on_gs. I compared GS and WH (keeping the torch conv for the convs that cause issues on GS) for the version on the branch, version 2, and version 3 of your reference. The average MM utilization is the same across all three versions, whereas the version available in the commit has better performance.

I am also looking into conv optimization and will try to further improve the utilization of conv and MM with different configs.

Attaching the MM perf sheets for the model in the branch:

segformer_gs_versionmain_MM.csv, segformer_wh_versionmain_MM.csv

punithsekar commented 2 months ago

Commit for Wh-n150 implementation.

Attaching the latest perf sheet:

segformer_wh_bf8_for_MM.csv

Note: The whole-model PCC dropped from 0.85 to 0.3 when we used 8x4 instead of 8x8 for one of the MMs in the mixffn sub_module. Currently, the PCC of segformer_model is around 0.3.
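To narrow down which op causes a drop like this, the PCC can be checked stage by stage against the reference outputs. The sketch below assumes PCC means the Pearson correlation coefficient over flattened outputs; the stage names, the `first_divergence` helper, and the 0.99 threshold are hypothetical and not taken from the branch.

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(expected) != len(actual) or len(expected) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(expected) / len(expected)
    mean_a = sum(actual) / len(actual)
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_a)


def first_divergence(reference_outputs, device_outputs, threshold=0.99):
    """Return the first stage whose PCC is below threshold, else None.

    Both arguments map a stage name (hypothetical here) to a flattened
    list of floats, in execution order.
    """
    for name, reference in reference_outputs.items():
        if pcc(reference, device_outputs[name]) < threshold:
            return name
    return None
```

Running this over the intermediate outputs of the mixffn sub_module with the 8x8 and the 8x4 configuration would show whether the drop starts at the reconfigured MM or only appears further downstream.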

punithsekar commented 2 months ago

Latest perf sheet of segformer_model: segformer_1.csv

saichandax commented 1 month ago

Currently on hold, as the optimisations are complete and the observations have been shared in the parent ticket, along with the latest perf sheets and branch.