Open punithsekar opened 2 months ago
Please find the commit for a Linear configuration with different core_grids.
The perf sheets are:
segformer_configuration_grid8x2.csv segformer_configuration_grid8x4.csv segformer_configuration_grid8x8.csv
The graph for the utilization percentage for different core_grid is:
The MMs in the whole pipeline are sharded; the changes are available in branch punith/segformer_on_gs. I compared GS and WH (keeping torch conv for the convs that cause issues on GS) for the version on the branch and for version 2 and version 3 of your reference. The average MM utilization is the same across all three versions, whereas the version in the commit performs better.
I am also looking into conv optimization and will try to improve conv and MM utilization further with different configs.
Attaching the MM perf sheets for the model on the branch:
segformer_gs_versionmain_MM.csv segformer_wh_versionmain_MM.csv
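For anyone repeating the average-utilization comparison on these sheets, here is a minimal sketch using only the Python standard library. The column name `utilization_pct` and the sample rows are placeholders, not the real schema or data of the attached CSVs; adjust them to whatever header the perf sheet actually uses.

```python
import csv
import io


def mean_utilization(csv_text, column="utilization_pct"):
    """Average a numeric column of a perf sheet given as CSV text.

    `column` is a hypothetical header name, not taken from the attached sheets.
    Rows where the column is missing or empty are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row.get(column) not in (None, "")]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return sum(values) / len(values)


# Placeholder data, only to show the call shape.
sample_sheet = "op,utilization_pct\nmatmul,40\nmatmul,60\n"
sample_mean = mean_utilization(sample_sheet)
```

Running the same function over the GS and WH sheets gives one number per sheet that can be compared directly.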
Kept MM weights and bias in bf8 instead of bfloat16.
Tried using bf8 for conv weights and bias (punith/segformer_bf8_conv _dtype) but ran into an issue.
Commit for Wh-n150 implementation.
Attaching the latest perf sheet
Note: The whole-model PCC dropped from 0.85 to 0.3 when 8x4 is used instead of 8x8 for one of the MMs in the mixffn sub_module. Currently, the PCC of segformer_model is around 0.3.
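For reference, a minimal sketch of the PCC check, assuming PCC here means the Pearson correlation coefficient between the flattened reference (torch) output and the device output. The helper name `pcc` is illustrative and is not the repository's own comparison utility.

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient between two equal-length flat sequences."""
    if len(expected) != len(actual) or len(expected) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    denom = math.sqrt(var_e * var_a)
    if denom == 0.0:
        # At least one input is constant, so the correlation is undefined;
        # treat identical inputs as a perfect match and anything else as no match.
        return 1.0 if list(expected) == list(actual) else 0.0
    return cov / denom
```

A value near 1.0 means the device output tracks the reference closely; 0.3 indicates the outputs are only weakly correlated.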
Latest perf sheet of segformer_model: segformer_1.csv
Currently on hold: the optimisations are complete, and the observations have been shared in the parent ticket along with the latest perf sheets and branch.
@punithsekar, please update this ticket's description (not a comment) with a brief summary of what's done and what is pending.