Previously, a bug in the mask load forced us to add extra overhead to preserve accuracy. This PR removes that overhead.
Expected Behavior & Potential Risk
N/A
How has this PR been tested?
Internal IPEX CI
Performance on MTL
xetla branch
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 9.90242
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.4184
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.3561
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.5452
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 51.7042
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 49.3453
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 49.573
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 52.3423
This PR
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 12.8365
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.3975
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 11.3263
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.0362
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 56.154
[ RUN ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 58.3497
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 55.4583
[ RUN ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 58.2443
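The two logs above can be compared directly. The following is a minimal sketch that computes the relative change in peak GFLOPS per test configuration; the numbers are copied from the logs, while the names `BASELINE`, `THIS_PR`, and `relative_change` are illustrative and not part of XeTLA or IPEX. Each figure is a single reported maximum, so small differences may be within run-to-run variation.

```python
# Peak GFLOPS copied from the two logs above (MTL, bs1_hn32_hs128_qlen1).
# Keys are (kUseBias, kSeqLast, klen). All names here are illustrative only.
BASELINE = {  # xetla branch
    ("OFF", "OFF", 33): 9.90242,
    ("OFF", "ON", 33): 10.4184,
    ("ON", "OFF", 33): 10.3561,
    ("ON", "ON", 33): 10.5452,
    ("OFF", "OFF", 1023): 51.7042,
    ("OFF", "ON", 1023): 49.3453,
    ("ON", "OFF", 1023): 49.573,
    ("ON", "ON", 1023): 52.3423,
}

THIS_PR = {
    ("OFF", "OFF", 33): 12.8365,
    ("OFF", "ON", 33): 10.3975,
    ("ON", "OFF", 33): 11.3263,
    ("ON", "ON", 33): 10.0362,
    ("OFF", "OFF", 1023): 56.154,
    ("OFF", "ON", 1023): 58.3497,
    ("ON", "OFF", 1023): 55.4583,
    ("ON", "ON", 1023): 58.2443,
}


def relative_change(baseline, candidate):
    """Return candidate / baseline - 1 for every configuration, as a fraction."""
    return {cfg: candidate[cfg] / baseline[cfg] - 1.0 for cfg in baseline}


if __name__ == "__main__":
    for (bias, seq_last, klen), delta in relative_change(BASELINE, THIS_PR).items():
        print(f"kUseBias{bias}_kSeqLast{seq_last}_klen{klen}: {delta:+.1%}")
```

With these figures, every klen1023 configuration improves, while the klen33 results are mixed: two configurations improve and two are slightly lower.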
Type of Change: Feature
API not changed
Dependency Change?
No