Open remi-or opened 3 weeks ago
Note that the tutorial is just meant to be an introduction to Triton programming. It's not meant to be a reference performance kernel. If you'd like to see what changes are needed for performance, you can refer to https://github.com/triton-lang/triton/pull/4863. You can also check out some downstream kernels we have at https://github.com/ROCm/triton/tree/main_perf/python/perf-kernels.
Hi @antiagainst, thanks for the links. I have tried them out, and it seems that they still don't provide rocBLAS-like performance on the MI300X. This might be a little out of scope for this issue, but do you or anyone else know of someone matching rocBLAS performance on such GPUs? I am asking because it seems odd to me that I could get such performance on older GPUs but not this one. Thanks!
Triton doesn't provide rocBLAS-like performance for these gemm sizes. For some gemm sizes, Triton can get on par with rocBLAS, but that needs more advanced compiler changes, which are not included in the main branch yet. For other gemm sizes, Triton performance is usually limited by the constraint that tile sizes have to be powers of 2.
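To illustrate one way the power-of-2 constraint can cost performance, here is a minimal sketch in plain Python (no Triton required). It computes what fraction of the launched tiles along one dimension holds real data. The dimension 384 and the tile sizes are hypothetical values chosen for illustration, and the helper names are not part of Triton.

```python
import math


def next_pow2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def tile_utilization(dim: int, tile: int) -> float:
    """Fraction of the tiled extent along one dimension that covers real data.

    The kernel launches ceil(dim / tile) tiles, and the last one is padded
    (masked) if dim is not a multiple of tile.
    """
    num_tiles = math.ceil(dim / tile)
    return dim / (num_tiles * tile)


# Hypothetical problem dimension, not taken from the issue.
dim = 384

# 256 is a power of two but pads the dimension up to 512.
util_256 = tile_utilization(dim, 256)
# 128 fits exactly, at the price of a smaller tile.
util_128 = tile_utilization(dim, 128)
# 192 would also fit exactly, but it is not a power of two.
util_192 = tile_utilization(dim, 192)

print(f"tile=256: {util_256:.2f}, tile=128: {util_128:.2f}, tile=192: {util_192:.2f}")
```

In this toy case the large power-of-2 tile spends a quarter of its work on padding, while the size that would fit exactly without shrinking as far (192) cannot be used. Real kernels involve further trade-offs, such as occupancy and the number of compute units, that this sketch does not model.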
> I am asking because it seems odd to me that I could get such performance on older GPUs but not this one
Chances are rocBLAS does not provide tuned configs for MI200 GPUs. Can you post the numbers for both MI300 and MI200?
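When posting numbers for the two GPUs, it helps to report them in the same unit. The sketch below converts a measured kernel time into TFLOPS using the usual 2·M·N·K operation count for a GEMM. The function names are hypothetical, and the timings in the usage lines are placeholders, not measurements from either GPU.

```python
def gemm_tflops(M: int, N: int, K: int, ms: float) -> float:
    """Achieved TFLOPS for an (M x K) @ (K x N) matmul that took `ms` milliseconds.

    A GEMM performs M * N * K multiply-adds, i.e. 2 * M * N * K floating-point ops.
    """
    flops = 2 * M * N * K
    return flops * 1e-12 / (ms * 1e-3)


def relative_to_baseline(candidate_ms: float, baseline_ms: float) -> float:
    """Ratio of candidate throughput to baseline throughput for the same problem."""
    return baseline_ms / candidate_ms


# Placeholder timings for illustration only (not measured on any GPU).
triton_ms = 1.0
rocblas_ms = 0.5

print(f"Triton:  {gemm_tflops(4096, 4096, 4096, triton_ms):.1f} TFLOPS")
print(f"rocBLAS: {gemm_tflops(4096, 4096, 4096, rocblas_ms):.1f} TFLOPS")
print(f"Triton reaches {relative_to_baseline(triton_ms, rocblas_ms):.0%} of rocBLAS")
```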
Hello, I am experiencing performance issues when running Triton on an AMD GPU, the MI300X. When running the script 03-matrix-multiplication.py, I get this output:

The part about torch and triton not matching is not that worrying to me, since I believe it has something to do with denormals, but the performance issue is a big problem. When running on an older AMD GPU, the MI210, these issues were not present. I have also tried building Triton from source and with another version (3.0.0), but the performance was still not close to rocBLAS. I also tried adding matrix_instr_nonkdim = 16 and kpack = 2 in the config kwargs, but it did not help (when using Triton 3.1). Any idea on how to fix this, please? Thanks!
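For readers who want to reproduce that experiment, here is a rough sketch of how such options might be added to an autotune config. Only `matrix_instr_nonkdim = 16` and `kpack = 2` come from the description above. The block sizes, `num_warps`, `num_stages`, and the helper name `amd_gemm_config_kwargs` are assumptions for illustration, not the settings actually used. The `triton` import is guarded so the snippet also runs on a machine without Triton installed.

```python
def amd_gemm_config_kwargs(block_m=128, block_n=128, block_k=64, group_m=8,
                           matrix_instr_nonkdim=16, kpack=2):
    """Build the kwargs dict for one autotune config (hypothetical helper).

    The BLOCK_SIZE_* / GROUP_SIZE_M names follow the matmul tutorial's
    meta-parameters; the default values here are illustrative assumptions.
    """
    return {
        "BLOCK_SIZE_M": block_m,
        "BLOCK_SIZE_N": block_n,
        "BLOCK_SIZE_K": block_k,
        "GROUP_SIZE_M": group_m,
        # AMD-specific options mentioned in the issue text.
        "matrix_instr_nonkdim": matrix_instr_nonkdim,
        "kpack": kpack,
    }


kwargs = amd_gemm_config_kwargs()

try:
    import triton

    # num_warps and num_stages are assumed values, not tuned ones.
    config = triton.Config(kwargs, num_warps=8, num_stages=2)
except ImportError:
    # Triton is not installed; only the kwargs dict is available.
    config = None

print(kwargs)
```

A config built this way would normally be passed in the `configs` list of `@triton.autotune` on the matmul kernel. As reported above, these two options alone did not close the gap to rocBLAS.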