triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License

Make TMA tests compatible with older CUDA toolchains #5221

Closed: embg closed this 3 days ago

embg commented 4 days ago

TMA fences require CUDA toolchain 12.3 or greater, but the current test gating does not check the CUDA toolchain version. As a result, test_experimental_tma.py fails when run with older CUDA toolchains.
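As an illustration of the kind of version gate described above, here is a minimal sketch. It is not the code from this PR: the helper names `parse_ptxas_version` and `toolchain_supports_tma` are hypothetical, and the sketch assumes that `ptxas --version` prints a line containing `release <major>.<minor>`. Only the 12.3 threshold comes from the description.

```python
import re
import shutil
import subprocess

# Threshold taken from the PR description: TMA fences need CUDA 12.3 or greater.
MIN_TMA_TOOLCHAIN = (12, 3)


def parse_ptxas_version(version_output):
    """Return (major, minor) parsed from `ptxas --version` text, or None.

    Assumption: the output contains a fragment such as "release 12.4".
    """
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolchain_supports_tma(ptxas_path="ptxas"):
    """Hypothetical helper: True only if a ptxas >= 12.3 can be found and queried."""
    if shutil.which(ptxas_path) is None:
        # No ptxas available, so treat TMA as unsupported.
        return False
    result = subprocess.run(
        [ptxas_path, "--version"], capture_output=True, text=True, check=False
    )
    version = parse_ptxas_version(result.stdout)
    return version is not None and version >= MIN_TMA_TOOLCHAIN
```

A test could then be gated with something like `pytest.mark.skipif(not toolchain_supports_tma(), reason="TMA requires CUDA toolchain 12.3+")`, so that it is reported as skipped rather than failed on an older toolchain.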

Before

With cuda-12.0:

55 failed, 9 passed in 18.11s

With cuda-12.4:

64 passed in 11.99s

After

With cuda-12.0:

9 passed, 55 skipped in 4.26s

With cuda-12.4:

64 passed in 11.96s

ThomasRaoux commented 4 days ago

Do you ever run those tests while overriding ptxas?

embg commented 3 days ago

> Do you ever run those tests while overriding ptxas?

@ThomasRaoux We run these tests on our internal pin using an older CUDA toolchain. The issue was discovered during an internal pin update to 3.2.x.

embg commented 3 days ago

@peterbell10 Fixed all nits!

ThomasRaoux commented 3 days ago

> > Do you ever run those tests while overriding ptxas?
>
> @ThomasRaoux We run these tests on our internal pin using an older CUDA toolchain. The issue was discovered during an internal pin update to 3.2.x.

I know these are very minor changes, but overall I'm not a big fan of this direction, as supporting all ptxas versions is obviously impossible. Are you testing on older ptxas to catch potential problems related to ptxas, or do you have patches downstream that you want to test? If you have a way, I would disable those tests downstream instead.

embg commented 3 days ago

@ThomasRaoux Spinning up a thread on Slack to discuss.