Build autotuned kernels in parallel

triton-lang / triton

Development repository for the Triton language and compiler

https://triton-lang.org/

MIT License


Build autotuned kernels in parallel #4806

Open saagarjha opened 1 month ago

saagarjha commented 1 month ago

Autotuning takes a while, and for us most of that time is spent compiling the JIT kernel for each configuration rather than running the code. Since compilation happens on the host CPU and should not affect timings, it would be nice if the configurations could be compiled in parallel and then, once that is done, benchmarked on the GPU one at a time. Is this something that might be worth supporting?
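To make the request concrete, here is a minimal sketch of the two-phase flow I have in mind: compile everything concurrently, then benchmark serially. This is not Triton's autotuner API. `compile_config`, `benchmark`, and `autotune` are hypothetical stand-ins, the "kernel" is a plain Python closure so the example runs without a GPU, and the thread pool assumes the real compile step spends its time outside the GIL (a process pool may be needed otherwise).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compile_config(config):
    """Hypothetical stand-in for JIT-compiling one configuration on the host."""
    time.sleep(0.01)  # simulate host-side compile time
    block = config["BLOCK"]
    # The "compiled kernel" is just a closure here; a real one would launch on the GPU.
    return lambda data: [sum(data[i:i + block]) for i in range(0, len(data), block)]


def benchmark(kernel, data, reps=5):
    """Time one compiled kernel and return the best of `reps` runs."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(configs, data, max_workers=4):
    # Phase 1: compile every configuration concurrently on the host.
    # pool.map returns results in the same order as `configs`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        kernels = list(pool.map(compile_config, configs))
    # Phase 2: benchmark one kernel at a time so runs do not disturb each other.
    timings = [benchmark(kernel, data) for kernel in kernels]
    best_index = min(range(len(configs)), key=timings.__getitem__)
    return configs[best_index], timings


configs = [{"BLOCK": b} for b in (16, 32, 64, 128)]
best_config, timings = autotune(configs, list(range(4096)))
```

The point of the split is that only phase 2 needs exclusive use of the device, so phase 1 can use as many host cores as are available without skewing the measurements.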

simonidaa commented 4 days ago

I will look into it!