triton-lang / triton

Development repository for the Triton language and compiler

https://triton-lang.org/

MIT License

Accelerating custom sparsity patterns on GPU with Triton #4929

Open · abhishektyaagi opened this issue 1 month ago

abhishektyaagi commented 1 month ago

Hi, I understand that NVIDIA GPUs currently support accelerating 2:4 sparsity and block sparsity on Tensor Cores.
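To make the 2:4 pattern concrete, here is a minimal pure-Python sketch of what 2:4 compression means: every aligned group of 4 values keeps at most 2 nonzeros, so the kept values can be stored at half the original width plus a 2-bit position (0..3) per kept value. This is illustrative only. It is not Triton code and not the exact metadata encoding used by NVIDIA sparse Tensor Cores, and `is_2_4_sparse`, `compress_2_4` and `decompress_2_4` are hypothetical helper names.

```python
# Illustrative 2:4 structured-sparse compression in plain Python.
# NOTE: hypothetical helpers; the layout is NOT NVIDIA's hardware metadata format.

def is_2_4_sparse(row):
    """True if every aligned group of 4 entries has at most 2 nonzeros."""
    if len(row) % 4 != 0:
        return False
    return all(sum(1 for v in row[g:g + 4] if v != 0) <= 2
               for g in range(0, len(row), 4))

def compress_2_4(row):
    """Split a 2:4-sparse row into (values, indices).

    Each group of 4 contributes exactly 2 kept values (padding with zeros
    when the group has fewer than 2 nonzeros) and their positions 0..3,
    so `values` is half the length of `row`.
    """
    assert is_2_4_sparse(row), "row does not satisfy the 2:4 pattern"
    values, indices = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        pos = [i for i, v in enumerate(group) if v != 0]
        # Pad with unused positions (which hold zeros) so each group keeps 2 slots.
        for i in range(4):
            if len(pos) == 2:
                break
            if i not in pos:
                pos.append(i)
        pos.sort()
        values.extend(group[i] for i in pos)
        indices.extend(pos)
    return values, indices

def decompress_2_4(values, indices):
    """Inverse of compress_2_4: rebuild the original dense row."""
    row = [0] * (2 * len(values))
    for k, (v, i) in enumerate(zip(values, indices)):
        row[(k // 2) * 4 + i] = v  # k // 2 is the group number
    return row
```

For example, `compress_2_4([0, 3, 0, 7, 1, 0, 0, 0])` keeps `[3, 7, 1, 0]` with positions `[1, 3, 0, 1]`, and `decompress_2_4` restores the original row.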

How can I accelerate my own custom sparsity pattern on the GPU using Triton? Suppose a matrix M whose nonzero elements follow a structured pattern can be converted into a smaller dense matrix (with some post-processing). How can I use Triton to accelerate computations on it on the GPU?

Any input is appreciated.
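One way to read the "convert to a smaller dense matrix" idea is the ELLPACK layout. This is a swapped-in choice for illustration, not something the issue specifies: if each row of M has at most K nonzeros, store a dense rows × K matrix of values plus a matching rows × K matrix of column indices. The plain-Python sketch below (hypothetical helper names, reference semantics only) checks that multiplying through the compressed form gives the same result as the dense product. In a Triton kernel, the gather of B rows would plausibly become a `tl.load` whose pointer offsets are computed from a loaded index tile, followed by a dense `tl.dot`; that mapping is an assumption, not a tested kernel.

```python
# ELLPACK-style "compress to a smaller dense matrix, then compute densely".
# Assumption: every row of M has at most K nonzeros. Pure Python, not a GPU kernel.

def to_ell(M, K):
    """Compress dense M (list of rows) into (vals, cols), each of shape rows x K."""
    vals, cols = [], []
    for row in M:
        nz = [(c, v) for c, v in enumerate(row) if v != 0]
        assert len(nz) <= K, "row has more than K nonzeros"
        # Pad short rows with explicit zeros pointing at column 0 (harmless: value is 0).
        nz += [(0, 0)] * (K - len(nz))
        cols.append([c for c, _ in nz])
        vals.append([v for _, v in nz])
    return vals, cols

def ell_matmul(vals, cols, B):
    """Compute M @ B from the compressed form.

    For each row i, gather the K rows of B named by cols[i] into a small
    dense K x N tile, then take a dense dot product with vals[i]
    ("load sparse, compute dense").
    """
    n = len(B[0])
    out = []
    for v_row, c_row in zip(vals, cols):
        tile = [B[c] for c in c_row]  # gathered dense K x N tile
        out.append([sum(v_row[k] * tile[k][j] for k in range(len(v_row)))
                    for j in range(n)])
    return out

def dense_matmul(A, B):
    """Plain dense reference product, used only for checking."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]
```

The compressed matrices have K columns instead of M's full width, so the work per row scales with K rather than with the number of columns of M.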

hyx1999 commented 3 weeks ago

I have the same question. In the meantime, if sparse Tensor Cores are not currently supported, can we implement a Triton kernel that loads data in sparse form and computes in dense form?
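A coarser variant of "load sparse, compute dense" is block sparsity: store only the nonzero BS × BS blocks of A, keep a per-block-row lookup table of which block-columns are nonzero, and run dense tile products only on those blocks. Tensor Core matrix instructions operate on dense tiles, which is why skipping whole zero blocks tends to map onto them more naturally than element-level gathers. Below is a plain-Python reference of that loop structure. It assumes matrix dimensions are multiples of BS, the helper names are hypothetical, and it is not a Triton kernel.

```python
# Block-sparse C = A @ B in plain Python (reference semantics, not a GPU kernel).
# Assumption: len(A) and len(A[0]) are multiples of the block size BS.

def to_block_sparse(A, BS):
    """Return {(block_row, block_col): dense BS x BS block} for blocks with a nonzero."""
    blocks = {}
    for bi in range(0, len(A), BS):
        for bk in range(0, len(A[0]), BS):
            blk = [row[bk:bk + BS] for row in A[bi:bi + BS]]
            if any(v != 0 for r in blk for v in r):
                blocks[(bi // BS, bk // BS)] = blk
    return blocks

def block_sparse_matmul(blocks, n_row_blocks, B, BS):
    """Compute A @ B, touching only the stored (nonzero) blocks of A."""
    n = len(B[0])
    C = [[0] * n for _ in range(n_row_blocks * BS)]
    # Lookup table: for each block-row, the sorted list of nonzero block-columns.
    lut = {}
    for bi, bk in sorted(blocks):
        lut.setdefault(bi, []).append(bk)
    for bi in range(n_row_blocks):          # one "program" per block-row
        for bk in lut.get(bi, []):          # all-zero blocks are never visited
            a = blocks[(bi, bk)]
            b = B[bk * BS:(bk + 1) * BS]    # matching dense slab of B
            for i in range(BS):             # dense tile product, accumulated into C
                for j in range(n):
                    C[bi * BS + i][j] += sum(a[i][k] * b[k][j] for k in range(BS))
    return C
```

In a GPU version, the lookup table is the piece that would be precomputed on the host and passed to the kernel, so each program only iterates over its own nonzero blocks.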

gmlwns2000 commented 2 weeks ago

I am also curious about how we can use sparse Tensor Cores with Triton.