Open abhishektyaagi opened 1 month ago
I have the same question. Meanwhile, if sparse tensor cores are not supported yet, could we implement a Triton kernel that loads the operand in sparse (compressed) form and does the compute in dense form?
I am also curious about how we can use the sparse tensor cores with Triton.
Hi, I understand that NVIDIA GPUs currently support acceleration of 2:4 sparsity (on Tensor Cores) and block sparsity (on Tensor Cores).
How can I accelerate my custom sparsity pattern on the GPU using Triton? Assume a matrix M whose non-zero elements follow a structured pattern, so that it can be converted (with some post-processing) to a smaller dense matrix. How can I use Triton to accelerate its execution on the GPU?
Any input is appreciated.
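To make the "smaller dense matrix plus post-processing" idea concrete, here is a minimal plain-Python sketch (no GPU, no Triton) of one possible layout: a 2:4-style pattern is compressed into a half-width dense `values` matrix plus an `indices` matrix of original column positions, and the product is computed by gathering the matching rows of the right-hand operand. The function names `compress_2_4` and `sparse_dense_matmul` are hypothetical and only for illustration. In a Triton kernel the gather would presumably become indexed `tl.load` calls feeding `tl.dot`, but that mapping is an assumption and is not tested here.

```python
# Plain-Python reference for a 2:4-style compressed layout.
# This is a layout illustration, not a Triton kernel.

def compress_2_4(a):
    """Keep the 2 largest-magnitude entries in every group of 4 columns.

    Returns (values, indices): both have half as many columns as `a`;
    indices[r][j] is the original column of values[r][j].
    """
    values, indices = [], []
    for row in a:
        assert len(row) % 4 == 0, "column count must be a multiple of 4"
        v_row, i_row = [], []
        for g in range(0, len(row), 4):
            group = row[g:g + 4]
            # Positions of the two largest magnitudes, restored to column order.
            keep = sorted(sorted(range(4), key=lambda j: -abs(group[j]))[:2])
            for j in keep:
                v_row.append(group[j])
                i_row.append(g + j)
        values.append(v_row)
        indices.append(i_row)
    return values, indices


def sparse_dense_matmul(values, indices, b):
    """Compute A @ B using only the stored non-zeros of A.

    Each stored value multiplies the row of B selected by its column index.
    """
    n = len(b[0])
    out = []
    for v_row, i_row in zip(values, indices):
        acc = [0.0] * n
        for v, k in zip(v_row, i_row):
            for c in range(n):
                acc[c] += v * b[k][c]
        out.append(acc)
    return out


def dense_matmul(a, b):
    """Ordinary dense reference product, used only to check the result."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


if __name__ == "__main__":
    # A already satisfies the 2:4 pattern (2 non-zeros per group of 4).
    a = [[1, 0, 0, 2, 0, 3, 4, 0],
         [0, 5, 6, 0, 7, 0, 0, 8]]
    b = [[1, 2], [3, 4], [5, 6], [7, 8],
         [9, 10], [11, 12], [13, 14], [15, 16]]
    values, indices = compress_2_4(a)
    print(values)   # half-width dense matrix
    print(indices)  # column metadata
    print(sparse_dense_matmul(values, indices, b) == dense_matmul(a, b))
```

The same split (a compact dense payload plus index metadata) should carry over to other structured patterns, as long as the metadata lets the kernel locate the matching rows of the dense operand. Whether it beats a plain dense matmul depends on how cheap that gather is on the target GPU.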