[Tracking Issue] Using MMA and ldmatrix intrinsics to save shared memory usage

uwsampl / SparseTIR

SparseTIR: Sparse Tensor Compiler for Deep Learning

https://sampl.cs.washington.edu/SparseTIR/

Apache License 2.0

[Tracking Issue] Using MMA and ldmatrix intrinsics to save shared memory usage #52

Open yzh119 opened 2 years ago

yzh119 commented 2 years ago

TC-SpMM currently still uses the wmma abstraction for tensorization, which is inflexible and consumes too much shared memory. We can switch to MMA intrinsics instead, loading non-contiguous global memory directly into warp-level memory and bypassing the fragment abstraction.
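
To make the idea concrete, here is a rough pure-Python emulation of one warp computing a 16x8 output tile. It is only a sketch, not SparseTIR code: `spmm_tile`, `col_idx`, `X`, and `n0` are hypothetical names, and the per-lane element coordinates follow my reading of the `mma.m16n8k16` fp16 operand layout in the PTX ISA, so they should be checked against the official docs before being relied on. The point it tries to show is that each lane can compute the global address of every operand element it owns, so the dense operand's rows can be gathered through `col_idx` without first being staged contiguously in shared memory.

```python
WARP_SIZE = 32

def a_coords(lane):
    """(row, col) of the 8 A-tile (16x16) elements owned by this lane (assumed layout)."""
    g, t = lane >> 2, lane & 3  # group id, thread id within the group
    return [(g if (i < 2 or 4 <= i < 6) else g + 8,
             t * 2 + (i & 1) + (8 if i >= 4 else 0)) for i in range(8)]

def b_coords(lane):
    """(k, n) of the 4 B-tile (16x8) elements owned by this lane (assumed layout)."""
    g, t = lane >> 2, lane & 3
    return [(t * 2 + (i & 1) + (8 if i >= 2 else 0), g) for i in range(4)]

def c_coords(lane):
    """(row, col) of the 4 accumulator (16x8) elements owned by this lane (assumed layout)."""
    g, t = lane >> 2, lane & 3
    return [(g if i < 2 else g + 8, t * 2 + (i & 1)) for i in range(4)]

def emulate_mma(a_frags, b_frags):
    """Stand-in for the hardware MMA: rebuild tiles from per-lane values, multiply, redistribute."""
    A = [[0] * 16 for _ in range(16)]
    B = [[0] * 8 for _ in range(16)]
    for lane in range(WARP_SIZE):
        for (r, c), v in zip(a_coords(lane), a_frags[lane]):
            A[r][c] = v
        for (k, n), v in zip(b_coords(lane), b_frags[lane]):
            B[k][n] = v
    C = [[sum(A[m][k] * B[k][n] for k in range(16)) for n in range(8)]
         for m in range(16)]
    return [[C[r][c] for r, c in c_coords(lane)] for lane in range(WARP_SIZE)]

def spmm_tile(block_vals, col_idx, X, n0):
    """One 16x8 output tile: block_vals (16x16) times the rows X[col_idx[k]], columns n0..n0+7."""
    # Each lane reads its A elements straight from the sparse block's values.
    a_frags = [[block_vals[r][c] for r, c in a_coords(lane)]
               for lane in range(WARP_SIZE)]
    # Each lane gathers its B elements from non-contiguous rows of X via col_idx.
    b_frags = [[X[col_idx[k]][n0 + n] for k, n in b_coords(lane)]
               for lane in range(WARP_SIZE)]
    d_frags = emulate_mma(a_frags, b_frags)
    out = [[0] * 8 for _ in range(16)]
    for lane in range(WARP_SIZE):
        for (r, c), v in zip(c_coords(lane), d_frags[lane]):
            out[r][c] = v
    return out
```

In a real kernel the two list comprehensions in `spmm_tile` would correspond to per-thread global loads into registers, and `emulate_mma` to the MMA instruction itself; the emulation only checks that the index arithmetic is consistent.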