harvardnlp / genbmm

CUDA kernels for generalized matrix multiplication in PyTorch
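As a rough illustration of what "generalized" matrix multiplication can mean, the sketch below swaps the usual sum and product for other operations, such as log-sum-exp with addition (log space) or max with addition. It is a plain-Python reference written for this page, not the repository's CUDA code or its PyTorch API; `semiring_matmul`, `logsumexp`, and the choice of operations shown are hypothetical names and assumptions made for the example.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def semiring_matmul(
    a: Matrix,
    b: Matrix,
    add: Callable[[List[float]], float],
    mul: Callable[[float, float], float],
) -> Matrix:
    """Compute C[i][j] = add over k of mul(a[i][k], b[k][j]).

    Hypothetical helper: `add` reduces a list of terms, `mul` combines one pair.
    """
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [add([mul(row[k], b[k][j]) for k in range(inner)]) for j in range(cols)]
        for row in a
    ]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# Ordinary matrix product: (sum, *)
standard = semiring_matmul(a, b, add=sum, mul=lambda x, y: x * y)

# Max-plus product: (max, +)
max_plus = semiring_matmul(a, b, add=max, mul=lambda x, y: x + y)

# Log-space product: (logsumexp, +), treating entries as log-values
log_space = semiring_matmul(a, b, add=logsumexp, mul=lambda x, y: x + y)

print(standard)
print(max_plus)
print(log_space)
```

The nested Python loops only show the semantics. A GPU kernel would presumably fuse the combine and reduce steps to avoid materializing every intermediate term, which is the usual motivation for writing such operations as dedicated kernels.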