flozi00 closed this issue 1 year ago.
Although I think it is a good idea, I doubt they want to depend on our custom CUDA kernels, which would leave out the important parts of this library, namely CausalLinearAttention and ImprovedClusteredAttention.
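For context, here is a rough pure-PyTorch sketch of what CausalLinearAttention computes (the O(N) causal linear attention from "Transformers are RNNs"). It materializes the cumulative key-value sums explicitly, which is exactly the memory cost the custom CUDA kernel avoids, so treat it as illustrative only; the function names are made up for this sketch.

```python
# Illustrative reference (not the library's CUDA kernel) of causal linear
# attention: O(N) attention via running sums over the keys and values.
import torch

def elu_feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1 used by linear attention.
    return torch.nn.functional.elu(x) + 1

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, seq, heads, dim); v: (batch, seq, heads, dim_v)
    q, k = elu_feature_map(q), elu_feature_map(k)
    # Running sums S_i = sum_{j<=i} phi(k_j) v_j^T and z_i = sum_{j<=i} phi(k_j).
    kv = torch.einsum("nshd,nshm->nshdm", k, v).cumsum(dim=1)  # (N, S, H, D, M)
    z = k.cumsum(dim=1)                                        # (N, S, H, D)
    num = torch.einsum("nshd,nshdm->nshm", q, kv)
    den = torch.einsum("nshd,nshd->nsh", q, z).unsqueeze(-1) + eps
    return num / den
```

The custom kernel computes the same quantity without ever storing the full (seq, dim, dim_v) cumulative tensor, which is why a plain PyTorch port loses most of the benefit.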
@angeloskath, I don't think that assumption holds true anymore; it looks like there are several HF models with custom CUDA kernels implemented. See https://huggingface.co/docs/transformers/model_doc/yoso (a loading sketch is included below). I'd be interested in helping out with this.
What do you think about an implementation in the huggingface/transformers repo?
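On shipping kernels inside transformers: one common pattern (roughly what the YOSO model linked above appears to rely on, though this is a hedged sketch rather than its actual code) is to JIT-compile the CUDA sources at runtime with PyTorch's `torch.utils.cpp_extension.load`. The file names, module name, and exported function below are hypothetical placeholders.

```python
# Hedged sketch: JIT-compiling custom CUDA kernels from a Python modeling
# file using PyTorch's cpp_extension. Source files here are placeholders.
from torch.utils import cpp_extension

def load_causal_linear_kernels():
    # Compiles and loads the extension on first call; cached afterwards.
    return cpp_extension.load(
        name="causal_linear_attention_cuda",   # hypothetical module name
        sources=[
            "causal_product.cpp",              # hypothetical C++ binding
            "causal_product_cuda.cu",          # hypothetical CUDA kernel
        ],
        verbose=True,
    )

# kernels = load_causal_linear_kernels()
# out = kernels.causal_dot_product(q, k, v)    # hypothetical exported op
```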