huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Feature Request: El-Attention #12793

Open suriyakode opened 3 years ago

suriyakode commented 3 years ago

🚀 Feature request

I've looked into the paper "EL-Attention: Memory Efficient Lossless Attention for Generation". It proposes a way of computing attention during generation that avoids building the per-head key/value projections from the cached hidden states; the key projection is folded into the query instead, so only a single copy of the hidden states has to be cached. This saves computation time and frees memory.
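
For intuition, here is a minimal, self-contained PyTorch sketch (not the paper's implementation; shapes, weight layout, and variable names are illustrative assumptions) of the algebraic rewrite EL-attention relies on: the key/value projections are moved onto the query/output side, so attention runs directly over the raw hidden states and no multi-head K/V cache is materialized.

```python
# Illustrative sketch of the EL-attention rewrite (hypothetical shapes/names, not the paper's code).
import math
import torch

torch.manual_seed(0)

B, S, D, H = 2, 6, 16, 4        # batch, cached length, model dim, num heads
Dh = D // H                     # head dim

x = torch.randn(B, 1, D)        # current decoding step (query side)
h = torch.randn(B, S, D)        # cached hidden states (key/value source)

# Projection weights; per-head slices are contiguous column blocks.
W_q, W_k, W_v, W_o = (torch.randn(D, D) for _ in range(4))

def split_heads(t):             # (B, T, D) -> (B, H, T, Dh)
    return t.view(t.size(0), t.size(1), H, Dh).transpose(1, 2)

# Standard multi-head attention: needs per-head K and V (the usual large cache).
q = split_heads(x @ W_q)                                    # (B, H, 1, Dh)
k = split_heads(h @ W_k)                                    # (B, H, S, Dh), cached per head
v = split_heads(h @ W_v)                                    # (B, H, S, Dh), cached per head
attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(Dh), dim=-1)
std_out = (attn @ v).transpose(1, 2).reshape(B, 1, D) @ W_o

# EL-attention style: attend directly over h; no multi-head K/V is ever built.
el_out = torch.zeros(B, 1, D)
for i in range(H):
    Wq_i = W_q[:, i * Dh:(i + 1) * Dh]                      # (D, Dh)
    Wk_i = W_k[:, i * Dh:(i + 1) * Dh]
    Wv_i = W_v[:, i * Dh:(i + 1) * Dh]
    Wo_i = W_o[i * Dh:(i + 1) * Dh, :]                      # (Dh, D)
    q_exp = x @ Wq_i @ Wk_i.T                               # expanded query, (B, 1, D)
    scores = q_exp @ h.transpose(-1, -2) / math.sqrt(Dh)    # (B, 1, S)
    ctx = torch.softmax(scores, dim=-1) @ h                 # (B, 1, D): raw h acts as value
    el_out += ctx @ Wv_i @ Wo_i                             # value/output projection afterwards

print(torch.allclose(std_out, el_out, atol=1e-4))           # True -> the rewrite is lossless
```

The `allclose` check is the "lossless" part: the rewrite is exact, and the saving comes from never materializing or caching the per-head keys and values.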

Motivation

EL-attention is lossless (it produces the same outputs as standard multi-head attention) and promises significant memory savings and speedups during generation/inference.

Your contribution

The main difficulty is that it would either have to be added directly to each model's attention implementation, or would require a large number of new subclasses across models. An easier route might be a hook or pipeline for plugging in custom attention implementations; a rough sketch of that idea follows.
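
To make that second idea concrete, here is a rough sketch (the helper, the class-name matching, and `MyELAttention` are hypothetical, not an existing transformers API): walk a loaded model and replace its attention submodules with a drop-in implementation that mirrors the original forward signature.

```python
# Hypothetical helper; not part of transformers. Matching attention modules by class
# name and the MyELAttention replacement below are assumptions for illustration only.
import torch.nn as nn

def swap_attention(model: nn.Module, is_attention, make_replacement) -> int:
    """Replace every submodule for which is_attention(module) returns True."""
    targets = [
        (parent, child_name, child)
        for parent in model.modules()
        for child_name, child in parent.named_children()
        if is_attention(child)
    ]
    for parent, child_name, child in targets:
        setattr(parent, child_name, make_replacement(child))
    return len(targets)

# Hypothetical usage:
# from transformers import AutoModelForSeq2SeqLM
# model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")
# n = swap_attention(
#     model,
#     is_attention=lambda m: type(m).__name__ == "BartAttention",
#     make_replacement=lambda old: MyELAttention.from_standard(old),  # hypothetical class
# )
```

The hard part remains what is noted above: each replacement has to match the exact forward signature and cache layout of the model it is dropped into.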

tzuhsial commented 11 months ago

Any updates on this one?