huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Long-Short Transformer #12635

Open schmidek opened 3 years ago

schmidek commented 3 years ago

🌟 New model addition

Model description

https://arxiv.org/abs/2107.02192

In this paper, the authors propose the Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity, for both language and vision tasks. It combines a novel long-range attention, which uses a dynamic projection to model distant correlations, with a short-term attention that captures fine-grained local correlations. Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity.
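To make the mechanism concrete, here is a minimal, single-head PyTorch sketch of the idea described above, under my own assumptions: the long-range branch compresses the n keys/values to a small number `r` of "landmark" positions via a content-dependent (dynamic) projection, the short-term branch restricts attention to a local window, and one softmax is taken over the concatenated scores. The class name, argument names, and the simple full-matrix windowing used here are illustrative; the official NVIDIA implementation (multi-head, segment-wise projection, DualLN, blocked local attention) differs in the details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongShortAttentionSketch(nn.Module):
    """Illustrative single-head long-short attention (not the official code).

    Long-range branch: keys/values are mixed down to r landmark rows by a
    dynamic projection computed from the keys, so attending to them costs
    O(n * r) instead of O(n^2).
    Short-term branch: each token attends only to tokens within a window.
    """

    def __init__(self, dim, window=4, r=8):
        super().__init__()
        self.dim, self.window, self.r = dim, window, r
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # dynamic projection: one mixing weight per landmark, per token
        self.to_proj = nn.Linear(dim, r)

    def forward(self, x):
        n = x.size(1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = self.dim ** -0.5

        # --- long-range branch: compress k, v to r landmarks ---
        # p: (batch, r, n), a softmax-normalized mixture over tokens
        p = F.softmax(self.to_proj(k).transpose(1, 2), dim=-1)
        k_land = p @ k                                     # (b, r, dim)
        v_land = p @ v                                     # (b, r, dim)
        long_scores = q @ k_land.transpose(1, 2) * scale   # (b, n, r)

        # --- short-term branch: mask out tokens beyond the window ---
        # (computed densely here for clarity; real code uses blocked windows)
        short_scores = q @ k.transpose(1, 2) * scale       # (b, n, n)
        idx = torch.arange(n, device=x.device)
        far = (idx[None, :] - idx[:, None]).abs() > self.window
        short_scores = short_scores.masked_fill(far, float("-inf"))

        # --- aggregate: one softmax over [local keys, landmarks] ---
        scores = torch.cat([short_scores, long_scores], dim=-1)
        attn = F.softmax(scores, dim=-1)
        return attn[..., :n] @ v + attn[..., n:] @ v_land
```

A causal variant would additionally mask future positions in both branches; the sketch above is bidirectional, matching the paper's claim that the scheme supports both settings.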

Open source status

schmidek commented 3 years ago

A PyTorch implementation: https://github.com/lucidrains/long-short-transformer

NielsRogge commented 3 years ago

Cool work! However, models have a low chance of being added if there are no pre-trained weights available.

zhuchen03 commented 3 years ago

Thanks for your interest in our work! We have released the code for ImageNet and LRA at https://github.com/NVIDIA/transformer-ls. Pretrained weights for ImageNet are also available. We will release the character-level LM soon.

Bearnardd commented 1 year ago

Hi @zhuchen03! Since I would like to add your model to HuggingFace Transformers, I am wondering whether pretrained weights are also available for the character-level LM?

Bearnardd commented 1 year ago

Hi @NielsRogge @zhuchen03 - I would like to implement these models. I will start with the ImageNet classification one.