huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Long-Short Transformer #12635

Open schmidek opened 3 years ago

schmidek commented 3 years ago

🌟 New model addition

Model description

https://arxiv.org/abs/2107.02192

In this paper, the authors propose the Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity, for both language and vision tasks. It combines a novel long-range attention, which uses a dynamic projection to model distant correlations, with a short-term attention that captures fine-grained local correlations. Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity.
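To make the mechanism concrete, here is a minimal, single-head PyTorch sketch of the idea described above, under my own assumptions: the long-range branch compresses the n keys/values to a small number `r` of "landmark" positions via a content-dependent (dynamic) projection, the short-term branch restricts attention to a local window, and one softmax is taken over the concatenated scores. The class name, argument names, and the simple full-matrix windowing used here are illustrative; the official NVIDIA implementation (multi-head, segment-wise projection, DualLN, blocked local attention) differs in the details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongShortAttentionSketch(nn.Module):
    """Illustrative single-head long-short attention (not the official code).

    Long-range branch: keys/values are mixed down to r landmark rows by a
    dynamic projection computed from the keys, so attending to them costs
    O(n * r) instead of O(n^2).
    Short-term branch: each token attends only to tokens within a window.
    """

    def __init__(self, dim, window=4, r=8):
        super().__init__()
        self.dim, self.window, self.r = dim, window, r
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # dynamic projection: one mixing weight per landmark, per token
        self.to_proj = nn.Linear(dim, r)

    def forward(self, x):
        n = x.size(1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = self.dim ** -0.5

        # --- long-range branch: compress k, v to r landmarks ---
        # p: (batch, r, n), a softmax-normalized mixture over tokens
        p = F.softmax(self.to_proj(k).transpose(1, 2), dim=-1)
        k_land = p @ k                                     # (b, r, dim)
        v_land = p @ v                                     # (b, r, dim)
        long_scores = q @ k_land.transpose(1, 2) * scale   # (b, n, r)

        # --- short-term branch: mask out tokens beyond the window ---
        # (computed densely here for clarity; real code uses blocked windows)
        short_scores = q @ k.transpose(1, 2) * scale       # (b, n, n)
        idx = torch.arange(n, device=x.device)
        far = (idx[None, :] - idx[:, None]).abs() > self.window
        short_scores = short_scores.masked_fill(far, float("-inf"))

        # --- aggregate: one softmax over [local keys, landmarks] ---
        scores = torch.cat([short_scores, long_scores], dim=-1)
        attn = F.softmax(scores, dim=-1)
        return attn[..., :n] @ v + attn[..., n:] @ v_land
```

A causal variant would additionally mask future positions in both branches; the sketch above is bidirectional, matching the paper's claim that the scheme supports both settings.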

Open source status

schmidek commented 3 years ago

A PyTorch implementation: https://github.com/lucidrains/long-short-transformer

NielsRogge commented 3 years ago

Cool work! However, models have a low chance of being added if there are no pre-trained weights available.

zhuchen03 commented 3 years ago

Thanks for your interest in our work! We have released the code for ImageNet and LRA at https://github.com/NVIDIA/transformer-ls. Pretrained weights for ImageNet are also available. We will release the character-level LM soon.

Bearnardd commented 1 year ago

Hi @zhuchen03! Since I would like to add your model to HuggingFace Transformers, I am wondering whether pretrained weights are also available for the character-level LM?

Bearnardd commented 1 year ago

Hi @NielsRogge @zhuchen03 - I would like to implement these models. I will start with the ImageNet classification one.