Open schmidek opened 3 years ago
🌟 New model addition

Model description

https://arxiv.org/abs/2107.02192

In this paper, the authors propose Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity, for both language and vision tasks. It combines a novel long-range attention with dynamic projection, which models distant correlations, with a short-term attention that captures fine-grained local correlations (both sketched in the code below). Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity.

Open source status

A PyTorch implementation: https://github.com/lucidrains/long-short-transformer
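To make the description above concrete, here is a minimal, single-head PyTorch sketch of the two attention branches. It is not the authors' implementation: the class name LongShortAttentionSketch, the number of projected positions r, the window size, the use of non-overlapping windows in place of the paper's sliding-window attention, and the placement of the per-branch LayerNorms are all illustrative assumptions, and only the bidirectional (non-causal) case is shown.

```python
# A minimal, single-head sketch of long-short attention (not the authors'
# code). The sequence length must be divisible by the window size here;
# r (projected positions) and window are illustrative hyperparameters.
import torch
import torch.nn as nn

class LongShortAttentionSketch(nn.Module):
    def __init__(self, dim, r=16, window=32):
        super().__init__()
        self.scale = dim ** -0.5
        self.window = window
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Dynamic projection: maps each position to r landmark weights,
        # compressing the long-range keys/values to length r.
        self.to_landmarks = nn.Linear(dim, r, bias=False)
        # Per-branch LayerNorms, loosely following the paper's DualLN,
        # so local and projected key scales match in the joint softmax.
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, n, dim)
        b, n, d = x.shape
        w = self.window
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Long-range branch: input-dependent projection over the sequence.
        p = self.to_landmarks(x).softmax(dim=1)            # (b, n, r)
        k_glob = self.norm_global(p.transpose(1, 2) @ k)   # (b, r, d)
        v_glob = p.transpose(1, 2) @ v                     # (b, r, d)

        # Short-term branch: non-overlapping windows stand in for the
        # sliding-window attention of the paper (a simplification).
        q_loc = q.view(b, n // w, w, d)
        k_loc = self.norm_local(k).view(b, n // w, w, d)
        v_loc = v.view(b, n // w, w, d)

        # Each query attends jointly to its local window and the r
        # projected global positions, under a single softmax.
        k_all = torch.cat(
            [k_loc, k_glob.unsqueeze(1).expand(-1, n // w, -1, -1)], dim=2)
        v_all = torch.cat(
            [v_loc, v_glob.unsqueeze(1).expand(-1, n // w, -1, -1)], dim=2)
        attn = (q_loc @ k_all.transpose(-2, -1) * self.scale).softmax(dim=-1)
        return (attn @ v_all).view(b, n, d)

x = torch.randn(2, 128, 64)
out = LongShortAttentionSketch(64)(x)              # -> (2, 128, 64)
```

With window size w and r projected positions, each query attends to w + r keys rather than all n, which is where the linear complexity claimed above comes from.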
Cool work! However, models have a low chance of being added if there are no pre-trained weights available.

Thanks for your interest in our work! We have released the code for ImageNet and LRA at https://github.com/NVIDIA/transformer-ls. Pretrained weights for ImageNet are also available. We will release the character-level LM soon.

Hi @zhuchen03! Since I would like to add your model to HuggingFace Transformers, I am wondering whether the pretrained weights are also available for the character-level LM?

Hi @NielsRogge @zhuchen03, I would like to implement these models. I will start with the ImageNet classification one.