huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Sparse Transformer #8945

Open turian opened 3 years ago

turian commented 3 years ago

🌟 New model addition

Model description

Sparse Transformers (https://openai.com/blog/sparse-transformer/) are one of the two most efficient transformers for long-range problems according to Google's Long Range Arena paper (https://arxiv.org/pdf/2011.04006.pdf); Big Bird is the other.

The original Sparse Transformers work shows great results on text, images, and audio. OpenAI's follow-up work Jukebox (https://openai.com/blog/jukebox/) uses Sparse Transformers to generate very long raw music audio with style transfer. Lastly, https://proceedings.icml.cc/static/paper_files/icml/2020/6095-Paper.pdf uses Sparse Transformers to achieve state-of-the-art CIFAR performance.
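For context, the core idea is factorized sparse attention: each position attends only to a local window plus a strided subset of earlier positions. Below is a rough dense-mask sketch of the strided pattern, assuming a PyTorch setting; it is not the fast OpenAI blocksparse kernels, and the function names and the `stride` hyperparameter are illustrative only.

```python
import torch

def strided_sparse_mask(seq_len: int, stride: int) -> torch.Tensor:
    """Boolean mask: mask[i, j] is True if query position i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (L, 1)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions,   shape (1, L)
    causal = j <= i                          # autoregressive constraint
    local = (i - j) < stride                 # local window over the last `stride` positions
    strided = ((i - j) % stride) == 0        # every stride-th earlier position
    return causal & (local | strided)

def strided_attention(q, k, v, stride: int):
    """Reference (dense) attention with the strided sparsity pattern applied as a mask.

    q, k, v: (batch, heads, seq_len, head_dim).
    """
    seq_len, d = q.shape[-2], q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    mask = strided_sparse_mask(seq_len, stride).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

Note that this dense-mask version still materializes the full L x L score matrix; the point of the block-sparse kernels in the repos below is to skip the masked-out blocks entirely, which is where the long-range efficiency comes from.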

Open source status

  • the model implementation is available:
    latest version, for CIFAR: https://github.com/openai/distribution_augmentation
    original, but not maintained: https://github.com/openai/sparse_attention
    alternate implementation from FAIR: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/sparse_multihead_attention.py

  • the model weights are available:
    https://github.com/openai/distribution_augmentation (the CIFAR work) has model weights available, as described in the README: https://openaipublic.blob.core.windows.net/distribution-augmentation-assets/models/c10-15m-baseline.npz (a short snippet for inspecting the archive follows below)

Jukebox is open source and has model weights, but it is a larger pipeline that includes VQ-VAEs, so it may not be of interest for a transformers-only library.
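As a quick sanity check that the released CIFAR checkpoint is readable, the .npz archive can be inspected with plain NumPy. This is only a hedged example: the local filename is assumed from the download URL above, the archive is assumed to contain plain arrays rather than pickled objects, and the parameter names inside it are whatever the original training code used.

```python
import numpy as np

# Inspect the downloaded checkpoint (filename assumed from the URL above).
ckpt = np.load("c10-15m-baseline.npz")
for name in list(ckpt.files)[:10]:    # print the first few parameter names and shapes
    print(name, ckpt[name].shape)
```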

rewonc commented 3 years ago

Hi there, happy to consult on anything. The sparse attention kernels linked above are very fast, but they require building blocksparse -- not sure if this will work for you all. Rewon

julien-c commented 3 years ago

cc'ing @madlag for info