turian opened this issue 3 years ago
Hi there,
Happy to consult on anything. The sparse attention kernels included above are very fast, but require building blocksparse -- not sure if this will work for you all.
Rewon
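For anyone for whom the blocksparse dependency is a blocker, below is a minimal dense-fallback sketch of the strided factorized attention pattern described in the Sparse Transformers paper, written in plain PyTorch. All function and argument names here are illustrative (they do not come from the OpenAI or fairseq repos), it merges the local and strided components into a single mask rather than splitting them across heads, and it materializes the full O(n²) score matrix, so it only demonstrates the pattern, not the speed or memory benefits of the real kernels.

```python
# Illustrative sketch only: strided sparse attention pattern applied as a
# mask over an ordinary dense attention matrix (no blocksparse required).
import math
import torch


def strided_sparse_mask(seq_len: int, stride: int) -> torch.Tensor:
    """Boolean mask; True marks key positions a query may attend to."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions,   shape (1, seq_len)
    causal = j <= i                          # no attending to future positions
    local = (i - j) < stride                 # the previous `stride` positions
    strided = (i - j) % stride == 0          # every stride-th earlier position
    return causal & (local | strided)


def strided_sparse_attention(q, k, v, stride: int):
    """Scaled dot-product attention restricted to the strided pattern.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    head_dim = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    mask = strided_sparse_mask(q.shape[-2], stride).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    q = k = v = torch.randn(1, 2, 16, 8)  # batch=1, heads=2, seq_len=16, head_dim=8
    print(strided_sparse_attention(q, k, v, stride=4).shape)  # (1, 2, 16, 8)
```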
cc'ing @madlag for info
New model addition
Model description
Sparse Transformers (https://openai.com/blog/sparse-transformer/) are one of the two most efficient transformer variants for long-range problems, according to Google's Long Range Arena paper (https://arxiv.org/pdf/2011.04006.pdf); Big Bird is the other one.
The original Sparse Transformers work shows great results on text, images, and audio. Further OpenAI work, Jukebox (https://openai.com/blog/jukebox/), uses Sparse Transformers to generate incredibly long raw music audio with style transfer. Lastly, https://proceedings.icml.cc/static/paper_files/icml/2020/6095-Paper.pdf uses Sparse Transformers to achieve state-of-the-art CIFAR performance.
Open source status
- the model implementation is available:
  - latest version, for CIFAR: https://github.com/openai/distribution_augmentation
  - original, but not maintained: https://github.com/openai/sparse_attention
  - alternate implementation from FAIR: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/sparse_multihead_attention.py
- the model weights are available:
  - https://github.com/openai/distribution_augmentation (the CIFAR work) has model weights, as described in its README: https://openaipublic.blob.core.windows.net/distribution-augmentation-assets/models/c10-15m-baseline.npz (a short sketch for fetching and inspecting this checkpoint follows this list)
  - Jukebox is open-source and has model weights, but it is a larger pipeline that includes VQ-VAEs, so it may not be of interest for a transformers-only library.
- who are the authors: @rewonc (https://github.com/rewonc), @myleott (https://github.com/myleott), @cclauss (https://github.com/cclauss)
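As a starting point for evaluating the CIFAR checkpoint linked above, here is a short, untested sketch for downloading and inspecting it with only the standard library and numpy. The download URL comes from the README quoted in this issue; the names and layout of the arrays inside the .npz archive are not documented in this thread, so the script simply lists whatever keys it finds.

```python
# Hedged sketch: fetch the released CIFAR-10 15M baseline checkpoint and
# list its contents. Mapping these arrays onto a transformers-style state
# dict would still require matching the parameter names of whatever PyTorch
# port is eventually adopted.
import urllib.request

import numpy as np

CKPT_URL = (
    "https://openaipublic.blob.core.windows.net/"
    "distribution-augmentation-assets/models/c10-15m-baseline.npz"
)
LOCAL_PATH = "c10-15m-baseline.npz"

urllib.request.urlretrieve(CKPT_URL, LOCAL_PATH)

# .npz files are plain numpy archives, so the parameters can be inspected
# without TensorFlow. The key names below are whatever the archive contains;
# they are not specified anywhere in this issue.
with np.load(LOCAL_PATH) as ckpt:
    for name in ckpt.files:
        print(name, ckpt[name].shape, ckpt[name].dtype)
```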