Closed: schmidek closed this issue 2 years ago
I'd like to take this on unless someone else has done it by September 8th.
@cronoik
I've implemented Perceiver IO in PyTorch: link
Now we need to adapt it for Transformers :)
But I haven't (yet) added the Fourier positional encodings or the multimodal decoder.
Don't forget about the transformers-cli tool for adding new models.
Edit: link
@esceptico I am not interested in doing the job twice or in a race. If you're already working on it, I'll find something else. :)
@cronoik I'm not yet working on adapting my implementation for Transformers. What I meant is that I'd be glad if you want to use my repository for this :)
Hi all, I just wanted to know whether this issue is in active development or still waiting for a developer to pick it up.
Hi @tonibagur, I believe @NielsRogge is currently working on it
Hi @LysandreJik, thanks for your reply. @NielsRogge I'm interested in giving the PerceiverIO model a try; if you need a tester, don't hesitate to ask :)
Regards,
🌟 New model addition
Model description
Perceiver is a general architecture that works on many kinds of data, including images, video, audio, 3D point clouds, language and symbolic inputs, multimodal combinations, etc. Perceivers can handle new types of data with only minimal modifications. Perceivers process inputs using domain-agnostic Transformer-style attention. Unlike Transformers, Perceivers first map inputs to a small latent space where processing is cheap and doesn't depend on the input size. This makes it possible to build very deep networks even when using large inputs like images or videos.
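The key idea above (mapping a large input to a small, fixed-size latent array via cross-attention, so deep processing cost doesn't scale with input size) can be sketched as follows. This is a minimal, single-head NumPy illustration with no learned projections or MLP blocks; the latent size (32) and width (64) are arbitrary placeholder values, not the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # queries: (num_latents, d), keys_values: (seq_len, d)
    # Output shape is (num_latents, d) regardless of seq_len.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (num_latents, seq_len)
    return softmax(scores) @ keys_values           # (num_latents, d)

rng = np.random.default_rng(0)
num_latents, d = 32, 64
latents = rng.normal(size=(num_latents, d))  # learned latent array in practice

# A long byte-level input: the encoded latent array stays (32, 64)
# no matter how long the input is, so the deep latent stack is cheap.
inputs = rng.normal(size=(10_000, d))
encoded = cross_attend(latents, inputs)
print(encoded.shape)  # (32, 64)
```

After this single cross-attention step, all subsequent self-attention layers operate only on the 32 latents, which is what makes very deep stacks affordable for image- or video-sized inputs.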
Perceiver IO is a generalization of Perceiver to handle arbitrary outputs in addition to arbitrary inputs. The original Perceiver only produced a single classification label. In addition to classification labels, Perceiver IO can produce (for example) language, optical flow, and multimodal videos with audio. This is done using the same building blocks as the original Perceiver. The computational complexity of Perceiver IO is linear in the input and output size and the bulk of the processing occurs in the latent space, allowing us to process inputs and outputs that are much larger than can be handled by standard Transformers. This means, for example, Perceiver IO can do BERT-style masked language modeling directly using bytes instead of tokenized inputs.
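The decoding side described above works symmetrically: each desired output position is a query that cross-attends to the latent array, so the output can be arbitrarily large while cost stays linear in its size. A minimal NumPy sketch under the same simplifying assumptions (single head, no projections, placeholder sizes):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode(output_queries, latents):
    # Each output position is one query cross-attending to the latents;
    # cost is linear in the number of output queries.
    d = output_queries.shape[-1]
    scores = output_queries @ latents.T / np.sqrt(d)
    return softmax(scores) @ latents

rng = np.random.default_rng(0)
latents = rng.normal(size=(32, 64))  # small latent array from the encoder

# Ask for 50,000 output positions (e.g. a per-pixel prediction task):
queries = rng.normal(size=(50_000, 64))
out = decode(queries, latents)
print(out.shape)  # (50000, 64)
```

In the paper the output queries encode task- and position-specific information (e.g. positional features for masked language modeling over raw bytes), which is how one latent array can serve many output structures.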
https://arxiv.org/pdf/2107.14795.pdf
Open source status