Closed: schmidek closed this issue 2 years ago
I'd like to take this on unless someone else has done it by September 8th.
@cronoik
I've implemented Perceiver IO in PyTorch: link
Now we need to adapt it for Transformers :)
But I haven't (yet) added the Fourier positional encodings or the multimodal decoder.
Don't forget about the transformers-cli tool for adding new models.
Edit: link
@esceptico I am not interested in doing the job twice or in a race. If you're already working on it, I'll find something else. :)
@cronoik I'm not yet working on adapting my implementation for Transformers. What I meant is that I'd be glad if you want to use my repository for this :)
Hi all, I just wanted to know whether this issue is in active development or still waiting for a developer to pick it up.
Hi @tonibagur, I believe @NielsRogge is currently working on it
Hi @LysandreJik, thanks for your reply. @NielsRogge I'm interested in giving the PerceiverIO model a try; if you need a tester, don't hesitate to ask :)
Regards,
🌟 New model addition
Model description
Perceiver is a general architecture that works on many kinds of data, including images, video, audio, 3D point clouds, language and symbolic inputs, multimodal combinations, etc. Perceivers can handle new types of data with only minimal modifications. Perceivers process inputs using domain-agnostic Transformer-style attention. Unlike Transformers, Perceivers first map inputs to a small latent space where processing is cheap and doesn't depend on the input size. This makes it possible to build very deep networks even when using large inputs like images or videos.
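The key idea above (mapping a large input to a small, fixed-size latent array via cross-attention, so deep processing cost doesn't scale with input size) can be sketched as follows. This is a minimal, single-head NumPy illustration with no learned projections or MLP blocks; the latent size (32) and width (64) are arbitrary placeholder values, not the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # queries: (num_latents, d), keys_values: (seq_len, d)
    # Output shape is (num_latents, d) regardless of seq_len.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (num_latents, seq_len)
    return softmax(scores) @ keys_values           # (num_latents, d)

rng = np.random.default_rng(0)
num_latents, d = 32, 64
latents = rng.normal(size=(num_latents, d))  # learned latent array in practice

# A long byte-level input: the encoded latent array stays (32, 64)
# no matter how long the input is, so the deep latent stack is cheap.
inputs = rng.normal(size=(10_000, d))
encoded = cross_attend(latents, inputs)
print(encoded.shape)  # (32, 64)
```

After this single cross-attention step, all subsequent self-attention layers operate only on the 32 latents, which is what makes very deep stacks affordable for image- or video-sized inputs.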
Perceiver IO is a generalization of Perceiver to handle arbitrary outputs in addition to arbitrary inputs. The original Perceiver only produced a single classification label. In addition to classification labels, Perceiver IO can produce (for example) language, optical flow, and multimodal videos with audio. This is done using the same building blocks as the original Perceiver. The computational complexity of Perceiver IO is linear in the input and output size and the bulk of the processing occurs in the latent space, allowing us to process inputs and outputs that are much larger than can be handled by standard Transformers. This means, for example, Perceiver IO can do BERT-style masked language modeling directly using bytes instead of tokenized inputs.
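The decoding side described above works symmetrically: each desired output position is a query that cross-attends to the latent array, so the output can be arbitrarily large while cost stays linear in its size. A minimal NumPy sketch under the same simplifying assumptions (single head, no projections, placeholder sizes):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode(output_queries, latents):
    # Each output position is one query cross-attending to the latents;
    # cost is linear in the number of output queries.
    d = output_queries.shape[-1]
    scores = output_queries @ latents.T / np.sqrt(d)
    return softmax(scores) @ latents

rng = np.random.default_rng(0)
latents = rng.normal(size=(32, 64))  # small latent array from the encoder

# Ask for 50,000 output positions (e.g. a per-pixel prediction task):
queries = rng.normal(size=(50_000, 64))
out = decode(queries, latents)
print(out.shape)  # (50000, 64)
```

In the paper the output queries encode task- and position-specific information (e.g. positional features for masked language modeling over raw bytes), which is how one latent array can serve many output structures.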
https://arxiv.org/pdf/2107.14795.pdf
Open source status