huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add OFA to transformers #15813

Open · JustinLin610 opened this issue 2 years ago

JustinLin610 commented 2 years ago

🌟 New model addition

We recently proposed OFA, a unified model for multimodal pretraining, which achieves state-of-the-art results on multiple downstream tasks, including image captioning, text-to-image generation, and referring expression comprehension. We would like to implement OFA in transformers if possible.

Model description

OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) in a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
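To make the "one model, many tasks" framing concrete, here is a minimal illustrative sketch of how tasks reduce to plain-text instructions that the decoder answers. The exact prompt strings are assumptions in the spirit of the paper, not necessarily the ones the model was trained with.

```python
# Illustrative only: OFA casts every task as conditional text generation.
# The instruction strings below are assumptions modeled on the paper's examples.
task_instructions = {
    "image_captioning": " what does the image describe?",
    "visual_question_answering": " {question}",
    "visual_grounding": ' which region does the text "{text}" describe?',
    "text_to_image": " what is the complete image? caption: {caption}",
}

def build_instruction(task: str, **fields) -> str:
    """Render the text instruction for a task; the model decodes the answer as text."""
    return task_instructions[task].format(**fields)

print(build_instruction("visual_grounding", text="a red umbrella"))
```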

Our codebase is built on Fairseq. We wonder whether there is a simpler way to port the model from Fairseq to transformers. To reach us by email, contact zheluo.wp@alibaba-inc.com or junyang.ljy@alibaba-inc.com.
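For reference, Fairseq-to-transformers ports usually come down to a key-renaming script over the checkpoint's state dict plus a matching configuration class. Below is a minimal sketch of the weight-conversion half, assuming a local checkpoint file; every key mapping shown is a placeholder, since the real OFA parameter names would need to be checked module by module.

```python
# Hypothetical conversion sketch (not OFA's actual layout): load the Fairseq
# checkpoint, rename parameters to a transformers-style scheme, and save.
import torch

ckpt = torch.load("ofa_large.pt", map_location="cpu")  # path is illustrative
fairseq_state = ckpt["model"]  # Fairseq nests the weights under the "model" key

def rename(key: str) -> str:
    # Placeholder mappings; a real script enumerates every module explicitly.
    key = key.replace("encoder.layers.", "encoder.layer.")
    key = key.replace("self_attn.out_proj", "attention.output.dense")
    return key

hf_state = {rename(k): v for k, v in fairseq_state.items()}
torch.save(hf_state, "pytorch_model.bin")
```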

Open source status

patil-suraj commented 2 years ago

cc @gchhablani you might find this interesting :-)

gchhablani commented 2 years ago

@patil-suraj Yes! Definitely would love to take this up :)

patil-suraj commented 2 years ago

Awesome! Happy to help you with this! Let me know if you have questions :)

A2va commented 1 year ago

Not sure whether this was a coordinated collaborative effort, but the OFA-Sys team seems to have made a preliminary release of a transformers implementation on a separate OFA branch.

Any news on this? If the implementation already exists, a new PR could be opened without too much work. A rough sketch of how that branch appears to be used is below.
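If that branch works the way the fork's README suggests, usage would presumably look roughly like this. Note that OFATokenizer, OFAModel, and the `generate(patch_images=...)` convention come from the OFA-Sys fork and do not exist in mainline transformers; verify against the branch before relying on them.

```python
# Sketch based on the OFA-Sys fork of transformers, not the mainline library.
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModel  # fork-only classes

ckpt_dir = "OFA-large"  # local directory holding the converted checkpoint
tokenizer = OFATokenizer.from_pretrained(ckpt_dir)
model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)

resolution = 480
preprocess = transforms.Compose([
    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

prompt = " what does the image describe?"  # captioning instruction per the fork's docs
input_ids = tokenizer([prompt], return_tensors="pt").input_ids
patch_images = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

generated = model.generate(input_ids, patch_images=patch_images, num_beams=4)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```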

Also, I have a question: on the OFA-large Hugging Face repo, the model is 2.45 GB, but the checkpoint downloaded from the OFA repo is 5.8 GB. Why such a difference between the two?
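A plausible explanation, worth verifying: Fairseq training checkpoints usually bundle optimizer state and training metadata on top of fp32 weights, while a Hugging Face export typically contains only the model weights, possibly in fp16, which alone can roughly account for a 5.8 GB vs. 2.45 GB gap. A quick way to check (the path is illustrative):

```python
import torch

# Inspect what the Fairseq checkpoint actually stores besides the weights.
ckpt = torch.load("ofa_large.pt", map_location="cpu")
print(list(ckpt.keys()))  # often includes 'model', 'last_optimizer_state', 'cfg', ...

state = ckpt["model"]
n_params = sum(p.numel() for p in state.values())
dtype = next(iter(state.values())).dtype
print(f"{n_params / 1e6:.0f}M parameters stored as {dtype}")
# fp32 is 4 bytes/param, fp16 is 2; halving the precision and dropping the
# optimizer state together would explain most of the size difference.
```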