huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add OFA to transformers #15813

Open · JustinLin610 opened this issue 2 years ago

JustinLin610 commented 2 years ago

🌟 New model addition

We recently proposed OFA, a unified model for multimodal pretraining, which achieves state-of-the-art results on multiple downstream tasks, including image captioning, text-to-image generation, and referring expression comprehension. We would like to implement OFA in transformers if possible.

Model description

OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) in a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
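To make the "one model, many tasks" framing concrete, here is a minimal illustrative sketch of how tasks reduce to plain-text instructions that the decoder answers. The exact prompt strings are assumptions in the spirit of the paper, not necessarily the ones the model was trained with.

```python
# Illustrative only: OFA casts every task as conditional text generation.
# The instruction strings below are assumptions modeled on the paper's examples.
task_instructions = {
    "image_captioning": " what does the image describe?",
    "visual_question_answering": " {question}",
    "visual_grounding": ' which region does the text "{text}" describe?',
    "text_to_image": " what is the complete image? caption: {caption}",
}

def build_instruction(task: str, **fields) -> str:
    """Render the text instruction for a task; the model decodes the answer as text."""
    return task_instructions[task].format(**fields)

print(build_instruction("visual_grounding", text="a red umbrella"))
```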

Our codebase is built on Fairseq. We wonder whether there is a simpler way to port the model from Fairseq to transformers. To reach us by email, contact zheluo.wp@alibaba-inc.com or junyang.ljy@alibaba-inc.com.
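For reference, Fairseq-to-transformers ports usually come down to a key-renaming script over the checkpoint's state dict plus a matching configuration class. Below is a minimal sketch of the weight-conversion half, assuming a local checkpoint file; every key mapping shown is a placeholder, since the real OFA parameter names would need to be checked module by module.

```python
# Hypothetical conversion sketch (not OFA's actual layout): load the Fairseq
# checkpoint, rename parameters to a transformers-style scheme, and save.
import torch

ckpt = torch.load("ofa_large.pt", map_location="cpu")  # path is illustrative
fairseq_state = ckpt["model"]  # Fairseq nests the weights under the "model" key

def rename(key: str) -> str:
    # Placeholder mappings; a real script enumerates every module explicitly.
    key = key.replace("encoder.layers.", "encoder.layer.")
    key = key.replace("self_attn.out_proj", "attention.output.dense")
    return key

hf_state = {rename(k): v for k, v in fairseq_state.items()}
torch.save(hf_state, "pytorch_model.bin")
```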

Open source status

patil-suraj commented 2 years ago

cc @gchhablani you might find this interesting :-)

gchhablani commented 2 years ago

@patil-suraj Yes! Definitely would love to take this up :)

patil-suraj commented 2 years ago

Awesome! Happy to help you with this! Let me know if you have questions :)

A2va commented 1 year ago

Not sure whether this was a coordinated collaborative effort, but the OFA-Sys team seems to have made a preliminary release of a transformers implementation on a separate OFA branch.

Any news on this? If the implementation already exists, a new PR could be opened without too much work. A rough sketch of how that branch appears to be used is below.
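If that branch works the way the fork's README suggests, usage would presumably look roughly like this. Note that OFATokenizer, OFAModel, and the `generate(patch_images=...)` convention come from the OFA-Sys fork and do not exist in mainline transformers; verify against the branch before relying on them.

```python
# Sketch based on the OFA-Sys fork of transformers, not the mainline library.
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModel  # fork-only classes

ckpt_dir = "OFA-large"  # local directory holding the converted checkpoint
tokenizer = OFATokenizer.from_pretrained(ckpt_dir)
model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)

resolution = 480
preprocess = transforms.Compose([
    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

prompt = " what does the image describe?"  # captioning instruction per the fork's docs
input_ids = tokenizer([prompt], return_tensors="pt").input_ids
patch_images = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

generated = model.generate(input_ids, patch_images=patch_images, num_beams=4)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```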

Also, I have a question: on the OFA-large Hugging Face repo, the model is 2.45 GB, but the checkpoint downloaded from the OFA repo is 5.8 GB. Why such a difference between the two?
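A plausible explanation, worth verifying: Fairseq training checkpoints usually bundle optimizer state and training metadata on top of fp32 weights, while a Hugging Face export typically contains only the model weights, possibly in fp16, which alone can roughly account for a 5.8 GB vs. 2.45 GB gap. A quick way to check (the path is illustrative):

```python
import torch

# Inspect what the Fairseq checkpoint actually stores besides the weights.
ckpt = torch.load("ofa_large.pt", map_location="cpu")
print(list(ckpt.keys()))  # often includes 'model', 'last_optimizer_state', 'cfg', ...

state = ckpt["model"]
n_params = sum(p.numel() for p in state.values())
dtype = next(iter(state.values())).dtype
print(f"{n_params / 1e6:.0f}M parameters stored as {dtype}")
# fp32 is 4 bytes/param, fp16 is 2; halving the precision and dropping the
# optimizer state together would explain most of the size difference.
```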