JustinLin610 opened this issue 2 years ago
cc @gchhablani you might find this interesting :-)
@patil-suraj Yes! Definitely would love to take this up :)
Awesome! Happy to help you with this! Let me know if you have questions :)
Not sure if this was a coordinated effort, but the OFA-Sys team seems to have made a preliminary release of a transformers implementation on a separate OFA branch.
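For anyone who wants to try that branch, here is a minimal sketch based on its README. Note that `OFATokenizer`, `OFAModel`, the `patch_images` argument to `generate`, and the `OFA-Sys/OFA-large` checkpoint id all come from that preliminary branch and are not part of mainline transformers:

```python
# Sketch based on the preliminary OFA transformers branch; OFATokenizer,
# OFAModel and generate(..., patch_images=...) only exist there, not in
# mainline transformers.
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModel

tokenizer = OFATokenizer.from_pretrained("OFA-Sys/OFA-large")
model = OFAModel.from_pretrained("OFA-Sys/OFA-large", use_cache=False)

# OFA's standard image preprocessing: 480x480 resize, 0.5/0.5 normalization.
preprocess = transforms.Compose([
    transforms.Resize((480, 480), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
patch_img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

# Captioning prompt; OFA phrases every task as a plain-text instruction.
inputs = tokenizer([" what does the image describe?"], return_tensors="pt").input_ids
gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```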
Any news on this? If the implementation exists, a new PR could be opened without too much work.
Also, I have a question: on the OFA-large Hugging Face repo the model is 2.45 GB, but the checkpoint downloaded from the OFA repo is a 5.8 GB file. Why such a difference between the two?
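One likely explanation (worth verifying) is that fairseq training checkpoints usually bundle the optimizer state, and sometimes EMA weights, alongside the model parameters, while the Hugging Face file is weights-only. A quick way to check, assuming a standard PyTorch `.pt` file (the path here is a placeholder):

```python
# Inspect what the fairseq-style checkpoint actually stores; optimizer
# state can account for most of the extra file size.
import torch

ckpt = torch.load("ofa_large.pt", map_location="cpu")
print(list(ckpt.keys()))  # typically includes "model" and "last_optimizer_state"

def tensor_gb(state_dict):
    return sum(t.numel() * t.element_size()
               for t in state_dict.values() if torch.is_tensor(t)) / 1e9

print(f"model weights only: {tensor_gb(ckpt['model']):.2f} GB")
```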
🌟 New model addition
We recently proposed OFA, a unified model for multimodal pretraining that achieves multiple SoTA results on downstream tasks, including image captioning, text-to-image generation, and referring expression comprehension. We would like to implement OFA in transformers if possible.
Model description
OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation) in a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
Our codebase is built on Fairseq. We wonder if there is a simpler way to port the model from Fairseq to transformers. To reach us, please email zheluo.wp@alibaba-inc.com or junyang.ljy@alibaba-inc.com.
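For reference, ports of other fairseq models (e.g. BART, wav2vec2) into transformers have typically used a one-off conversion script that loads the fairseq checkpoint, renames the parameter keys to match the new transformers module layout, and saves a weights-only state dict. A skeleton of that pattern follows; the rename table is purely hypothetical until the transformers OFA module structure is defined:

```python
# Skeleton of the usual fairseq -> transformers conversion pattern.
# The RENAMES table below is hypothetical; the real mapping has to be
# derived module by module from the fairseq OFA implementation.
import torch

RENAMES = [
    ("encoder.layers.", "encoder.layer."),  # placeholder example rename
    ("self_attn.", "self_attention."),      # placeholder example rename
]

def convert_fairseq_checkpoint(fairseq_path: str, out_path: str) -> None:
    ckpt = torch.load(fairseq_path, map_location="cpu")
    fs_state = ckpt["model"]  # fairseq stores parameters under the "model" key
    hf_state = {}
    for key, tensor in fs_state.items():
        for old, new in RENAMES:
            key = key.replace(old, new)
        hf_state[key] = tensor
    # Load with model.load_state_dict(hf_state) once the transformers
    # OFA classes exist, then verify outputs against the fairseq model.
    torch.save(hf_state, out_path)

convert_fairseq_checkpoint("ofa_large.pt", "pytorch_model.bin")
```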
Open source status