huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add Loss Functions for QFormer Training in BLIP-2 Model (ITC, ITM, and ITG) #34019

Open thisisiron opened 1 month ago

thisisiron commented 1 month ago

Feature request

I propose adding loss computation for QFormer training to the BLIP-2 model. This would allow fine-tuning the QFormer and language models for image-text retrieval and captioning tasks, both of which matter for practical applications.

Motivation

I want to train the BLIP-2 model using the transformers library. However, the library does not currently include the loss functions for Image-Text Contrastive (ITC) learning, Image-Text Matching (ITM), and Image-grounded Text Generation (ITG), so users have to implement them manually.
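To make the request concrete, here is a rough sketch of the arithmetic behind the ITC term as described in the BLIP-2 paper: the image-text similarity is the maximum, over the Q-Former query outputs, of the dot product with the text embedding, and the loss is a symmetric cross-entropy over the batch. This is plain Python meant only to illustrate the computation, not the proposed implementation, which would operate on tensors. The names `itc_loss` and `log_softmax` are hypothetical, the temperature of 0.07 is an assumed placeholder, and the embeddings are assumed to be L2-normalized already. ITM and ITG are not covered here.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def itc_loss(query_embeds, text_embeds, temperature=0.07):
    """Symmetric image-text contrastive loss (illustrative sketch).

    query_embeds: [batch][num_queries][dim], normalized query outputs per image
    text_embeds:  [batch][dim], normalized text embedding per caption
    temperature:  assumed value, not taken from the library
    """
    batch = len(text_embeds)
    # sim[i][j]: best-matching query of image i against text j, scaled by temperature
    sim = [
        [
            max(sum(q * t for q, t in zip(query, text)) for query in queries)
            / temperature
            for text in text_embeds
        ]
        for queries in query_embeds
    ]
    # Image-to-text: each row should pick out its own caption (the diagonal).
    i2t = -sum(log_softmax(sim[i])[i] for i in range(batch)) / batch
    # Text-to-image: each column should pick out its own image.
    t2i = -sum(
        log_softmax([sim[i][j] for i in range(batch)])[j] for j in range(batch)
    ) / batch
    return (i2t + t2i) / 2


# Toy batch: 2 images, 2 queries each, 2-dimensional unit vectors.
example_queries = [
    [[1.0, 0.0], [0.6, 0.8]],
    [[0.0, 1.0], [0.8, 0.6]],
]
example_texts = [[1.0, 0.0], [0.0, 1.0]]

matched = itc_loss(example_queries, example_texts)
mismatched = itc_loss(example_queries, example_texts[::-1])
print(matched < mismatched)
```

In the toy batch, pairing each image with its own caption gives a lower loss than swapping the captions, which is the behavior the contrastive objective is meant to reward.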

Your contribution

I would like to contribute to this open-source project by implementing the loss functions.

narnia24 commented 1 month ago

Hey @thisisiron, I would like to work on this.