Feature request
I propose adding loss computation for Q-Former training in the BLIP-2 model. Implementing this feature would allow fine-tuning the Q-Former and language model for image-text retrieval and captioning tasks, which is crucial for practical applications.
Motivation
I want to train the BLIP-2 model using the transformers library. However, the loss functions for Image-Text Contrastive learning (ITC), Image-Text Matching (ITM), and Image-grounded Text Generation (ITG) are not included, so users currently have to implement them manually.
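For reference, here is a rough sketch of what the ITC loss could look like (my own illustration, not code from the BLIP-2 or transformers codebase). It assumes L2-normalized projections of the Q-Former query outputs and of the text [CLS] output; the tensor shapes and the temperature value are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

def itc_loss(query_embeds: torch.Tensor, text_embeds: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Image-Text Contrastive (ITC) loss, BLIP-2 style (sketch).

    query_embeds: (batch, num_query_tokens, dim) -- projected and
        L2-normalized Q-Former query outputs for each image.
    text_embeds:  (batch, dim) -- projected and L2-normalized text
        [CLS] outputs.
    """
    # Image-text similarity: for each (image i, text j) pair, BLIP-2
    # takes the maximum similarity over the image's query tokens.
    sim = torch.einsum("iqd,jd->ijq", query_embeds, text_embeds)
    sim = sim.max(dim=-1).values / temperature  # (batch, batch)

    # Matching pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_i2t = F.cross_entropy(sim, targets)      # image -> text
    loss_t2i = F.cross_entropy(sim.t(), targets)  # text -> image
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings (batch=4, 32 query tokens, dim=256).
queries = F.normalize(torch.randn(4, 32, 256), dim=-1)
texts = F.normalize(torch.randn(4, 256), dim=-1)
print(itc_loss(queries, texts))
```

As I understand the paper, ITM would similarly add a binary classification head over matched/hard-negative pairs, and ITG is essentially a causal language-modeling loss over the text tokens conditioned on the query outputs.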
Your contribution
I would like to contribute to this open-source project by implementing the loss functions.