@NielsRogge Gentle ping because I saw your name in the docs
cc @younesbelkada
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
As the issue has been reopened, is there any plan to implement the loss for the Q-Former?
Hi @jianantian, I didn't have time to take a look unfortunately. If you want to try your hand at it, feel free to open a PR and we'll guide you from there!
Feature request
In BLIP-2, there is a pretraining stage (stage 1) for the Q-Former.
An implementation of the Q-Former's stage-1 training objectives is requested.
Motivation
In HuggingFace's BLIP-2 source code, I see no implementation of the stage-1 pretraining setup: the text inputs and the three pretraining losses (image-text contrastive loss, image-grounded text generation loss, and image-text matching loss). Currently, the source code only covers vision-language generative learning (stage 2). An implementation of stage 1 would therefore be very helpful for people who, like me, are interested in pretraining the Q-Former.
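For reference, here is a minimal sketch of the image-text contrastive (ITC) objective as described in the BLIP-2 paper; the other two losses are standard cross-entropy objectives (language modeling for generation, binary classification for matching). The function name, tensor shapes, projection heads, and fixed temperature below are my assumptions for illustration, not part of the HuggingFace `Blip2QFormerModel` API:

```python
import torch
import torch.nn.functional as F

def itc_loss(query_embeds, text_embeds, temperature=0.07):
    """Image-text contrastive loss, roughly as in BLIP-2 stage 1 (a sketch).

    query_embeds: (batch, num_queries, dim) projected Q-Former query outputs
    text_embeds:  (batch, dim) projected text [CLS] embedding
    Assumes the projection heads have already been applied; the paper uses
    a learnable temperature, simplified here to a fixed value.
    """
    query_embeds = F.normalize(query_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Similarity of every text with every image's query tokens:
    # shape (batch_text, batch_image, num_queries).
    sim = torch.einsum("td,iqd->tiq", text_embeds, query_embeds)
    # BLIP-2 takes the max over query tokens as the image-text similarity.
    sim, _ = sim.max(dim=-1)
    sim = sim / temperature

    # In-batch negatives: the i-th text matches the i-th image.
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_t2i = F.cross_entropy(sim, targets)
    loss_i2t = F.cross_entropy(sim.t(), targets)
    return (loss_t2i + loss_i2t) / 2
```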
Your contribution
Unfortunately, I don't think there is a way that I could help.