yonatanbitton opened this issue 1 year ago
@LiJunnan1992 pinging to see if you have an idea about this issue 🙏 🙌
You can create a blip2_retrieval model by modifying blip2_qformer to take into account samples["image_id"]
when computing ITC and ITM, as done in blip_retrieval.
Then, you can create a yaml file for training on coco retrieval by following the template of this file.
For adding a new dataset, you may refer to the LAVIS documentation.
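To make the idea concrete, here is a minimal sketch of what image_id-aware ITC targets can look like (simplified and illustrative, not the actual LAVIS code: the function name, tensor shapes, and temperature default are assumptions, and BLIP-2's max-over-query-tokens similarity is omitted):

```python
import torch
import torch.nn.functional as F

def itc_loss_with_image_ids(image_feats, text_feats, image_ids, temp=0.07):
    """Simplified ITC loss in the style of blip_retrieval: every
    image-text pair that shares an image_id is treated as a positive,
    instead of assuming only the diagonal of the similarity matrix is.

    image_feats: (B, D) L2-normalized image embeddings
    text_feats:  (B, D) L2-normalized text embeddings
    image_ids:   (B,)   integer ids taken from samples["image_id"]
    """
    sim_i2t = image_feats @ text_feats.t() / temp  # (B, B) logits
    sim_t2i = sim_i2t.t()

    # COCO has ~5 captions per image, so a batch can contain several
    # texts for the same image; mark all of them as positives and
    # normalize into soft targets.
    idx = image_ids.view(-1, 1)
    pos_mask = torch.eq(idx, idx.t()).float()
    sim_targets = pos_mask / pos_mask.sum(dim=1, keepdim=True)

    loss_i2t = -(F.log_softmax(sim_i2t, dim=1) * sim_targets).sum(dim=1).mean()
    loss_t2i = -(F.log_softmax(sim_t2i, dim=1) * sim_targets).sum(dim=1).mean()
    return (loss_i2t + loss_t2i) / 2
```

The same `pos_mask` is also useful on the ITM side: blip_retrieval masks out in-batch hard-negative candidates that share an image_id with the anchor before sampling, so a caption of the same image is never drawn as a negative.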
Could you please release the code so that we can reproduce the results? I cannot make it work based on this information. Thanks much!
@LiJunnan1992 sorry for the late response, but I also can't reproduce your results based on this information. Is there any chance you could provide your implementation first, so we can reproduce the ITM results? Later we can work out how to fit new data into it. Supplying that would enable several valuable extensions of the BLIP-2 model 🙏 (also to follow up on this Tweet). Thank you 🙌
@yonatanbitton @shengyi4 You can now finetune for retrieval by running this script: https://github.com/salesforce/LAVIS/blob/main/run_scripts/blip2/train/train_retrieval_coco.sh
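For reference, from the root of a LAVIS checkout that should be `bash run_scripts/blip2/train/train_retrieval_coco.sh` (the path shown in the link above).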
Thank you very much, I am checking it now.
Hey BLIP-2 team,
Thanks for your great work! I've been trying to reproduce the BLIP-2 COCO ITM fine-tuning using the resources in your repo, but I couldn't find specific instructions or a command for it. As I understand it, `train_caption_coco.sh` relates to captioning, and `blip_itm_large.yaml` is BLIP1, not BLIP2. I also searched the code and previous GitHub issues. Could you share the exact command or script to run this?

Also, I plan to add new fine-tuning data later. Any tips on incorporating new data would be awesome.
Thanks for your help and your amazing work on BLIP-2!