Closed Qiulin-W closed 2 years ago
Thanks for your question!
First, the negative samples are sampled from the mini-batch. Second, the order of the itm_labels doesn't affect the result, because the loss takes an average across the batch.
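The averaging argument can be sketched in a few lines. This is an editorial illustration, not ALBEF's actual code: the `itm_loss` helper and the toy logits are made up, but the point is general, since mean cross-entropy is invariant to the order of the (logit, label) pairs in the batch.

```python
import math

def itm_loss(logits, labels):
    """Mean 2-way cross-entropy over the batch.
    logits: list of (score_neg, score_pos) pairs; labels: 0 or 1."""
    total = 0.0
    for (l0, l1), y in zip(logits, labels):
        # numerically stable log-sum-exp over the two classes
        m = max(l0, l1)
        log_z = m + math.log(math.exp(l0 - m) + math.exp(l1 - m))
        total += log_z - (l1 if y == 1 else l0)
    return total / len(labels)

# Toy batch: 2 positives followed by 4 negatives (fixed order)
logits = [(0.2, 1.5), (0.1, 2.0), (1.8, 0.3), (1.1, 0.0), (2.2, 0.4), (0.9, 0.2)]
labels = [1, 1, 0, 0, 0, 0]

# The same pairs shuffled (ViLT-style): the mean loss is identical,
# because the average is a sum over the batch divided by its size.
perm = [3, 0, 5, 1, 4, 2]
shuffled = itm_loss([logits[i] for i in perm], [labels[i] for i in perm])
assert abs(itm_loss(logits, labels) - shuffled) < 1e-12
```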
Thanks so much for your reply! And what is the magnitude of ITM loss after pretraining?
It is around 0.11-0.13
Thanks so much!
Hi, @LiJunnan1992, thanks for your work and for making it public!
I'm trying to collect the negative samples from the whole batch, i.e., not limited to the same mini-batch.
Could you please tell me whether such a sampling strategy would break the ITM loss under the current label order?
Thanks in advance!
Yes you can do it. Please check out our BLIP's code on hard negative mining from all GPUs: https://github.com/salesforce/BLIP/blob/main/models/blip_retrieval.py
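The idea behind the linked code can be sketched in a single process. This is an illustrative toy, not BLIP's actual implementation: the `sample_hard_negatives` helper and the similarity matrix are made up, and in real multi-GPU training the candidate pool would first be gathered from all GPUs (e.g. via `torch.distributed.all_gather`).

```python
import math
import random

def sample_hard_negatives(sim, rng):
    """sim[i][j]: image i vs. text j similarity; the diagonal entries are
    the positive pairs. For each image, sample one negative text with
    probability proportional to exp(sim), positive masked out, so harder
    (more similar) negatives are chosen more often."""
    negatives = []
    for i, row in enumerate(sim):
        weights = [0.0 if j == i else math.exp(s) for j, s in enumerate(row)]
        j = rng.choices(range(len(row)), weights=weights, k=1)[0]
        negatives.append(j)
    return negatives

# Toy 4x4 similarity matrix; off-diagonal peaks are the "hard" negatives.
sim = [[5.0, 2.0, 0.1, 0.1],
       [0.1, 5.0, 2.5, 0.1],
       [0.1, 0.1, 5.0, 2.0],
       [2.2, 0.1, 0.1, 5.0]]
rng = random.Random(0)
neg = sample_hard_negatives(sim, rng)
# The sampled negative is never the positive (diagonal) index.
assert all(n != i for i, n in enumerate(neg))
```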
Hi, @LiJunnan1992
Thanks for your reply! We will follow the BLIP work again.
We want to build on the promising ALBEF and BLIP; however, the dataset has become an obstacle.
The SBU Captions dataset is inaccessible. Could you please share a copy with us? Thanks! We will do our best to move forward and appreciate your enthusiastic help!
Hi,
Thanks for the great work. After reading the code for calculating the ITM loss, I have a question:
The ITM labels for positive and negative samples are in a "fixed" order instead of being shuffled. I'm wondering whether this order could be an issue for the ITM loss to work correctly. In some other VLP models such as ViLT, the ITM loss is calculated over shuffled pos-neg batches, as detailed at https://github.com/dandelin/ViLT/blob/762fd3975c180db6fc88f577cf39549983fa373a/vilt/modules/objectives.py#L207
Thanks in advance for your kind reply.