salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Retrieval result varies on multi-gpu distributed training #62

Open averyma opened 2 years ago

averyma commented 2 years ago

Hello,

I noticed that when training with DDP in a multi-GPU setup, the retrieval recall numbers are higher than when training on a single GPU. Have you encountered this issue? I wonder if this is similar to the behaviour described in the SimCLR paper (Sec. 2.2).

LiJunnan1992 commented 2 years ago

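Hi, batch size could affect the results. With DDP, the effective batch size per optimizer step is the per-GPU batch size times the number of GPUs, so a multi-GPU run is not directly comparable to a single-GPU run unless the per-GPU batch size is adjusted.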
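To make the batch-size point concrete, here is a minimal sketch of the arithmetic. It is not taken from the ALBEF code: the function names, the `gather_across_gpus` flag, and the example batch size of 32 are all hypothetical. It assumes the per-GPU batch size is held fixed, and that the contrastive loss may or may not gather features from all ranks before computing similarities.

```python
# Illustrative arithmetic only; names and numbers are assumptions, not ALBEF's API.

def effective_batch_size(per_gpu_batch: int, world_size: int) -> int:
    """Samples contributing to one optimizer step under DDP.

    DDP averages gradients across ranks, so each step sees
    per_gpu_batch * world_size samples in total.
    """
    return per_gpu_batch * world_size


def in_batch_negatives(per_gpu_batch: int, world_size: int,
                       gather_across_gpus: bool = True) -> int:
    """In-batch negatives available to each anchor in a contrastive loss.

    If features are gathered from every rank (assumed here as an option),
    each anchor is contrasted against all other samples in the global batch;
    otherwise only against the other samples on its own GPU.
    """
    pool = per_gpu_batch * world_size if gather_across_gpus else per_gpu_batch
    return pool - 1  # exclude the anchor's own positive pair


# Compare a single-GPU run with multi-GPU runs at the same per-GPU batch size.
for gpus in (1, 4, 8):
    print(
        f"gpus={gpus} "
        f"effective_batch={effective_batch_size(32, gpus)} "
        f"negatives={in_batch_negatives(32, gpus)}"
    )
```

Under these assumptions, keeping the global batch size equal across runs (for example, dividing the per-GPU batch size by the number of GPUs) would be one way to check whether the recall gap comes from batch size alone.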