salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License
1.53k stars 195 forks source link

the variance of loss is very large when initial training. #36

Closed shoutOutYangJie closed 2 years ago

shoutOutYangJie commented 2 years ago

image

I want to reproduce your result using your code. I download datasets excluding CC12M. And I find the three losses vary dramaticly. I don't know whether normal or not?

LiJunnan1992 commented 2 years ago

The variation you observe between batches should be normal.