Loss variance is very large at the start of training

salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method

BSD 3-Clause "New" or "Revised" License

Loss variance is very large at the start of training #36

Closed: shoutOutYangJie closed this issue 2 years ago

shoutOutYangJie commented 2 years ago

I want to reproduce your results using your code. I downloaded the datasets, excluding CC12M, and I find that the three losses vary dramatically from batch to batch. Is this normal?

LiJunnan1992 commented 2 years ago

The variation you observe between batches should be normal.
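
One way to check whether batch-to-batch variation like this is just noise is to look at a smoothed curve instead of individual batch values. The sketch below is illustrative only and is not taken from the ALBEF code: the function name `smooth_losses`, the window size, and the sample values are all assumptions.

```python
from collections import deque


def smooth_losses(losses, window=20):
    """Return the running mean of `losses` over the last `window` values."""
    recent = deque(maxlen=window)  # oldest value is dropped automatically
    smoothed = []
    for value in losses:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


if __name__ == "__main__":
    # Hypothetical noisy per-batch losses, for illustration only.
    noisy = [4.0, 6.0, 3.0, 5.0, 2.0, 4.0]
    for step, (raw, avg) in enumerate(zip(noisy, smooth_losses(noisy, window=3))):
        print(f"step {step}: raw={raw:.2f} smoothed={avg:.2f}")
```

If the smoothed values trend downward over many steps, large swings in the raw per-batch values are less likely to signal a problem.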