mk-minchul / AdaFace

MIT License

Problems with big datasets? #80

Open trnikon opened 1 year ago

trnikon commented 1 year ago

I am trying to train AdaFace on a 270 GB dataset with about 2M classes and 41M images. I had no issues with smaller datasets, but this time the per-step loss is barely decreasing: it went from 44.5 to 43.7 over 70k steps (about 1 day of training). I am using batch size = 128 and lr = 0.1. Does anyone have experience training AdaFace on large datasets, or any tips to improve convergence?
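
For context, the numbers above can be turned into a rough estimate of how much of the dataset has been seen so far. This is only a back-of-the-envelope sketch: the helper `training_coverage` is a hypothetical name, not part of the AdaFace repository, and it assumes that 128 is the global batch size (a single device). With several devices, `num_devices` would need to be set accordingly.

```python
def training_coverage(steps, batch_size, num_images, num_classes, num_devices=1):
    """Return (samples_seen, fraction_of_epoch, avg_samples_per_class).

    Hypothetical helper for illustration; assumes every step consumes
    batch_size * num_devices images and ignores dropped or repeated batches.
    """
    samples_seen = steps * batch_size * num_devices
    return samples_seen, samples_seen / num_images, samples_seen / num_classes


# Figures quoted in the question: 70k steps, batch size 128,
# 41M images, about 2M classes.
samples, epoch_frac, per_class = training_coverage(
    steps=70_000,
    batch_size=128,
    num_images=41_000_000,
    num_classes=2_000_000,
)
print(f"{samples:,} samples seen, {epoch_frac:.2f} epochs, "
      f"{per_class:.2f} samples per class")
# prints: 8,960,000 samples seen, 0.22 epochs, 4.48 samples per class
```

Under these assumptions, 70k steps cover roughly a fifth of one epoch, which works out to fewer than five images per class on average.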

Suvi-dha commented 1 year ago

I am facing this too: training on 65M images, the loss stopped decreasing after epoch 0.