I am trying to train AdaFace on a 270 GB dataset with about 2M classes and 41M images. Training on smaller datasets worked fine, but here the per-step loss barely decreases (44.5 -> 43.7 over 70k steps, roughly one day of training). Batch size is 128 and the learning rate is 0.1.
Has anyone trained AdaFace on datasets this large, or does anyone have tips to improve convergence?
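For context, here is roughly the margin head I am using. It is a re-implementation sketched from the AdaFace paper's description, not a copy of the official repo, and the hyperparameters (m, h, s, t_alpha) are my own choices, so the problem could also be here:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaFaceHead(nn.Module):
    """Sketch of an AdaFace-style margin head (my re-implementation)."""
    def __init__(self, embedding_size=512, num_classes=1000,
                 m=0.4, h=0.333, s=64.0, t_alpha=0.01):
        super().__init__()
        self.kernel = nn.Parameter(torch.empty(embedding_size, num_classes))
        nn.init.normal_(self.kernel, std=0.01)
        self.m, self.h, self.s, self.t_alpha = m, h, s, t_alpha
        # EMA statistics of the feature norm, used as an image-quality proxy
        self.register_buffer("batch_mean", torch.tensor(20.0))
        self.register_buffer("batch_std", torch.tensor(100.0))

    def forward(self, embeddings, labels):
        norms = embeddings.norm(p=2, dim=1, keepdim=True).clamp(0.001, 100)
        cosine = F.normalize(embeddings) @ F.normalize(self.kernel, dim=0)
        cosine = cosine.clamp(-1 + 1e-7, 1 - 1e-7)

        with torch.no_grad():
            self.batch_mean.lerp_(norms.mean(), self.t_alpha)
            self.batch_std.lerp_(norms.std(), self.t_alpha)
        # normalized feature norm, clipped to [-1, 1]
        margin_scaler = ((norms - self.batch_mean)
                         / (self.batch_std + 1e-3) * self.h).clamp(-1, 1)

        one_hot = F.one_hot(labels, cosine.size(1)).float()
        # adaptive angular margin on the target logit: cos(theta + g_angle)
        g_angle = -self.m * margin_scaler
        theta = (cosine.acos() + one_hot * g_angle).clamp(1e-7, math.pi - 1e-7)
        cosine = theta.cos()
        # adaptive additive margin subtracted from the target logit
        g_add = self.m + self.m * margin_scaler
        cosine = cosine - one_hot * g_add
        return cosine * self.s
```

In my actual run I instantiate this with num_classes = 2M, feed the scaled logits to cross-entropy, and optimize with SGD at lr 0.1 and batch size 128.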