Open suzhenghang opened 3 years ago
The loss would be around -9.3~-9.4. There is a loss curve in the original paper (Fig. 2); you can check against that.
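For reference, the per-pair loss under discussion is the negative cosine similarity from the SimSiam paper, D(p, z) = -(p/||p||)·(z/||z||). Below is a minimal plain-Python sketch of that quantity (the repo's actual implementation is presumably a batched PyTorch version, and whatever scaling or summation yields the -9.3~-9.4 figure is not reproduced here):

```python
import math

def negative_cosine_similarity(p, z):
    """SimSiam-style loss for one pair: -(p/||p||) . (z/||z||).

    p is the predictor output, z is the (stop-gradient) target
    projection. Plain-Python sketch; real code would use tensors.
    """
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_z = math.sqrt(sum(x * x for x in z))
    return -sum(a * b for a, b in zip(p, z)) / (norm_p * norm_z)

# Collinear vectors give the minimum value of -1.0:
print(negative_cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # -1.0
```

The minimum per pair is -1.0 (perfect alignment), so a large negative total loss implies some summing or scaling on top of this.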
Thanks. I ran both unsupervised pretraining (batch size 512, SGD, 100 epochs, cosine LR decay from 0.1, negative cosine similarity loss) and linear evaluation (batch size 4096, LARS, 100 epochs, cosine LR decay from 1.6, cross-entropy loss), and finally got 68.0%. I found that Sync-BatchNorm is critical: without it, I could only get 65.1%. The training curve is as follows.
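The cosine LR schedule quoted above (decaying from a base LR over a fixed epoch budget) can be sketched as below; the default `base_lr=0.1` and `total_epochs=100` simply mirror the pretraining numbers in this comment, and real training code would typically use a built-in scheduler instead:

```python
import math

def cosine_lr(epoch, base_lr=0.1, total_epochs=100):
    """Cosine decay: base_lr at epoch 0, down to 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))    # 0.1
print(cosine_lr(100))  # 0.0
```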
Yes, SyncBN seems to be critical. Thanks for the sharing.
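For anyone hitting the same accuracy gap: converting an existing model's BatchNorm layers to SyncBN is a one-liner in PyTorch. A minimal sketch with a hypothetical toy backbone (the repo's actual script presumably does the equivalent before wrapping the model in DistributedDataParallel):

```python
import torch.nn as nn

# Hypothetical tiny backbone; any model containing BatchNorm layers works.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Replace every BatchNorm layer with SyncBatchNorm so batch statistics
# are computed across all GPUs rather than per-GPU. (Forward passes then
# require an initialized distributed process group.)
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(any(isinstance(m, nn.SyncBatchNorm) for m in model.modules()))  # True
```

With small per-GPU batches, per-GPU BN statistics are noisy, which is the usual explanation for SyncBN mattering here.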
Hi @taoyang1122, thanks for open-sourcing such good code. What is the loss value at the end of training, and could you share the loss curve?