snu-mllab / PuzzleMix

Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)
MIT License

ImageNet training #5

Closed: FriedRonaldo closed this issue 3 years ago

FriedRonaldo commented 3 years ago

Hi, thanks for the interesting work and the code!

I am trying to train ResNet-50 on ImageNet with PuzzleMix.

When I run the code without any modifications (other than setting the dataset path to the directory containing the resized images), the loss becomes "NaN".

I suspect the warm-up procedure may be causing the problem, since it increases the learning rate up to 0.5! However, I am not sure whether it is okay to train the model with a reduced learning rate or without the warm-up.
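For context on the warm-up concern, here is a rough sketch assuming a simple linear warm-up. The repository's actual schedule (warm-up length, shape, and the decay that follows) may differ. The helper `warmup_lr` is hypothetical. It shows how the learning rate climbs to a peak of 0.5, the value mentioned in the issue, and how a lower peak gives a gentler ramp. The warm-up length of 5 steps and the reduced peak of 0.1 are arbitrary choices for illustration.

```python
def warmup_lr(step: int, warmup_steps: int, peak_lr: float) -> float:
    """Hypothetical linear warm-up: ramp up to peak_lr, then hold it.

    Not the PuzzleMix schedule; a generic sketch for illustration only.
    """
    if warmup_steps <= 0:
        # No warm-up: start directly at the peak learning rate.
        return peak_lr
    # step is 0-indexed; using step + 1 gives a small nonzero LR on the first step.
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


if __name__ == "__main__":
    # Peak of 0.5 as in the issue; warm-up length of 5 is an assumption.
    for step in range(7):
        print(step, warmup_lr(step, warmup_steps=5, peak_lr=0.5))
    # A reduced peak (0.1, also an assumption) ramps up more gently.
    print(warmup_lr(0, warmup_steps=5, peak_lr=0.1))
```

As the thread later shows, the warm-up turned out not to be the cause here, so lowering the peak was not needed to fix the NaN.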

In my case, do you have any suggestions for training ResNet-50 with PuzzleMix?

Thanks!

FriedRonaldo commented 3 years ago

After removing "amp" (automatic mixed precision), the problem is solved! I think the loss scaling causes some issues.
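To illustrate what "loss scaling" refers to, below is a minimal sketch of *dynamic* loss scaling, the general technique used in mixed-precision training. The loss is multiplied by a scale factor before the backward pass. If any scaled gradient overflows to inf/NaN, the step is skipped and the scale is reduced. After a run of clean steps, the scale is increased again. `DynamicLossScaler` is a hypothetical, simplified class operating on plain floats. It is not the implementation in apex's `amp` or PyTorch's `GradScaler`, although its default values mirror `GradScaler`'s defaults.

```python
import math


class DynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not apex's or PyTorch's)."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale            # factor the loss is multiplied by
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0              # consecutive steps without overflow

    def step(self, scaled_grads):
        """Unscale gradients and update the scale.

        Returns the unscaled gradients, or None if the optimizer step
        should be skipped because an inf/NaN gradient was found.
        """
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow: skip this step and shrink the scale.
            self.scale *= self.backoff_factor
            self._clean_steps = 0
            return None
        # Divide by the scale that was applied to the loss for this step.
        grads = [g / self.scale for g in scaled_grads]
        self._clean_steps += 1
        if self._clean_steps >= self.growth_interval:
            # Enough clean steps: try a larger scale again.
            self.scale *= self.growth_factor
            self._clean_steps = 0
        return grads


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
    print(scaler.step([float("inf"), 1.0]), scaler.scale)  # None 4.0
    print(scaler.step([4.0, 8.0]), scaler.scale)           # [1.0, 2.0] 4.0
    print(scaler.step([4.0, 8.0]), scaler.scale)           # [1.0, 2.0] 8.0
```

Disabling amp, as in the comment above, sidesteps this machinery entirely by training in full FP32.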