Hi, thanks for the interesting work and the code!
I am trying to train ResNet-50 with PuzzleMix.
When I run the code, the loss goes to NaN even though I have not modified anything except the dataset path (the directory for the resized images).
I suspect that the warm-up procedure causes the problem (it increases the learning rate up to 0.5!). However, I am not sure whether it is okay to train the model with a reduced learning rate or without the warm-up.
In my case, do you have any suggestions for training ResNet-50 with PuzzleMix?
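To make the question concrete, here is a minimal sketch of the kind of linear warm-up I mean. It is not the repository's actual scheduler: the function name `warmup_lr`, the 500-step warm-up length, and the reduced peak of 0.1 are assumptions for illustration only. The 0.5 peak is the only value taken from what I observed.

```python
def warmup_lr(step, warmup_steps, peak_lr, start_lr=0.0):
    """Linearly ramp the learning rate from start_lr to peak_lr.

    Hypothetical helper for illustration; not taken from the PuzzleMix code.
    """
    # After the warm-up (or if warm-up is disabled), stay at the peak value.
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    # Linear interpolation between start_lr and peak_lr.
    return start_lr + (peak_lr - start_lr) * step / warmup_steps


if __name__ == "__main__":
    # Compare the observed peak (0.5) with an assumed reduced peak (0.1).
    for step in (0, 250, 500):
        print(step, warmup_lr(step, 500, 0.5), warmup_lr(step, 500, 0.1))
```

With a schedule like this, lowering `peak_lr` or setting `warmup_steps=0` are the two changes I am unsure about.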
Thanks!