snu-mllab / PuzzleMix

Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)
MIT License

About clean lambda in loss calculation #8

Closed 3neutronstar closed 2 years ago

3neutronstar commented 2 years ago

Hi, thank you for the interesting work and its code.

I have a question about the loss calculation in the code.

In the imagenet directory, the loss for obtaining the saliency map is calculated as follows:

    loss = clean_lam * criterion(image, target)

However, in the other cases (the 'tiny-imagenet' and 'cifar' code), the loss for obtaining the saliency map is calculated as follows:

    loss = 2 * clean_lam * criterion(image, target) / num_classes

Is there a special reason to calculate the loss for generating the saliency map differently in these cases?

Janghyun1230 commented 2 years ago

Hello! Please note that the issue is that the gradient scales differ between the gradients from CE (saliency calculation with clean data) and the gradients from BCE (mixup data). We rebalanced the gradient scales so that an identical hyperparameter (`clean_lam`) can be used regardless of the setting.
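The scale mismatch can be illustrated with the analytic gradients of the two losses. Below is a toy NumPy sketch (not the repository's code): the gradient of a sigmoid BCE loss averaged over classes carries a built-in `1/num_classes` factor per logit, while the softmax CE gradient does not, which is roughly what the `2 * clean_lam / num_classes` rescaling compensates for.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad(z, y_idx):
    # Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - onehot(y)
    g = softmax(z)
    g[y_idx] -= 1.0
    return g

def bce_grad(z, y_idx):
    # Gradient of sigmoid BCE averaged over num_classes logits:
    # (sigmoid(z) - onehot(y)) / num_classes -- note the 1/num_classes factor
    num_classes = z.size
    y = np.zeros(num_classes)
    y[y_idx] = 1.0
    s = 1.0 / (1.0 + np.exp(-z))
    return (s - y) / num_classes

# Per-logit gradient magnitudes: the averaged-BCE gradient shrinks as
# num_classes grows, so a CE-based saliency loss is rescaled by a factor
# proportional to 1/num_classes to keep clean_lam comparable across settings.
z = np.array([2.0, -1.0, 0.5, -0.5])
print(np.abs(ce_grad(z, 0)).mean(), np.abs(bce_grad(z, 0)).mean())
```

This is only meant to show where the `1/num_classes` factor enters; the exact rebalancing constant in the repository was chosen by the authors for their settings.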

Hope this helps!

3neutronstar commented 2 years ago

Thank you for the fast answer!