snu-mllab / PuzzleMix

Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)
MIT License

About clean lambda in loss calculation #8

Closed 3neutronstar closed 2 years ago

3neutronstar commented 2 years ago

Hi, thank you for the interesting work and its code.

I have a question about the loss calculation in the code.

In the imagenet directory, the loss for obtaining the saliency map is calculated as follows:

    loss = clean_lam * criterion(image, target)

However, in the other cases (the 'tiny-imagenet' and 'cifar' code), the loss for obtaining the saliency map is calculated as follows:

    loss = 2 * clean_lam * criterion(image, target) / num_classes

Is there a special reason to calculate the loss for generating the saliency map differently in these cases?

Janghyun1230 commented 2 years ago

Hello! Please note that the issue is that the gradient scales differ between the gradients from CE (saliency calculation with clean data) and the gradients from BCE (mixup data). We rebalanced the gradient scales so that an identical hyperparameter (`clean_lam`) can be used regardless of the setting.
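The scale mismatch can be illustrated with the analytic gradients of the two losses. Below is a toy NumPy sketch (not the repository's code): the gradient of a sigmoid BCE loss averaged over classes carries a built-in `1/num_classes` factor per logit, while the softmax CE gradient does not, which is roughly what the `2 * clean_lam / num_classes` rescaling compensates for.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad(z, y_idx):
    # Gradient of softmax cross-entropy w.r.t. the logits: softmax(z) - onehot(y)
    g = softmax(z)
    g[y_idx] -= 1.0
    return g

def bce_grad(z, y_idx):
    # Gradient of sigmoid BCE averaged over num_classes logits:
    # (sigmoid(z) - onehot(y)) / num_classes -- note the 1/num_classes factor
    num_classes = z.size
    y = np.zeros(num_classes)
    y[y_idx] = 1.0
    s = 1.0 / (1.0 + np.exp(-z))
    return (s - y) / num_classes

# Per-logit gradient magnitudes: the averaged-BCE gradient shrinks as
# num_classes grows, so a CE-based saliency loss is rescaled by a factor
# proportional to 1/num_classes to keep clean_lam comparable across settings.
z = np.array([2.0, -1.0, 0.5, -0.5])
print(np.abs(ce_grad(z, 0)).mean(), np.abs(bce_grad(z, 0)).mean())
```

This is only meant to show where the `1/num_classes` factor enters; the exact rebalancing constant in the repository was chosen by the authors for their settings.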

Hope this helps!

3neutronstar commented 2 years ago

Thank you for the fast answer!