tingxueronghua / pytorch-classification-advprop

MIT License

The mixbn loss #4

Closed: ksouvik52 closed this issue 3 years ago

ksouvik52 commented 3 years ago

Hi, I did not understand the following part of the code in imagenet.py:

```python
if mixbn:
    with torch.no_grad():
        batch_size = outputs.size(0)
        loss_main = criterion(outputs[:batch_size // 2], targets[:batch_size // 2]).mean()
        loss_aux = criterion(outputs[batch_size // 2:], targets[batch_size // 2:]).mean()
        prec1_main = accuracy(outputs.data[:batch_size // 2], targets.data[:batch_size // 2], topk=(1,))[0]
        prec1_aux = accuracy(outputs.data[batch_size // 2:], targets.data[batch_size // 2:], topk=(1,))[0]
        losses_main.update(loss_main.item(), batch_size // 2)
        losses_aux.update(loss_aux.item(), batch_size // 2)
        top1_main.update(prec1_main.item(), batch_size // 2)
        top1_aux.update(prec1_aux.item(), batch_size // 2)
```

If we are not using loss_main and loss_aux at all, why do we compute them separately? Also, why are there two ifs in the train function: first `if args.mixup` and then this one?

tingxueronghua commented 3 years ago

For the first question, loss_main and loss_aux are calculated separately so that we can debug more effectively. The loss computed with the aux batchnorm should be lower than the one computed with the main batchnorm, which is an important sanity check. I am sorry, but I don't understand the second question. Could you tell me the line numbers of the two ifs?
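For context, the two halves can be monitored separately because MixBN routes them through different batchnorm statistics: the first half of the batch (clean images) goes through the main BN, and the second half (adversarial images) goes through the auxiliary BN. A minimal sketch of the idea (not the repo's exact MixBatchNorm implementation; the class and attribute names here are illustrative):

```python
import torch
import torch.nn as nn

class MixBatchNorm2d(nn.Module):
    """Sketch: route the clean half of the batch through the main BN
    and the adversarial half through the auxiliary BN."""

    def __init__(self, num_features):
        super().__init__()
        self.bn_main = nn.BatchNorm2d(num_features)  # statistics of clean images
        self.bn_aux = nn.BatchNorm2d(num_features)   # statistics of adversarial images

    def forward(self, x):
        half = x.size(0) // 2
        # Assumes the batch is [clean; adversarial], concatenated along dim 0.
        return torch.cat([self.bn_main(x[:half]), self.bn_aux(x[half:])], dim=0)
```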

ksouvik52 commented 3 years ago

Hi, thanks for your reply. What I was pointing out is that loss_main and loss_aux are never added together and used for backprop. So I am not quite sure how the two losses are used when mixbn is enabled. Can you explain that part?

tingxueronghua commented 3 years ago

In fact, we don't need to calculate the gradients from loss_main and loss_aux separately. In this code, different images in a batch produce different outputs and per-sample losses, so we only need to call backward on the mean of the final losses. In other words, backpropagating through the full-batch mean loss is the same as averaging the gradients of loss_main and loss_aux, since the two halves have equal size.
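A minimal sketch of this equivalence (assuming `criterion = nn.CrossEntropyLoss(reduction='none')`, as the `.mean()` calls in the snippet above imply; the linear model here is just a stand-in for the real network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 4)                        # stand-in for the real network
criterion = nn.CrossEntropyLoss(reduction='none')

inputs = torch.randn(6, 8)                     # clean half and adversarial half concatenated
targets = torch.randint(0, 4, (6,))

# Variant 1: backward on the mean loss over the whole mixed batch
# (this is what the training loop actually does).
outputs = model(inputs)
criterion(outputs, targets).mean().backward()
grad_full = model.weight.grad.clone()
model.zero_grad()

# Variant 2: average the gradients of loss_main and loss_aux explicitly.
outputs = model(inputs)
half = outputs.size(0) // 2
loss_main = criterion(outputs[:half], targets[:half]).mean()
loss_aux = criterion(outputs[half:], targets[half:]).mean()
(0.5 * (loss_main + loss_aux)).backward()
grad_split = model.weight.grad.clone()

# The two gradients coincide because the halves have equal size:
# mean over 2N samples == 0.5 * (mean over first N + mean over last N).
print(torch.allclose(grad_full, grad_split))   # True
```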

ksouvik52 commented 3 years ago

I see. That means loss_main and loss_aux are used only for tracking the two losses and their magnitudes separately. Thanks for your explanation. Also, did you run the code with ResNet18 or ResNet34, i.e. smaller ResNets? Does AdvProp only help for larger ResNets? Any thoughts? I just tried ResNet18 on CIFAR-10, and it does not seem to perform that well. I thought of merging my part into your repo if it performed well.

tingxueronghua commented 3 years ago

Your understanding of loss_main and loss_aux is correct. As for the second question, in short, it is a problem of the dataset rather than of model capacity. I have not run ResNet18 or ResNet34, but I have run a kind of Wide ResNet on CIFAR, and it did not perform well. I also heard from the author of Adversarial Examples Improve Image Recognition that AdvProp did not work on CIFAR. I think this is mainly because there is not much room to improve generalization ability on CIFAR.

ksouvik52 commented 3 years ago

That makes sense. Thanks, @tingxueronghua!

tingxueronghua commented 3 years ago

Thanks for your attention! I am glad that you are trying to improve this code :) We can keep in touch if there is any further progress.