lyakaap / VAT-pytorch

Virtual Adversarial Training (VAT) implementation for PyTorch

Potential bug when calculating KL div #1

Closed 9310gaurav closed 6 years ago

9310gaurav commented 6 years ago

In lines 43 and 52, shouldn't there be F.softmax(pred.detach()) ? Since the KLDivLoss() expects probabilities and not scores in the second argument?

Also, while doing the forward pass `pred_hat = model(X + r_adv)`, shouldn't there be a `detach()` on `r_adv`, as advised in the paper: "By the definition, r̃_vadv depends on θ. Our numerical experiments, however, indicates that ∇_θ r_vadv is quite volatile with respect to θ, and we could not make effective regularization when we used the numerical evaluation of ∇_θ r̃_vadv in ∇_θ LDS(x^(n), θ). We have therefore followed the work of Goodfellow et al. (2015) and ignored the derivative of r̃_vadv with respect to θ."

lyakaap commented 6 years ago

Thanks for the comment.

In lines 43 and 52, shouldn't there be F.softmax(pred.detach()) ? Since the KLDivLoss() expects probabilities and not scores in the second argument?

PyTorch's KL divergence expects a log probability for the first argument but a plain probability for the second. So there isn't any problem here.
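(Editor's note: the argument convention under discussion can be checked with a small standalone snippet; the tensors here are made-up examples, not from the repo.)

```python
import torch
import torch.nn.functional as F

logits_p = torch.tensor([[1.0, 2.0, 0.5]])
logits_q = torch.tensor([[0.2, 1.0, 0.3]])

log_p = F.log_softmax(logits_p, dim=1)  # first argument: log-probabilities
q = F.softmax(logits_q, dim=1)          # second argument: plain probabilities

kl = F.kl_div(log_p, q, reduction='batchmean')

# manual check: KL(q || p) = sum_i q_i * (log q_i - log p_i)
manual = (q * (q.log() - log_p)).sum(dim=1).mean()
print(torch.allclose(kl, manual))  # True
```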

Also while doing ~

Although I have refactored my implementation, even in the previous version `detach()` is necessary so that `pred` (= `model(X)`) is treated as a constant.

9310gaurav commented 6 years ago

"KLD of pytorch requires log probability for the first argument despite vanilla probability for the second." I believe the `pred` variable holds raw scores rather than probabilities, since you use `F.log_softmax()` to compute the log probabilities. A softmax is required to convert raw scores into probabilities. So I think line 43 should be: `adv_distance = kl_div(F.log_softmax(pred_hat, dim=1), F.softmax(pred, dim=1))`

lyakaap commented 6 years ago

Oh, I've made a mistake, you're right. Thanks! Fixed it.
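(Editor's note: putting both points of the thread together, the fixed distance term might look like the sketch below. The `model`, `X`, and `r_adv` here are hypothetical stand-ins; in the actual repo `r_adv` is computed via power iteration, which is omitted.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 3)           # hypothetical stand-in classifier
X = torch.randn(8, 4)
r_adv = 0.01 * torch.randn(8, 4)  # hypothetical stand-in perturbation

with torch.no_grad():
    pred = model(X)  # clean prediction, treated as a constant target

# detach r_adv: ignore its dependence on theta, per Goodfellow et al. (2015)
pred_hat = model(X + r_adv.detach())

# log-probabilities first, plain probabilities second
adv_distance = F.kl_div(F.log_softmax(pred_hat, dim=1),
                        F.softmax(pred, dim=1),
                        reduction='batchmean')
```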