I am wondering why there are two different implementations of $\ell_s$, the symmetric KL divergence, in the code (SmartPerturbation):
first in the loop:
adv_loss = stable_kl(adv_logits, logits.detach(), reduce=False)
and then outside of the loop:
adv_loss = adv_lc(logits, adv_logits, ignore_index=-1)
where adv_lc is the SymKlCriterion, which is implemented differently from stable_kl.
Am I missing something?
Really appreciate your help!
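For context, here is a condensed paraphrase of how the two criteria look to me (simplified for readability; the alpha scaling and some reshaping in the repo are omitted, so this is not the exact source):

```python
import torch
import torch.nn.functional as F

# Paraphrase of stable_kl: a numerically stabilized KL-style divergence
# computed from raw logits; the 1/(p + eps) - 1 trick keeps log() away
# from zero even when probabilities saturate.
def stable_kl(logit, target, epsilon=1e-6, reduce=True):
    logit = logit.view(-1, logit.size(-1)).float()
    target = target.view(-1, target.size(-1)).float()
    bs = logit.size(0)
    p = F.log_softmax(logit, 1).exp()
    y = F.log_softmax(target, 1).exp()
    rp = -(1.0 / (p + epsilon) - 1 + epsilon).detach().log()
    ry = -(1.0 / (y + epsilon) - 1 + epsilon).detach().log()
    if reduce:
        return (p * (rp - ry) * 2).sum() / bs
    return (p * (rp - ry) * 2).sum()

# Paraphrase of SymKlCriterion.forward: plain symmetric KL,
# KL(p || q) + KL(q || p), detaching the "other side" each time.
def sym_kl(input_logits, target_logits, reduction="batchmean"):
    return F.kl_div(
        F.log_softmax(input_logits, dim=-1, dtype=torch.float32),
        F.softmax(target_logits.detach(), dim=-1, dtype=torch.float32),
        reduction=reduction,
    ) + F.kl_div(
        F.log_softmax(target_logits, dim=-1, dtype=torch.float32),
        F.softmax(input_logits.detach(), dim=-1, dtype=torch.float32),
        reduction=reduction,
    )
```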
It doesn't matter much which criterion is used in the inner loop. However, it does matter for the outer loss, which is what actually regularizes the model. Note that the inner loop is only there to estimate the adversarial samples.
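Roughly, the structure is this (a hedged sketch reusing the paraphrased stable_kl / sym_kl above; model, embed, noise_eps, step_size, and K are illustrative stand-ins, and the repo's norm-projection details are simplified away):

```python
import torch

def smart_adv_loss(model, embed, logits, noise_eps=1e-5, step_size=1e-3, K=1):
    # Start from a small random perturbation of the embeddings.
    noise = torch.randn_like(embed) * noise_eps
    for _ in range(K):
        noise.requires_grad_()
        adv_logits = model(embed + noise)
        # Inner loop: any reasonable divergence works here, since its only
        # job is to point the perturbation in the most damaging direction.
        adv_loss = stable_kl(adv_logits, logits.detach(), reduce=False)
        (delta_grad,) = torch.autograd.grad(adv_loss, noise)
        # Ascend along the (normalized) gradient to grow the perturbation.
        noise = (noise + step_size * delta_grad / (delta_grad.norm() + 1e-12)).detach()
    adv_logits = model(embed + noise)
    # Outer loss: this term is backpropagated into the model, so the exact
    # criterion (SymKlCriterion) matters here.
    return sym_kl(logits, adv_logits)
```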