Hi @Natithan, sorry for the confusion, it is indeed a bit misleading. However, in terms of experiments/results, it does not matter whether `loss_model` is multiplied by $\beta$ or not. The reason is that the parameters updated by `loss_model`, i.e. the target classifier's weights and biases, are influenced solely by `loss_model`. Since we use Adam as the optimizer, multiplying a loss by a constant (almost) does not change the parameter updates: the adaptive learning rate divides the gradients by that factor again, up to the epsilon constant. Still, to be more consistent with the paper, I agree that it would be better to use `(loss_model + loss_z) * self.hparams.beta_classifier`. I will try to update it after ICML.
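As a quick sanity check of this point (a toy quadratic loss with a single Adam step, not the CITRIS training code): scaling the loss by a constant leaves the resulting update (almost) unchanged, since the update is roughly $g / (\sqrt{v} + \epsilon)$ and both $g$ and $\sqrt{v}$ pick up the same factor.

```python
import torch

def adam_update(scale: float) -> torch.Tensor:
    """Return the first Adam update for a toy quadratic loss scaled by `scale`."""
    torch.manual_seed(0)
    w = torch.nn.Parameter(torch.randn(5))
    opt = torch.optim.Adam([w], lr=1e-3)
    loss = scale * (w ** 2).sum()   # scaling the loss by a constant
    loss.backward()
    before = w.detach().clone()
    opt.step()
    return w.detach() - before      # the actual change applied to w

print(adam_update(1.0))
print(adam_update(10.0))  # (almost) the same update despite the 10x larger loss
```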
Another very small thing: this line has `target_params_soft = torch.softmax(target_params, dim=-1)`, which errors and I guess should be `target_params_soft = torch.softmax(self.target_params, dim=-1)` :)
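For reference, a minimal sketch of why the original line errors (the module below is hypothetical; only the softmax line mirrors the repo): `target_params` only exists as an attribute of the module, so it has to be accessed via `self`.

```python
import torch
import torch.nn as nn


class AssignmentSketch(nn.Module):
    """Hypothetical stand-in module; only the softmax line mirrors the repo."""

    def __init__(self, num_latents: int = 8, num_targets: int = 4):
        super().__init__()
        # Learnable logits; in the repo this attribute is called target_params.
        self.target_params = nn.Parameter(torch.zeros(num_latents, num_targets))

    def forward(self) -> torch.Tensor:
        # Writing torch.softmax(target_params, dim=-1) here would raise a
        # NameError, because target_params is not a local variable.
        target_params_soft = torch.softmax(self.target_params, dim=-1)
        return target_params_soft


print(AssignmentSketch()().shape)  # torch.Size([8, 4])
```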
Thanks, that was indeed missing!
Hi,

Thanks for the cool work. A doubt I had: in the CITRIS paper, you explain just after Eq. 43 in C.3.3 that the full loss looks like $\mathcal{L} + \beta_{\text{classifier}}\,(\mathcal{L}_{\text{model}} + \mathcal{L}_z)$, i.e. with $\beta_{\text{classifier}}$ applied to both classifier terms. However, in the code, it looks like `beta_classifier` is only applied to `loss_z`. Should the code be `loss = loss + (loss_model + loss_z) * self.hparams.beta_classifier` instead of `loss = loss + loss_model + loss_z * self.hparams.beta_classifier`?
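For concreteness, the two variants differ only in how `loss_model` is weighted; a tiny numeric comparison (made-up values, only the combination mirrors the snippets above):

```python
# Toy numbers (not from the repo) just to show the difference in weighting.
loss, loss_model, loss_z = 1.0, 0.5, 0.25
beta_classifier = 2.0

current  = loss + loss_model + loss_z * beta_classifier    # beta only scales loss_z
proposed = loss + (loss_model + loss_z) * beta_classifier  # beta scales both classifier terms

print(current, proposed)  # 2.0 2.5 -> loss_model enters with weight 1 vs. beta
```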