Hi @Natithan, sorry for the confusion, it is indeed a bit misleading. However, in terms of experiments/results, it does not matter whether `loss_model` is multiplied by $\beta$ or not. The reason is that the parameters updated by `loss_model`, i.e. the target classifier's weights and biases, are influenced solely by `loss_model`. Since we use Adam as the optimizer, multiplying a loss by a constant (almost) does not change the parameter updates: the adaptive learning rate divides the gradients by that factor again, up to the epsilon constant. Still, to be more consistent with the paper, I agree that it would be better to use `(loss_model + loss_z) * self.hparams.beta_classifier`. I will try to update it after ICML.
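As a quick sanity check of this point (a toy quadratic loss with a single Adam step, not the CITRIS training code): scaling the loss by a constant leaves the resulting update (almost) unchanged, since the update is roughly $g / (\sqrt{v} + \epsilon)$ and both $g$ and $\sqrt{v}$ pick up the same factor.

```python
import torch

def adam_update(scale: float) -> torch.Tensor:
    """Return the first Adam update for a toy quadratic loss scaled by `scale`."""
    torch.manual_seed(0)
    w = torch.nn.Parameter(torch.randn(5))
    opt = torch.optim.Adam([w], lr=1e-3)
    loss = scale * (w ** 2).sum()   # scaling the loss by a constant
    loss.backward()
    before = w.detach().clone()
    opt.step()
    return w.detach() - before      # the actual change applied to w

print(adam_update(1.0))
print(adam_update(10.0))  # (almost) the same update despite the 10x larger loss
```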
Another very small thing: this line has `target_params_soft = torch.softmax(target_params, dim=-1)`, which errors and I guess should be `target_params_soft = torch.softmax(self.target_params, dim=-1)` :)
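For reference, a minimal sketch of why the original line errors (the module below is hypothetical; only the softmax line mirrors the repo): `target_params` only exists as an attribute of the module, so it has to be accessed via `self`.

```python
import torch
import torch.nn as nn


class AssignmentSketch(nn.Module):
    """Hypothetical stand-in module; only the softmax line mirrors the repo."""

    def __init__(self, num_latents: int = 8, num_targets: int = 4):
        super().__init__()
        # Learnable logits; in the repo this attribute is called target_params.
        self.target_params = nn.Parameter(torch.zeros(num_latents, num_targets))

    def forward(self) -> torch.Tensor:
        # Writing torch.softmax(target_params, dim=-1) here would raise a
        # NameError, because target_params is not a local variable.
        target_params_soft = torch.softmax(self.target_params, dim=-1)
        return target_params_soft


print(AssignmentSketch()().shape)  # torch.Size([8, 4])
```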
Thanks, that was indeed missing!
Hi,

Thanks for the cool work. A doubt I had: in the CITRIS paper, you explain just after Eq. 43 in C.3.3 that the full loss looks like $\mathcal{L} + \beta_{\text{classifier}}\,(\mathcal{L}_{\text{model}} + \mathcal{L}_z)$, i.e. with $\beta_{\text{classifier}}$ applied to both classifier terms. However, in the code, it looks like `beta_classifier` is only applied to `loss_z`. Should the code be `loss = loss + (loss_model + loss_z) * self.hparams.beta_classifier` instead of `loss = loss + loss_model + loss_z * self.hparams.beta_classifier`?
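For concreteness, the two variants differ only in how `loss_model` is weighted; a tiny numeric comparison (made-up values, only the combination mirrors the snippets above):

```python
# Toy numbers (not from the repo) just to show the difference in weighting.
loss, loss_model, loss_z = 1.0, 0.5, 0.25
beta_classifier = 2.0

current  = loss + loss_model + loss_z * beta_classifier    # beta only scales loss_z
proposed = loss + (loss_model + loss_z) * beta_classifier  # beta scales both classifier terms

print(current, proposed)  # 2.0 2.5 -> loss_model enters with weight 1 vs. beta
```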