jxzhanggg / nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC

Why consistent_loss_w=0.0 ? #22

Closed: Approximetal closed this issue 4 years ago

Approximetal commented 4 years ago

```python
self.consi_w = hparams.consistent_loss_w

if self.consi_w == 0.:
    consist_loss = torch.tensor(0.).cuda()
else:
    consist_loss = self.MSELoss(text_hidden, mel_hidden)
    mask = text_mask.unsqueeze(2).expand(-1, -1, text_hidden.size(2))
    consist_loss = torch.sum(consist_loss * mask) / torch.sum(mask)
```

I see that the weight is set to zero here. Why isn't this loss calculated? Does that mean the linguistic representations extracted from audio signals and from phoneme sequences are not similar?

jxzhanggg commented 4 years ago

The consistency loss is not used in our model; it's a piece of legacy code, so its weight is set to 0. It has been replaced by the contrastive loss that is already used in our code, so the linguistic representations from audio and text are still kept close by the contrastive loss.
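
For illustration, here is a minimal sketch of a contrastive loss that pulls the text and mel linguistic embeddings of the same utterance together while pushing mismatched pairs apart. The function name, the margin, and the use of utterance-level embeddings are assumptions for this sketch, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, mel_emb, margin=1.0):
    """Hypothetical sketch: text_emb and mel_emb are (batch, dim) linguistic
    embeddings of the same batch of utterances, one from the text encoder and
    one from the audio encoder."""
    batch = text_emb.size(0)
    # Pairwise L2 distances between every text and every mel embedding.
    dist = torch.cdist(text_emb, mel_emb)                     # (batch, batch)
    matched = torch.eye(batch, dtype=torch.bool, device=dist.device)
    # Matched pairs (same utterance) are pulled together ...
    pos_loss = dist[matched].pow(2).mean()
    # ... and mismatched pairs are pushed apart, up to the margin.
    neg_loss = F.relu(margin - dist[~matched]).pow(2).mean()
    return pos_loss + neg_loss

# Toy usage: 4 utterances with 128-dim embeddings.
loss = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128))
```

The exact formulation in the paper and repository may differ; the sketch only illustrates why the audio-derived and text-derived representations stay close even without the MSE consistency term.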

Approximetal commented 4 years ago

Thank you for answering. I also see that grad_norm_sc is not used. What is the difference between xx_main and xx_sc (e.g. optimizer_main/optimizer_sc, parameters_main/parameters_sc, l_main/l_sc, etc.)?

jxzhanggg commented 4 years ago
  1. For grad_norm_sc, this is a byproduct of using PyTorch's gradient-clipping API,
    torch.nn.utils.clip_grad_norm_(), which clips the gradients in place and returns their norm.
  2. The difference between xx_main and xx_sc is that they deal with different parts of the model parameters: the former corresponds to the model except the speaker classifier, and the latter corresponds to the speaker classifier module. See grouped_parameters() in model.py for more details (a rough sketch follows below).
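
For context, a rough, hypothetical sketch of how the two parameter groups and optimizers could be set up; the stand-in TinyModel and the submodule name speaker_classifier are assumptions for this sketch, not the repository's actual model:

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in model for this sketch: some 'main' layers plus a speaker classifier."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(80, 256)
        self.decoder = nn.Linear(256, 80)
        self.speaker_classifier = nn.Linear(256, 10)

def grouped_parameters(model):
    # "main" = everything except the speaker classifier; "sc" = the classifier only.
    params_main = [p for name, p in model.named_parameters()
                   if not name.startswith('speaker_classifier')]
    params_sc = [p for name, p in model.named_parameters()
                 if name.startswith('speaker_classifier')]
    return params_main, params_sc

model = TinyModel()
params_main, params_sc = grouped_parameters(model)
optimizer_main = torch.optim.Adam(params_main, lr=1e-3)
optimizer_sc = torch.optim.Adam(params_sc, lr=1e-3)

# After a backward pass, each group is clipped separately;
# torch.nn.utils.clip_grad_norm_() clips in place and *returns* the total
# gradient norm, which is what gets logged as grad_norm_main / grad_norm_sc.
```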
Approximetal commented 4 years ago

Does that mean the generator and classifier weights should be updated separately? What would happen if they were updated together?

jxzhanggg commented 4 years ago

Yes, they should be updated separately, because the classifier must be kept fixed while the speaker-adversarial gradients are passed back to the "generator", and the "generator" must be kept fixed while the gradients of the speaker classification loss are passed to the classifier. Updating them separately makes this adversarial training process easy to implement.
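
Continuing the toy setup from the earlier sketch (TinyModel, the two parameter groups and optimizers), one hypothetical training step could look like the following; the simple negated cross-entropy used as the adversarial term and the 0.1 weight are illustrative assumptions, not the loss actually used in the paper:

```python
import torch
import torch.nn.functional as F

# Dummy batch for the sketch: 4 "utterances" as 80-dim features, 10 speakers.
feats = torch.randn(4, 80)
speaker_id = torch.randint(0, 10, (4,))

hidden = model.encoder(feats)                    # linguistic hidden features
recon = model.decoder(hidden)                    # "generator" output
logits_adv = model.speaker_classifier(hidden)    # classifier applied to hidden

# l_main: reconstruction loss plus a negated speaker classification term,
# i.e. the "generator" tries to remove speaker information from `hidden`.
l_main = F.mse_loss(recon, feats) - 0.1 * F.cross_entropy(logits_adv, speaker_id)

# l_sc: speaker classification loss on DETACHED features, so its gradient
# never reaches the "generator".
logits_sc = model.speaker_classifier(hidden.detach())
l_sc = F.cross_entropy(logits_sc, speaker_id)

# 1) Update the "generator"; the classifier's weights stay fixed because
#    optimizer_main only holds params_main.
optimizer_main.zero_grad()
l_main.backward()
grad_norm_main = torch.nn.utils.clip_grad_norm_(params_main, max_norm=1.0)
optimizer_main.step()

# 2) Update the speaker classifier; the "generator" stays fixed because
#    optimizer_sc only holds params_sc.
optimizer_sc.zero_grad()   # drop the adversarial gradient accumulated in step 1
l_sc.backward()
grad_norm_sc = torch.nn.utils.clip_grad_norm_(params_sc, max_norm=1.0)
optimizer_sc.step()
```

The key point in this sketch is that each optimizer only holds its own parameter group, so stepping one never touches the other's weights.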

jxzhanggg commented 4 years ago

Updated the code in cd000fe0021a82b841eccca576484035ba058186.