Open jaykim9870 opened 9 months ago
@jaykim9870 I have the same question. You're suggesting the return statement should be changed as shown below, right?
before : return loss + (self.rvq_cross_entropy_loss_weight * ce_loss) + duration_pitch_loss
fixed : return loss + (self.rvq_cross_entropy_loss_weight * ce_loss) + aux_loss
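To make the difference concrete, here is a minimal sketch with plain floats instead of tensors. It assumes that duration_pitch_loss stays at its initial value of zero and that aux_loss is the weighted sum of the duration and pitch losses; I have not verified either against the linked lines. The function names, the two per-loss weights, and all numeric values are hypothetical.

```python
def compute_aux_loss(duration_loss, pitch_loss,
                     duration_loss_weight=1.0, pitch_loss_weight=1.0):
    # Assumption: aux_loss is the weighted sum of the two auxiliary losses.
    return duration_loss * duration_loss_weight + pitch_loss * pitch_loss_weight


def total_loss_before(loss, ce_loss, ce_weight, duration_pitch_loss=0.0):
    # Mirrors the "before" return statement. If duration_pitch_loss is
    # never updated (assumed here), the auxiliary terms are dropped.
    return loss + (ce_weight * ce_loss) + duration_pitch_loss


def total_loss_fixed(loss, ce_loss, ce_weight, aux_loss):
    # Mirrors the "fixed" return statement: aux_loss reaches the total.
    return loss + (ce_weight * ce_loss) + aux_loss


# Made-up values, for illustration only.
aux = compute_aux_loss(duration_loss=0.2, pitch_loss=0.3)
before = total_loss_before(loss=1.0, ce_loss=0.5, ce_weight=0.1)
fixed = total_loss_fixed(loss=1.0, ce_loss=0.5, ce_weight=0.1, aux_loss=aux)
print(before, fixed, fixed - before)
```

Under these assumptions the two totals differ by exactly aux_loss, so with the "before" line the duration and pitch predictors would receive no gradient from the returned loss.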
@wonwooo Yes, that would do.
FYI, there are some other issues as well. For example, the WaveNet-based diffusion model is very different in size from the one in the original paper. As far as I have investigated, the architecture differs enough that it may affect model performance. If you are building on this project, you may want to check those out too!
Hello, I was looking into your code and it seems like the code does not consider the duration_pitch_loss.
https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1522
It might be related to the aux_loss you compute here:
https://github.com/lucidrains/naturalspeech2-pytorch/blob/659bec7f7543e7747e809e950cc2f84242fbeec7/naturalspeech2_pytorch/naturalspeech2_pytorch.py#L1600
Thanks for the great work!