Open LiangTing1 opened 4 months ago
Hi
Yes, we used the first 20 dimensions of the target mel-spectrogram, and we applied a weight of 45 to this loss.
loss_prosody = (torch.sum(torch.abs(mel[:,:hps.model.prosody_size,:] - prosody_hat.float())*mask) / (torch.sum(mask) * hps.model.prosody_size)) * hps.train.c_mel
Thanks!
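The loss line above can be read as a masked mean absolute error scaled by a constant. Below is a minimal, self-contained sketch of that computation on dummy tensors. The function name `prosody_l1_loss`, the tensor shapes, and the `(batch, 1, frames)` mask layout are assumptions made for illustration; only the formula itself and the values 20 (`hps.model.prosody_size`) and 45 (`hps.train.c_mel`) come from the reply.

```python
import torch


def prosody_l1_loss(mel, prosody_hat, mask, prosody_size=20, weight=45.0):
    """Masked L1 loss between the lowest mel bins and the prosody encoder output.

    Assumed shapes (not confirmed by the thread):
      mel:         (batch, n_mels, frames)        target mel-spectrogram
      prosody_hat: (batch, prosody_size, frames)  prosody encoder output
      mask:        (batch, 1, frames)             1 for valid frames, 0 for padding
    """
    target = mel[:, :prosody_size, :]
    # Absolute error, with padded frames zeroed out by the broadcast mask.
    abs_err = torch.abs(target - prosody_hat.float()) * mask
    # Divide by (valid frames * bins) to get a per-element mean, then scale.
    return abs_err.sum() / (mask.sum() * prosody_size) * weight


# Toy check: a prediction that is off by exactly 0.1 everywhere.
torch.manual_seed(0)
mel = torch.randn(2, 80, 50)
mask = torch.ones(2, 1, 50)
mask[1, :, 30:] = 0  # second utterance is padded after frame 30
prosody_hat = mel[:, :20, :] + 0.1

loss = prosody_l1_loss(mel, prosody_hat, mask)
print(loss.item())  # approximately 0.1 * 45 = 4.5
```

Because of the factor of 45, the logged value is 45 times the mean absolute error per mel bin, so dividing a logged value by 45 gives the unweighted error when comparing runs.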
Thanks very much for your response. Would it be possible for you to share the loss curve? This is my training loss; the sixth curve is the prosody encoder loss. Does this value seem a bit large?
Hi
Here are our TensorBoard logs.
Green is trained from scratch on LibriTTS 460, and orange continues from the green run with the full dataset.
Could you kindly share your training code for the hierarchical speech synthesizer?
Hi, is the output of the prosody encoder in the hierarchical speech synthesizer used only to calculate the loss against the first 20 dimensions of the target mel-spectrogram? What weight is assigned to this loss?