sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License

Prosody Encoder #35

Open LiangTing1 opened 4 months ago

LiangTing1 commented 4 months ago

Hi, is the output of the prosody encoder in the hierarchical speech synthesizer used only to calculate the loss against the first 20 dimensions of the target mel? What weight is assigned to this loss?

sh-lee-prml commented 4 months ago

Hi

Yes, we used the first 20 dimensions of the target mel, and we used a weight of 45 for this loss.

        loss_prosody = (torch.sum(torch.abs(mel[:,:hps.model.prosody_size,:] - prosody_hat.float())*mask) / (torch.sum(mask) * hps.model.prosody_size)) * hps.train.c_mel
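The line above is a masked L1 loss, averaged over valid frames and prosody dimensions, then scaled by the weight. A minimal self-contained sketch follows. The tensor shapes (`mel` as `[B, n_mels, T]`, `mask` as `[B, 1, T]`), the 80 mel bins, the helper name `prosody_l1_loss`, and the dummy data are assumptions for illustration and are not taken from the repository. The values 20 and 45 are the ones stated in this thread.

```python
import torch


def prosody_l1_loss(mel, prosody_hat, mask, prosody_size=20, weight=45.0):
    """Masked L1 between the first `prosody_size` mel bins and the prediction.

    Assumed shapes (not confirmed by the repo):
      mel:         [B, n_mels, T]
      prosody_hat: [B, prosody_size, T]
      mask:        [B, 1, T], 1.0 for valid frames and 0.0 for padding
    """
    target = mel[:, :prosody_size, :]
    # Zero out the error on padded frames; mask broadcasts over the channel axis.
    abs_err = torch.abs(target - prosody_hat.float()) * mask
    # Normalize by (number of valid frames) * (number of prosody dimensions).
    denom = torch.sum(mask) * prosody_size
    return weight * torch.sum(abs_err) / denom


# Dummy batch: two utterances with 50 and 30 valid frames (hypothetical data).
torch.manual_seed(0)
B, n_mels, T = 2, 80, 50
mel = torch.randn(B, n_mels, T)
lengths = torch.tensor([50, 30])
mask = (torch.arange(T)[None, :] < lengths[:, None]).float().unsqueeze(1)

# A prediction that is off by exactly 0.1 everywhere gives 0.1 * 45 = 4.5.
prosody_hat = mel[:, :20, :] + 0.1
loss = prosody_l1_loss(mel, prosody_hat, mask)
print(loss.item())  # approximately 4.5
```

Because of the weight, the logged value is 45 times the mean absolute error per valid element, which is worth keeping in mind when comparing it with the other loss terms.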

Thanks!

LiangTing1 commented 4 months ago

Thanks very much for your response. Would it be possible for you to show your loss curve? This is my training loss; the sixth plot is the prosody encoder loss. Does this value seem a bit large?

image
sh-lee-prml commented 4 months ago

Hi

Here are our TensorBoard logs:

image

The green curve is trained from scratch on LibriTTS 460, and the orange curve continues from the green model with the full dataset.

hoangtm-aimesoft commented 3 months ago

> Thanks very much for your response. Would it be possible for you to show your loss curve? This is my training loss; the sixth plot is the prosody encoder loss. Does this value seem a bit large?

Could you kindly share your training code for the hierarchical speech synthesizer?