sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License
1.17k stars, 134 forks

Can't replicate the training curve for LibriTTS 960 #36

Closed: meriamOu closed this issue 7 months ago

meriamOu commented 7 months ago

Hey, thank you for sharing the training curve. I am training the model on LibriTTS 960 using exactly your preprocessing and training files. However, my training curve doesn't show the same losses you got. While you reached a w2v loss of 2.74 at 100k (https://github.com/sh-lee-prml/HierSpeechpp/issues/4#issuecomment-1827099857), my training converges at a loss of 6. The generated speech is slurred as well.

[Screenshot, 2024-02-04 at 2:02:02 PM]

What could be the reason for that? Thank you.

sh-lee-prml commented 7 months ago

Oh sorry.

I have noticed a difference.

When I trained the model, I left out the loss mask for the w2v loss.

After some tests, I added a loss mask in the trainer, as below:

    # Paper version (this is what my TensorBoard log shows)
    loss_w2v = F.l1_loss(w2v_x, w2v_predicted) * hps.train.c_mel
    # Released version
    loss_w2v = (torch.sum(torch.abs(w2v_x - (w2v_predicted * mask))) / (torch.sum(mask) * 1024)) * hps.train.c_mel

Sorry for the confusion.

To be honest, both models sound good to me, so it is hard to recommend one over the other.
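The effect of the two formulas on the reported number can be illustrated with a minimal sketch. It is not taken from the repository: the tensor layout (features as `[batch, channels, frames]`, mask as `[batch, 1, frames]`), the zero-padded toy batch, the helper names, and `c_mel=1.0` are all assumptions chosen for illustration. The masked helper reads the channel count from the tensor instead of hard-coding 1024.

```python
import torch
import torch.nn.functional as F


def w2v_loss_unmasked(w2v_x, w2v_predicted, c_mel=1.0):
    """Paper-version formula: mean absolute error over every element, padding included."""
    return F.l1_loss(w2v_x, w2v_predicted) * c_mel


def w2v_loss_masked(w2v_x, w2v_predicted, mask, c_mel=1.0):
    """Released-version formula: absolute error summed, then divided by valid frames * channels."""
    channels = w2v_x.size(1)  # 1024 in the thread; read from the tensor here
    abs_err = torch.sum(torch.abs(w2v_x - w2v_predicted * mask))
    return abs_err / (torch.sum(mask) * channels) * c_mel


def make_toy_batch():
    """Hypothetical batch: 2 utterances, 4 channels, 6 frames, true lengths 6 and 3."""
    batch, channels, frames = 2, 4, 6
    lengths = torch.tensor([6, 3])
    # mask[b, 0, t] is 1.0 for real frames and 0.0 for padding
    mask = (torch.arange(frames)[None, :] < lengths[:, None]).float().unsqueeze(1)
    torch.manual_seed(0)
    # Assumption: targets and predictions are both zero in padded frames
    w2v_x = torch.randn(batch, channels, frames) * mask
    # The prediction is off by exactly 1.0 on every valid element
    w2v_predicted = (w2v_x + 1.0) * mask
    return w2v_x, w2v_predicted, mask


w2v_x, w2v_predicted, mask = make_toy_batch()
unmasked = w2v_loss_unmasked(w2v_x, w2v_predicted).item()
masked = w2v_loss_masked(w2v_x, w2v_predicted, mask).item()
print(f"unmasked: {unmasked:.4f}  masked: {masked:.4f}")
```

With the same per-frame error, the unmasked mean comes out near 0.75 and the masked one near 1.0, because the unmasked version also averages over the 12 padded elements that contribute no error. Under these assumptions, loss values logged with the two formulas are not directly comparable, though this sketch does not establish that this accounts for the whole gap reported above. Note also that the released formula multiplies only the prediction by the mask, so padded frames still add to the numerator if the target is non-zero there.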

meriamOu commented 7 months ago

Hey, thank you so much for your immediate reply. The problem is not just the loss numbers; it is the slurring that degrades the speech. I will swap the loss for the one used in the TensorBoard log. Thank you so much.