tuanh123789 / AdaSpeech

An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"
96 stars, 27 forks

The performance of a new voice (finetune) is bad #11

Open linlinsongyun opened 1 year ago

linlinsongyun commented 1 year ago

Thanks for your nice work. The code works well in the pretraining stage. However, when I finetune on an unseen voice with 10 sentences, the results are bad: the speech quality is poor and the voice is significantly different. What went wrong?

tuanh123789 commented 1 year ago

What dataset did you use in the pretraining stage?

tuanh123789 commented 1 year ago

And is the language the same in pretraining and finetuning?

linlinsongyun commented 1 year ago

A Mandarin multi-speaker dataset was used for pretraining. A different Chinese speaker was used for finetuning.

linlinsongyun commented 1 year ago

I noticed that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no gradient, except for the conditional layer norm?

tuanh123789 commented 1 year ago

Did you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset for the pretraining stage?

tuanh123789 commented 1 year ago

I noticed that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no gradient, except for the conditional layer norm?

Only the speaker embedding and the conditional layer norm are updated. I follow the paper.
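For readers who want to check which parameters are trainable, here is a minimal sketch of freezing everything except the speaker embedding and the conditional layer norm. It is not the repository's code: the helper `freeze_except` and the name substrings `speaker_emb` and `layer_norm` are hypothetical, so check them against your own model's parameter names. A loose substring such as `layer_norm` may also match layer norms outside the decoder, so narrow it if needed. Stand-in objects are used so the sketch runs without PyTorch.

```python
from types import SimpleNamespace

# Hypothetical name substrings; inspect your own model's parameter
# names and adjust these before relying on the result.
TRAINABLE_KEYWORDS = ("speaker_emb", "layer_norm")


def freeze_except(named_params, keywords=TRAINABLE_KEYWORDS):
    """Enable gradients only for parameters whose name contains a keyword.

    `named_params` is any iterable of (name, param) pairs where `param`
    has a writable `requires_grad` attribute.
    """
    trainable = []
    for name, param in named_params:
        param.requires_grad = any(key in name for key in keywords)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters (made-up names) in place of a real model.
demo_params = [
    ("encoder.layer.0.weight", SimpleNamespace(requires_grad=True)),
    ("decoder.layer.0.attn.weight", SimpleNamespace(requires_grad=True)),
    ("decoder.layer.0.layer_norm.scale.weight", SimpleNamespace(requires_grad=True)),
    ("speaker_emb.weight", SimpleNamespace(requires_grad=True)),
]
trainable_names = freeze_except(demo_params)
print(trainable_names)
```

With a PyTorch model, passing `model.named_parameters()` as `named_params` should work the same way, since each parameter exposes a writable `requires_grad` flag. Printing the returned names is a quick way to confirm that nothing else in the decoder is being updated.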

linlinsongyun commented 1 year ago

Did you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset for the pretraining stage?

Yes. I use the default config "num_speaker: 955". There are 30 speakers in the pretraining stage, with speaker IDs ranging from 1 to 31, and I use speaker_id=50 in the finetuning stage.

tuanh123789 commented 1 year ago

You have to change the default config value "num_speaker" to 30 (in your case) for the pretraining stage. When finetuning, just set speaker_id = 0.
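One way to keep num_speaker and the speaker IDs consistent is sketched below. It assumes the speaker embedding is a lookup table with num_speaker rows indexed from 0, which is how embedding tables usually behave; whether this repository relies on that is an assumption. The helpers `build_speaker_map` and `check_speaker_id` and the raw IDs are hypothetical, for illustration only.

```python
def build_speaker_map(raw_ids):
    """Map arbitrary raw speaker IDs to contiguous indices 0..N-1."""
    unique_ids = sorted(set(raw_ids))
    return {raw: index for index, raw in enumerate(unique_ids)}


def check_speaker_id(speaker_id, num_speaker):
    """Return speaker_id if it is a valid row index, else raise ValueError."""
    if not 0 <= speaker_id < num_speaker:
        raise ValueError(
            f"speaker_id {speaker_id} is outside [0, {num_speaker - 1}]"
        )
    return speaker_id


# Made-up raw IDs standing in for a pretraining dataset's speaker labels.
pretrain_raw_ids = [3, 7, 7, 12]
speaker_map = build_speaker_map(pretrain_raw_ids)

# num_speaker in the config would then match the number of distinct speakers.
num_speaker = len(speaker_map)

# For finetuning, the advice above is to use speaker_id = 0.
finetune_speaker_id = check_speaker_id(0, num_speaker)
print(speaker_map, num_speaker, finetune_speaker_id)
```

Under this assumption, a raw ID such as 50 with a small num_speaker would fail the range check, which makes a mismatch between the config and the data visible before training starts.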

linlinsongyun commented 1 year ago

You have to change the default config value "num_speaker" to 30 (in your case) for the pretraining stage. When finetuning, just set speaker_id = 0.

OK, I will give it a try. Thanks a lot.

vedantk-b commented 1 year ago

@linlinsongyun did the finetuning improve after you changed the number of speakers?