Hi, to reproduce the results in the paper, please first set the strategy to pretrain to train only on the LibriSpeech corpus, then set the strategy to finetune to train only on the Buckeye corpus. This makes the training process the same as that of MFA.
However, in practice, we use models trained with the semi strategy for inference, to avoid overfitting.
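For reference, the two-stage recipe could look like the following minimal PyTorch sketch. `NeuFA`, `LibriSpeechDataset`, and `BuckeyeDataset` are hypothetical placeholders for the actual classes in this repo, and the finetuning step count is illustrative; only the 120,000 pretraining steps come from the paper.

```python
# Minimal sketch of the pretrain-then-finetune recipe. NeuFA,
# LibriSpeechDataset and BuckeyeDataset are placeholders for the
# actual classes in this repo.
import torch
from torch.utils.data import DataLoader

def run_steps(model, loader, optimizer, num_steps):
    """Train for a fixed number of steps, restarting the loader as needed."""
    model.train()
    batches = iter(loader)
    for _ in range(num_steps):
        try:
            batch = next(batches)
        except StopIteration:
            batches = iter(loader)  # start a new epoch
            batch = next(batches)
        optimizer.zero_grad()
        loss = model(batch)  # assumes the model returns its training loss
        loss.backward()
        optimizer.step()

model = NeuFA()  # placeholder for the actual model class
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 1, the "pretrain" strategy: LibriSpeech only (120,000 steps per the paper).
libri = DataLoader(LibriSpeechDataset(), batch_size=16, shuffle=True)
run_steps(model, libri, optimizer, num_steps=120_000)

# Stage 2, the "finetune" strategy: Buckeye only, continuing from the same weights.
buckeye = DataLoader(BuckeyeDataset(), batch_size=16, shuffle=True)
run_steps(model, buckeye, optimizer, num_steps=20_000)  # illustrative count
```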
In fact, I trained the model with the semi strategy using LJSpeech and Buckeye, but I didn't get results as good as those reported in the paper. I'm wondering what the problem in my experiment is. Do you have any suggestions?
LJSpeech is a single-speaker corpus, so it is not well suited for pretraining.
I have never tried training with LJSpeech and Buckeye, so I can't tell whether there is a problem in your experiments.
For the model trained on LibriSpeech and Buckeye with the semi strategy, the results should be slightly worse than those of the finetuned model, but still slightly better than MFA's.
Oh, thank you for the reply. One last question: in the paper you mention that "Each NeuFA model is firstly trained on the full set of the LibriSpeech [22] corpus for 120,000 ...". Does the full set of LibriSpeech here mean the three parts dev-clean + test-clean + train-clean-360? I see there are many different parts at http://www.openslr.org/12/.
The full set contains every part at that link, including: train-clean-100, train-clean-360, train-other-500, dev-clean, dev-other, test-clean and test-other.
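For anyone assembling the full set programmatically, here is a small sketch using torchaudio's built-in LIBRISPEECH loader (assuming torchaudio is your loading route; this repo may use its own loader). The subset names match the archives on that page.

```python
# Sketch: building the full LibriSpeech set by concatenating all subsets
# listed at http://www.openslr.org/12/.
from torch.utils.data import ConcatDataset
from torchaudio.datasets import LIBRISPEECH

SUBSETS = [
    "train-clean-100", "train-clean-360", "train-other-500",
    "dev-clean", "dev-other", "test-clean", "test-other",
]

full_set = ConcatDataset(
    LIBRISPEECH("./data", url=subset, download=True) for subset in SUBSETS
)
print(len(full_set))  # total number of utterances across all subsets
```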
Thank you, I see. Thanks a lot.
Hello, thank you for your great work. I have a question about this. In the paper you suggest pretraining with LibriSpeech and then training with Buckeye, but in the code, both the semi and semi2 strategies seem to train on LibriSpeech and Buckeye at the same time. I wonder whether training with LibriSpeech first as a pretrained model and then training with Buckeye can achieve the better results reported in the paper.
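For what it's worth, "training on both at the same time" often means something like the sketch below, where each optimization step draws one batch from each corpus and sums the losses. Whether the repo's semi or semi2 strategy does exactly this, or weights the two losses differently, would need checking against the actual training code; this is only an illustration.

```python
# One common way to train on two corpora simultaneously: combine one
# batch from each corpus in every optimization step.
import itertools

def semi_train(model, libri_loader, buckeye_loader, optimizer, num_steps):
    # Note: itertools.cycle caches the first pass, so batches repeat in
    # the same order; real code would re-shuffle by recreating the iterators.
    libri = itertools.cycle(libri_loader)
    buckeye = itertools.cycle(buckeye_loader)
    model.train()
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = model(next(libri)) + model(next(buckeye))  # joint loss
        loss.backward()
        optimizer.step()
```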