thuhcsi / NeuFA

Neural network-based forced alignment with bidirectional attention mechanism
70 stars 8 forks

Which training strategy to use #4

panxin801 closed this issue 2 years ago

panxin801 commented 2 years ago

Hello, thank you for your great work. I have a question about this. In the paper you suggest pretraining with LibriSpeech and then training with Buckeye, but in the code, with either the semi or semi2 strategy, it seems the model is trained on LibriSpeech and Buckeye at the same time. I wonder if training with LibriSpeech first as a pretrained model and then training with Buckeye can achieve a better result, as the paper says.

petronny commented 2 years ago

Hi, to reproduce the results in the paper, please first set the strategy to pretrain to train only on the LibriSpeech corpus, then set the strategy to finetune to train only on the Buckeye corpus. This makes the training process the same as the training process of MFA.

However, in practice, we use models trained with the semi strategy for inference to avoid overfitting.
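The difference between the strategies above comes down to which corpus each training step draws batches from. Here is a minimal, hypothetical sketch of that scheduling logic; the function name and corpus labels are illustrative, not NeuFA's actual API.

```python
def batch_schedule(strategy, steps):
    """Return which corpus each training step draws a batch from.

    Hypothetical illustration of the strategies discussed above:
    - pretrain: LibriSpeech only
    - finetune: Buckeye only
    - semi: both corpora interleaved within the same run
    """
    if strategy == "pretrain":
        return ["librispeech"] * steps
    if strategy == "finetune":
        return ["buckeye"] * steps
    if strategy == "semi":
        # Alternate corpora step by step (one possible interleaving).
        return ["librispeech" if i % 2 == 0 else "buckeye"
                for i in range(steps)]
    raise ValueError(f"unknown strategy: {strategy}")
```

Reproducing the paper would then correspond to running `batch_schedule("pretrain", ...)` to completion before switching to `batch_schedule("finetune", ...)`, whereas the semi strategy mixes both corpora in a single run.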

panxin801 commented 2 years ago

In fact, I trained the model with the semi strategy using LJSpeech and Buckeye, but I didn't get results as good as those reported in the paper. I'm wondering what the problem in my experiment might be. Do you have any suggestions?

petronny commented 2 years ago

LJSpeech is a single-speaker corpus, which is not quite suitable for pretraining.

I have never tried training with LJSpeech and Buckeye, so I can't tell whether there is a problem in your experiments.

For the model trained on LibriSpeech and Buckeye with the semi strategy, the results should be slightly worse than those from the finetuned model but still slightly better than MFA.

panxin801 commented 2 years ago

Oh, thank you for the reply. One last question: in the paper you mention that "Each NeuFA model is firstly trained on the full set of the LibriSpeech [22] corpus for 120,000". Is the full set of LibriSpeech made up of the three parts dev-clean + test-clean + train-clean-360? I see there are many different parts at http://www.openslr.org/12/

petronny commented 2 years ago

The full set contains every part at that link, including:

- train-clean-100
- train-clean-360
- train-other-500
- dev-clean
- dev-other
- test-clean
- test-other

panxin801 commented 2 years ago

Thank you, now I know. Thanks a lot.