gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition
GNU Lesser General Public License v3.0

retrain existing nnet3 model with more data #106

Closed. cogmeta closed this issue 4 years ago.

cogmeta commented 4 years ago

Is there a way to retrain an existing nnet3 model with additional data and avoid training everything from scratch? We have a Kaldi nnet3 model trained on 3k hours of data. We have obtained an additional 3k hours and would like to retrain the existing model rather than start everything from scratch.

gooofy commented 4 years ago

Yes, Kaldi does support transfer learning (which is effectively what you're trying to do here, I assume). Please check the kaldi-help group for details.

dophist commented 4 years ago

@cogmeta you may have a look at this script for reference: https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh. But keep in mind that fine-tuning is a trade-off between the "old" and the "new" data; mix the new data with some proportion of your old data if necessary. A rough sketch of the approach is shown below.
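
For reference, here is a minimal sketch of what that fine-tuning approach boils down to: reduce the learning-rate factors on the transferred layers, then continue training from the existing model via `--trainer.input-model`. The paths, learning rates, and epoch counts below are placeholders for illustration, not the exact values from the aishell2 recipe or from zamia-speech, so adapt them to your own setup.

```bash
#!/bin/bash
# Sketch of nnet3 fine-tuning (weight transfer), loosely following
# egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh.
# All paths and hyperparameters here are assumptions / placeholders.

set -e
. ./cmd.sh   # defines $train_cmd / $decode_cmd
. ./path.sh

src_mdl=exp/nnet3/tdnn_sp/final.mdl   # existing model trained on the first 3k hours (assumed path)
data_dir=data/finetune_hires          # hi-res features for the new data (assumed path)
ali_dir=exp/tri_finetune_ali          # alignments of the new data against the existing system (assumed path)
dir=exp/nnet3/tdnn_finetune           # output directory for the fine-tuned model
primary_lr_factor=0.25                # slow down updates of the pre-trained layers

mkdir -p $dir

# Extract the raw nnet from the existing acoustic model and scale down the
# learning-rate factors of the transferred layers.
nnet3-am-copy --raw=true $src_mdl - | \
  nnet3-copy \
    --edits="set-learning-rate-factor name=* learning-rate-factor=$primary_lr_factor" \
    - $dir/input.raw

# Continue training from the existing model on the new data, typically for
# a few epochs with a small effective learning rate.
steps/nnet3/train_dnn.py \
  --cmd="$decode_cmd" \
  --trainer.input-model $dir/input.raw \
  --trainer.num-epochs 2 \
  --trainer.optimization.num-jobs-initial 2 \
  --trainer.optimization.num-jobs-final 4 \
  --trainer.optimization.initial-effective-lrate 0.0005 \
  --trainer.optimization.final-effective-lrate 0.00005 \
  --feat-dir $data_dir \
  --lang data/lang \
  --ali-dir $ali_dir \
  --dir $dir
```

The key design choice is the reduced learning-rate factor on the transferred layers: it keeps the pre-trained weights from drifting too far while the model adapts to the new data.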

cogmeta commented 4 years ago

Actually, the amount of additional data is larger than the original data, so I am guessing training from scratch might be a better idea.