Closed pguyot closed 5 years ago
Good progress! Do you have any WER results for French already?
Not yet. Voxforge and M-AILABS are the first two corpora whose transcripts I have reviewed; during that review I adapted the tokenizer and created this large IPA lexicon. I also mean to include Mozilla Common Voice and, more interestingly for my purposes, TCOF, which includes child/adult conversations.
I eventually finished training a tdnn_250 model based on the two corpora. On a Tesla T4, training ran at about one iteration every 50 seconds.
%WER 17.40 [ 5018 / 28843, 356 ins, 755 del, 3907 sub ] exp/nnet3_chain/tdnn_250/decode_test/wer_9_0.0
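For anyone reading the numbers above: the percentage is the total error count (insertions + deletions + substitutions) divided by the number of reference words. Here is a minimal Python sketch that parses a line in this format and rechecks the arithmetic. The helper name `parse_wer_line` and the regular expression are illustrative assumptions for this one line format, not part of Kaldi or of this PR.

```python
import re

# The result line quoted above, copied verbatim.
WER_LINE = (
    "%WER 17.40 [ 5018 / 28843, 356 ins, 755 del, 3907 sub ] "
    "exp/nnet3_chain/tdnn_250/decode_test/wer_9_0.0"
)

# Hypothetical pattern written for this line format only.
WER_PATTERN = re.compile(
    r"%WER\s+(?P<wer>\d+(?:\.\d+)?)\s+"
    r"\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<subs>\d+)\s+sub\s*\]"
)


def parse_wer_line(line):
    """Return the counts from a WER line plus a recomputed WER percentage."""
    match = WER_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a %WER result line")
    stats = {
        key: int(match.group(key))
        for key in ("errors", "words", "ins", "dels", "subs")
    }
    stats["reported_wer"] = float(match.group("wer"))
    # WER = (insertions + deletions + substitutions) / reference words
    total_errors = stats["ins"] + stats["dels"] + stats["subs"]
    stats["recomputed_wer"] = 100.0 * total_errors / stats["words"]
    return stats


stats = parse_wer_line(WER_LINE)
print(stats)
```

For the line above, the three error types sum to the reported 5018 errors, and 5018 / 28843 rounds to the reported 17.40 %.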
Congratulations on your first French model! :) Are you planning to run a tdnn_f model or adapt to larger LMs as well? It would be interesting to see how much the WER improves.
Do you have model stats? How many hours of French training material, dictionary size, etc.?
Initial support for French, with an IPA lexicon and cleaned transcripts for the Voxforge and M-AILABS corpora. Also includes instructions for using the Est Republicain corpus to train a language model.