Closed rockiram closed 5 years ago
I have no experience with acoustic model adaptation, so I cannot provide any instructions for this task.
As for training a new model from scratch, you should find hints in the README. The first step is to import your dataset (speech_audio_scan.py). Once that is done, you can review your corpus either manually or automatically, and then check for missing lexicon entries. At that point you could also generate noise-augmented corpora from your dataset, should you choose to. With all that in place, you can export all the datasets you want to train your model on to create a Kaldi case:
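The steps above can be sketched as a command sequence. This is a rough outline, not verified against any particular repo version; `mydataset` and `my-model` are placeholder names, and only scripts already named in this thread are invoked — the review and lexicon steps are left as comments since their exact commands depend on the repo version:

```shell
# Sketch of the zamia-speech pipeline described above.
./speech_audio_scan.py mydataset          # 1. import the dataset
# 2. review the corpus (manually or automatically) per the README
# 3. check for missing lexicon entries and add them
# 4. optionally generate noise-augmented corpora
./speech_kaldi_export.py my-model dict-en.ipa generic_en_lang_model_small mydataset
cd data/dst/asr-models/kaldi/my-model     # 5. the export creates a Kaldi case
./run-chain.sh                            #    which you then train
```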
https://github.com/gooofy/zamia-speech#english-nnet3-chain-models
@gooofy Thanks for the suggestion.
I've prepared the zamia-en data as per the instructions in README.
I executed
```
./speech_kaldi_export.py generic-en-small dict-en.ipa generic_en_lang_model_small voxforge_en librispeech zamia_en
cd data/dst/asr-models/kaldi/generic-en-small
./run-chain.sh
```
I'm currently facing 2 issues:
1. utils/validate_data_dir.sh: empty file spk2utt
```
make mfcc
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc.sh --cmd utils/run.pl --nj 12 data/train exp/make_mfcc_chain/train mfcc_chain
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for train
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/compute_cmvn_stats.sh data/train exp/make_mfcc_chain/train mfcc_chain
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
fix_data_dir.sh: no utterances remained: not proceeding further.
steps/make_mfcc.sh --cmd utils/run.pl --nj 12 data/test exp/make_mfcc_chain/test mfcc_chain
utils/validate_data_dir.sh: empty file spk2utt
```
As a hack, I copied the "text", "wav.scp", and "utt2spk" files from the train dir to the test dir. That temporarily suppressed this error; then the second error (below) popped up.
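For reference, spk2utt in a Kaldi data dir is just the inversion of utt2spk (Kaldi ships utils/utt2spk_to_spk2utt.pl for this), so an empty spk2utt in the test dir usually means utt2spk there was empty or malformed. A minimal sketch of the same inversion on a toy utt2spk file, assuming the standard two-column `utterance-id speaker-id` format:

```shell
# Toy utt2spk: two utterances for spkA, one for spkB.
printf 'utt1 spkA\nutt2 spkA\nutt3 spkB\n' > utt2spk

# Invert it: collect the utterance ids per speaker, one speaker per line
# (equivalent in spirit to utils/utt2spk_to_spk2utt.pl).
awk '{u[$2] = u[$2] " " $1} END {for (s in u) print s u[s]}' utt2spk | sort > spk2utt
cat spk2utt
# spkA utt1 utt2
# spkB utt3
```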
2. utils/split_scp.pl: Refusing to split data because number of speakers 1 is less than the number of output .scp files 12
```
make mfcc
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc.sh --cmd utils/run.pl --nj 12 data/train exp/make_mfcc_chain/train mfcc_chain
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for train
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/compute_cmvn_stats.sh data/train exp/make_mfcc_chain/train mfcc_chain
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
steps/make_mfcc.sh --cmd utils/run.pl --nj 12 data/test exp/make_mfcc_chain/test mfcc_chain
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
steps/compute_cmvn_stats.sh data/test exp/make_mfcc_chain/test mfcc_chain
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept all 144 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
mono0a_chain
steps/train_mono.sh --nj 12 --cmd utils/run.pl data/train data/lang exp/mono0a_chain
utils/split_scp.pl: Refusing to split data because number of speakers 1 is less than the number of output .scp files 12
```
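The error itself is mechanical: split_scp.pl refuses to create more per-job splits than there are distinct speakers, so the `--nj` value passed to the training scripts must not exceed the speaker count. A sketch of computing a safe job count from a toy utt2spk file (the file contents and the cap of 12 are illustrative):

```shell
# Toy utt2spk with two distinct speakers.
printf 'utt1 spkA\nutt2 spkA\nutt3 spkB\n' > utt2spk

# Count distinct speakers (second column) and cap the job count at that.
nspk=$(cut -d' ' -f2 utt2spk | sort -u | wc -l)
nj=$(( nspk < 12 ? nspk : 12 ))
echo "$nj"
# then pass it along, e.g.: steps/train_mono.sh --nj $nj ...
```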
Could you please help me understand and solve this error?
Thank you!
Hi @gooofy, this issue is fixed.
Please help me understand the second issue mentioned earlier.
Thank you!
I think this is the issue with your dataset:
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.
If you're really trying to build a single-speaker model, I am not sure how to set up Kaldi for that - maybe the Kaldi user mailing list can help you here.
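One workaround sometimes used in Kaldi setups when real speaker labels are unavailable is to treat every utterance as its own "speaker", so splitting and per-speaker statistics have more than one speaker to work with. Whether that is acceptable here is exactly the kind of question for the Kaldi list, since it effectively disables real speaker adaptation. A toy sketch:

```shell
# Toy single-speaker utt2spk.
printf 'utt1 spkA\nutt2 spkA\n' > utt2spk

# Rewrite it so each utterance id doubles as its own speaker id.
awk '{print $1, $1}' utt2spk > utt2spk.per_utt
cat utt2spk.per_utt
# utt1 utt1
# utt2 utt2
```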
Thanks @gooofy
I trained the sequitur model on the dict-en.ipa file. After training, I ran inference on a word that is already in dict-en.ipa, and the output is entirely different:

- actual entry in the IPA file: abandonment → ʌb'ændʌnmʌnt
- output predicted by the sequitur model: abandonment → V b ' { n d V n m V n t
- online IPA converter output: abandonment → əˈbændənmənt

Online IPA converter: https://easypronunciation.com/en/english-phonetic-transcription-converter. I tried this because I need to add new words to the IPA file. Kindly help me understand this.
Thank you!
The output of the sequitur model uses the X-SAMPA encoding, not IPA.
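That explains the mismatch above: X-SAMPA is an ASCII re-encoding of IPA, so `V` is ʌ, `{` is æ, and `'` is the stress mark ˈ. A minimal sketch of mapping just the symbols that appear in this example back to IPA (a real converter needs the full X-SAMPA table, including multi-character symbols):

```shell
# Hypothetical helper: map the X-SAMPA symbols from the example above to IPA
# and strip the spaces between phonemes. Only V, { and ' are handled.
xsampa_to_ipa() {
  echo "$1" | sed -e "s/{/æ/g" -e "s/V/ʌ/g" -e "s/'/ˈ/g" -e "s/ //g"
}

xsampa_to_ipa "V b ' { n d V n m V n t"
# ʌbˈændʌnmʌnt   (matching the dict-en.ipa entry, modulo the stress glyph)
```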
@gooofy Thanks a lot for your suggestions.
As mentioned earlier, I'm trying to set up the acoustic model training pipeline using Zamia. With your help, I've progressed to stage 11 in run-chain.sh.
I'm facing an issue (mentioned below) in stage 11 while training.
It'll be helpful for debugging if you can provide a few pointers on this.
```
steps/nnet3/chain/get_egs.sh: feature type is raw
tree-info exp/nnet3_chain/tdnn_250/tree
feat-to-dim scp:exp/nnet3_chain/ivectors_train_sp_hires_comb/ivector_online.scp -
steps/nnet3/chain/get_egs.sh: working out number of frames of training data
steps/nnet3/chain/get_egs.sh: working out feature dim
Command failed (getting feature dim): feat-to-dim "ark,s,cs:utils/filter_scp.pl --exclude exp/nnet3_chain/tdnn_250/egs/valid_uttlist data/train_sp_hires_comb/split12/1/feats.scp | /opt/kaldi/src/featbin/apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_sp_hires_comb/split12/1/utt2spk scp:data/train_sp_hires_comb/split12/1/cmvn.scp scp:- ark:- |"
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 634, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 395, in train
    stage=args.egs_stage)
  File "steps/libs/nnet3/train/chain_objf/acoustic_model.py", line 118, in generate_chain_egs
    egs_opts=egs_opts if egs_opts is not None else ''))
  File "steps/libs/common.py", line 158, in execute_command
    p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --cmd "utils/run.pl" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "exp/nnet3_chain/ivectors_train_sp_hires_comb" --left-context 16 --right-context 11 --left-context-initial -1 --right-context-final -1 --left-tolerance '5' --right-tolerance '5' --frame-subsampling-factor 3 --alignment-subsampling-factor 3 --stage 0 --frames-per-iter 1500000 --frames-per-eg 150 --srand 0 data/train_sp_hires_comb exp/nnet3_chain/tdnn_250 exp/nnet3_chain/tri2b_chain_train_sp_comb_lats exp/nnet3_chain/tdnn_250/egs
```
I am a bit suspicious that there might have been issues in earlier steps. Did the training of the tri2b_chain model work? Did the ivector extraction work?
Other than that, you could try running the failing steps/nnet3/chain/get_egs.sh manually to get to the bottom of this, or contact the Kaldi mailing list - maybe someone there knows what would likely cause this command to fail.
This issue is solved. It was a path problem for apply-cmvn: I fixed the path and also updated path.sh, and now it trains properly.
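A small sketch of the kind of check that surfaces this class of problem: after sourcing path.sh, verify that every binary the scripts call actually resolves on PATH. Here `ls` and a deliberately nonexistent name stand in for the real Kaldi binaries involved above (apply-cmvn, feat-to-dim), so the sketch is runnable without a Kaldi install:

```shell
# Report any of the given binaries that do not resolve on the current PATH.
check_bins() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || echo "missing on PATH: $bin"
  done
}

# In a real setup you would first do:  . ./path.sh
check_bins ls no-such-kaldi-bin
# missing on PATH: no-such-kaldi-bin
```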
Thanks for your help @gooofy
Closing this issue.
I prepared a dataset similar to the zamia-en dataset, with 3000 wav files and prompts.

1. How do I fine-tune the existing acoustic model?
2. How do I train the acoustic model from scratch?