kaldi-asr / kaldi

Refusing to split data because number of speakers 0 is less than the number of output .scp files 40 #3376

whaozl closed this issue 5 years ago

whaozl commented 5 years ago
[haozl@centos7 s5data8k]$ steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj $njob  exp/mono/graph data/test exp/mono/decode_test
steps/decode.sh --cmd run.pl --config conf/decode.config --nj 40 exp/mono/graph data/test exp/mono/decode_test
utils/split_scp.pl: Refusing to split data because number of speakers 0 is less than the number of output .scp files 40

Do I need speaker identity information to train an ASR system? I see that the TDNN training recipe extracts i-vector features.

danpovey commented 5 years ago

If you don't have speaker identity information, you can just make utt2spk and spk2utt a one-to-one map, so that each utterance is treated as its own speaker. You need to make sure your data directory passes validation (see utils/data/validate_data_dir.sh).
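For anyone hitting the same error, here is a minimal sketch of building such a one-to-one map. It assumes the data directory already contains a wav.scp with one `<utterance-id> <path-or-command>` entry per line; the function name `make_one_to_one_spk_maps` is made up for illustration and is not part of Kaldi.

```shell
#!/usr/bin/env bash
# Sketch: create one-to-one utt2spk / spk2utt maps from wav.scp.
# Assumption: wav.scp has one "<utt-id> <path-or-command>" entry per line.
# The function name is hypothetical, not a Kaldi utility.
make_one_to_one_spk_maps() {
  local dir=$1
  [ -f "$dir/wav.scp" ] || { echo "missing $dir/wav.scp" >&2; return 1; }
  # Use each utterance ID as its own speaker ID; skip blank lines.
  # Kaldi expects these files sorted in the C locale.
  awk 'NF > 0 {print $1, $1}' "$dir/wav.scp" | LC_ALL=C sort > "$dir/utt2spk"
  # With a one-to-one map, spk2utt has exactly the same content as utt2spk.
  cp "$dir/utt2spk" "$dir/spk2utt"
}
```

After calling `make_one_to_one_spk_maps data/test`, running `utils/fix_data_dir.sh data/test` and then `utils/data/validate_data_dir.sh data/test` from the recipe directory should confirm the directory is consistent. Note that with this setup the number of "speakers" equals the number of utterances, so `--nj` still cannot exceed the utterance count.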

danpovey commented 5 years ago

and you should use kaldi-help for this kind of question.

whaozl commented 5 years ago

Thank you, Dan. Yes, I have joined the kaldi-help group.