yh1008 / speech-to-text

mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras
http://llcao.net/cu-deeplearning17/project.html

utt2spk not sorted #6

yh1008 closed this issue 7 years ago

yh1008 commented 7 years ago

Error: utils/validate_data_dir.sh: file data/train/utt2spk is not in sorted order or has duplicates. Needs to be fixed!
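A minimal sketch of the condition this error reports, assuming the usual Kaldi convention that a table file such as utt2spk must be sorted on its first field (the key), with no repeated keys, in C-locale byte order. is_sorted_unique is a hypothetical helper, not a Kaldi script; it relies on POSIX sort, where -c combined with -u also rejects duplicate keys.

```shell
# Hypothetical helper (not part of Kaldi): succeed only if the file is
# sorted on its first field with no duplicate keys, using byte order.
is_sorted_unique() {
  file="$1"
  # -c: check only, do not sort.
  # -u together with -c: also fail if two lines share the same key.
  # -k1,1: compare on the first whitespace-separated field only.
  LC_ALL=C sort -c -u -k1,1 "$file" 2>/dev/null
}

# Example (path from the error above):
# is_sorted_unique data/train/utt2spk && echo ok || echo "needs fixing"
```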

utils/fix_data_dir.sh data/train
utils/fix_data_dir.sh: file data/train/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/spk2utt is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/segments is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file data/train/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: filtered data/train/segments from 36990 to 36940 lines based on filter /tmp/kaldi.VKzR/recordings.
utils/fix_data_dir.sh: filtered data/train/wav.scp from 193 to 192 lines based on filter /tmp/kaldi.VKzR/recordings.
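The last two log lines show entries being dropped because their recordings do not line up across files. A minimal sketch of one direction of that filtering, assuming the standard layouts segments = `<utterance-id> <recording-id> <start> <end>` and wav.scp = `<recording-id> <extended-filename>`. filter_segments_by_wav is a hypothetical name, and this is an illustration of the idea rather than how fix_data_dir.sh is actually implemented.

```shell
# Hypothetical helper: print only the segments lines whose recording-id
# (field 2) also appears as a key (field 1) in wav.scp.
# Assumes wav.scp is non-empty (the NR == FNR idiom relies on it).
filter_segments_by_wav() {
  segments="$1"
  wav_scp="$2"
  awk 'NR == FNR { have[$1] = 1; next }  # first file: collect recording-ids
       ($2 in have)                      # second file: keep matching segments
      ' "$wav_scp" "$segments"
}

# Example:
# filter_segments_by_wav data/train/segments data/train/wav.scp > segments.filtered
```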

utt2spk is not in sorted order (fix this yourself)

yh1008 commented 7 years ago

Possible solution (http://kaldi-asr.org/doc/data_prep.html): export LC_ALL=C, so that sort uses the byte-wise (C locale) ordering Kaldi expects.
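A minimal sketch of why the locale matters: under LC_ALL=C, sort compares raw bytes, so uppercase letters come before lowercase and '-' (0x2D) comes before '_' (0x5F). In a locale such as en_US.UTF-8, plain sort may put the same IDs in a different order, which is one way a file ends up "unsorted" from Kaldi's point of view. c_sort is a hypothetical helper name.

```shell
# Hypothetical helper: sort lines in byte (C-locale) order, regardless of
# the caller's locale settings.
c_sort() {
  LC_ALL=C sort "$@"
}

# Byte order: 'S' (0x53) < 's' (0x73), and '-' (0x2D) < '_' (0x5F).
printf 'spk_b\nSPK_a\nspk-a\n' | c_sort
# prints:
# SPK_a
# spk-a
# spk_b
```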

yh1008 commented 7 years ago

Speaker-id should be a prefix of the utterance-id: https://sourceforge.net/p/kaldi/discussion/1355347/thread/1012cc7b/

THIS IS THE RIGHT CAUSE AND SOLUTION!
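A minimal sketch of why the prefix fixes it: utt2spk has to come out in the same order whether it is sorted on utterance-id or on speaker-id, and prepending the speaker-id to each utterance-id makes the two orders agree. Both helpers are hypothetical. The prefixing assumes speaker-ids contain only letters, digits and underscores, so the '-' separator sorts below any character that could follow a shared prefix. Only utt2spk is rewritten here; the same renaming would also have to be applied to text, segments and any other file keyed by utterance-id.

```shell
# Hypothetical helper: succeed if sorting "<utt> <spk>" lines on the
# utterance-id gives the same order as sorting on speaker-id (ties broken
# by utterance-id), using byte order.
utt2spk_orders_agree() {
  utt2spk="$1"
  [ "$(LC_ALL=C sort -k1,1 "$utt2spk")" = \
    "$(LC_ALL=C sort -k2,2 -k1,1 "$utt2spk")" ]
}

# Hypothetical helper: rewrite "<utt> <spk>" as "<spk>-<utt> <spk>" so the
# speaker-id becomes a prefix of the utterance-id.
prefix_utt_with_spk() {
  awk '{ print $2 "-" $1, $2 }' "$1"
}

# Example:
# prefix_utt_with_spk data/train/utt2spk | LC_ALL=C sort -k1,1 > utt2spk.new
```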