laurensw75 / kaldi_egs_CGN

Kaldi recipe for creating Dutch ASR from CGN
Empty segment file when only running on comp-a #6

Closed by RobertLevenbach 4 years ago

RobertLevenbach commented 4 years ago

When I run run.sh with lang="nl" and comp="a", I get the following error in 'local/cgn_data_prep3.sh $cgn $lang $comp':

"Empty list of recording (bad file data/train_t/segments"

Why does this happen? Could it be because of lines 49-52 in cgn_data_prep.sh?

grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >train_t.flist
grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >train_s.flist
grep -F -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >dev_t.flist
grep -F -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >dev_s.flist
rm -f temp.flist

Any other idea?

Thanks in advance

laurensw75 commented 4 years ago

The problem is that my script handles 'T' (telephone) and 'S' (studio) recordings separately. If you train without comp-c and comp-d, there is no T data, so that part fails. I did not check for this properly in my scripts, so the whole thing crashes. You could remove all the train_t and dev_t bits, but there are quite a few of them.
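
As an alternative to deleting every train_t and dev_t reference by hand, a guard on the file lists might work. The following is a minimal sketch, not code from the recipe: it assumes the per-set lists are named as in the snippet quoted above, and the sample contents and the `active_sets` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: skip any data set whose file list is empty instead of
# letting later steps fail on it. Sample data below is hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Simulate a comp-a-only run: studio list populated, telephone list empty.
printf '%s\n' 'comp-a/nl/fn000001' 'comp-a/nl/fn000002' > train_s.flist
: > train_t.flist

active_sets=""
for x in train_s train_t; do
  # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
  if [ -s "$x.flist" ]; then
    active_sets="$active_sets $x"
  else
    echo "Skipping $x: $x.flist is empty" >&2
  fi
done

echo "Sets to process:$active_sets"
```

Later loops in the recipe could then iterate over `$active_sets` rather than a hard-coded list of sets, assuming they are structured as loops in the first place.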

RobertLevenbach commented 4 years ago

That clears it up! I will remove them manually, that's fine.

RobertLevenbach commented 4 years ago

How would you suggest altering these lines to work on only comp-a?

grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >train_t.flist
grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >train_s.flist
grep -F -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >dev_t.flist
grep -F -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >dev_s.flist
rm -f temp.flist

i.e. to get a useable train_s and dev_s?

RobertLevenbach commented 4 years ago

I think I found it. Adjust nbest-dev-2008.txt so that it lists only comp-a files, and make a separate test set and dev set.
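
One way to read this fix, sketched under assumptions: with only comp-a there are no telephone recordings, so only the two studio lists are needed and the comp-c/comp-d filters can be dropped. The file `dev-ids.txt` is a hypothetical stand-in for a comp-a-only version of nbest-dev-2008.txt, and the paths in temp.flist are invented sample data; the real formats may differ.

```shell
#!/usr/bin/env bash
# Sketch only: build studio train/dev lists for a comp-a-only run.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample data standing in for temp.flist and the dev list.
printf '%s\n' \
  '/cgn/comp-a/nl/fn000001.wav' \
  '/cgn/comp-a/nl/fn000002.wav' \
  '/cgn/comp-a/nl/fn000003.wav' > temp.flist
printf '%s\n' 'fn000002' > dev-ids.txt   # hypothetical comp-a-only dev list

# Lines that match no entry of the dev list go to training ...
grep -vF -f dev-ids.txt temp.flist | sort > train_s.flist
# ... and lines that match an entry go to dev.
grep -F -f dev-ids.txt temp.flist | sort > dev_s.flist

# Fail early with a clear message if either list came out empty.
for x in train_s dev_s; do
  [ -s "$x.flist" ] || { echo "$x.flist is empty" >&2; exit 1; }
done

wc -l train_s.flist dev_s.flist
```

The train_t and dev_t lists are not created here at all, so any later step that expects them would still need to be removed or guarded.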