Closed by RobertLevenbach 4 years ago
The problem is that my script handles 'T' (telephone) and 'S' (studio) recordings separately. If you train without comp-c and comp-d, there is no T data, so that part fails. I did not check for this properly in my scripts, so the whole thing crashes. You could remove all the train_t and dev_t bits, but there are quite a few.
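A minimal sketch of the kind of check that was missing, assuming the per-set file lists are named train_s.flist, dev_s.flist, train_t.flist and dev_t.flist as in the snippet quoted in this thread. The `prepare_set` function and the toy list contents are hypothetical placeholders, not part of the actual script:

```shell
#!/usr/bin/env bash
# Sketch only: skip any data set whose file list is empty instead of crashing.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Toy file lists (made up): only studio (S) data is present,
# as when training on comp-a alone.
printf 'comp-a/nl/fn000001\n' > train_s.flist
printf 'comp-a/nl/fn000002\n' > dev_s.flist
: > train_t.flist   # empty: no telephone (T) data
: > dev_t.flist

# Hypothetical stand-in for the real per-set processing.
prepare_set() {
  echo "preparing $1 ($(wc -l < "$1.flist" | tr -d ' ') files)"
}

processed=""
for x in train_s dev_s train_t dev_t; do
  if [ -s "$x.flist" ]; then       # -s: file exists and is non-empty
    prepare_set "$x"
    processed="$processed $x"
  else
    echo "skipping $x: empty file list" >&2
  fi
done
```

With a guard like this, the T sets would simply be skipped when no comp-c or comp-d data is selected. Any later stage that reads data/train_t or data/dev_t would need the same treatment.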
That clears it up! I will remove them manually, that's fine.
How would you suggest altering these lines to work on only comp-a?
    grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >train_t.flist
    grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >train_s.flist
    grep -F -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >dev_t.flist
    grep -F -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >dev_s.flist
    rm -f temp.flist
i.e. to get a useable train_s and dev_s?
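One possible way to get only train_s and dev_s, sketched under assumptions: temp.flist holds one recording path per line, nbest-dev-2008.txt holds fixed strings that identify the dev recordings, and only comp-a data is present, so the comp-c/comp-d filter can be dropped. The toy paths and the value of `local` below are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: split a comp-a-only file list into studio train and dev lists.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
local=.   # assumption: directory that holds nbest-dev-2008.txt

# Toy inputs (made up); the real file formats may differ.
printf '%s\n' comp-a/nl/fn000003 comp-a/nl/fn000001 comp-a/nl/fn000002 > temp.flist
printf '%s\n' fn000002 > "$local/nbest-dev-2008.txt"

# Everything NOT matched by the dev list goes to train_s ...
grep -vF -f "$local/nbest-dev-2008.txt" temp.flist | sort > train_s.flist
# ... and everything matched by it goes to dev_s.
grep -F -f "$local/nbest-dev-2008.txt" temp.flist | sort > dev_s.flist
rm -f temp.flist
```

No train_t.flist or dev_t.flist is written here, so any later step that still expects those lists would have to be removed as well.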
I think I found it: adjust nbest-dev-2008.txt so that it lists only comp-a entries, and make an individual test and dev set.
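A rough sketch of that adjustment, assuming each line of nbest-dev-2008.txt contains the component name (for example `comp-a/`) somewhere in it. That format is an assumption, as are the toy entries and the output file name nbest-dev-comp-a.txt:

```shell
#!/usr/bin/env bash
# Sketch only: derive a comp-a-only dev list without touching the original.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Toy dev list (made up); the real format of nbest-dev-2008.txt may differ.
printf '%s\n' comp-a/nl/fn000002 comp-c/nl/fn000101 comp-d/nl/fn000202 > nbest-dev-2008.txt

# Keep only comp-a entries. grep exits with status 1 when nothing matches,
# so "|| true" keeps "set -e" from aborting before the explicit check below.
grep -F 'comp-a/' nbest-dev-2008.txt > nbest-dev-comp-a.txt || true

if [ ! -s nbest-dev-comp-a.txt ]; then
  echo "no comp-a entries found in the dev list" >&2
  exit 1
fi
```

Writing to a separate file keeps the original dev list intact. The data-prep script would then need to point at the new file instead.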
When I try to run run.sh with lang="nl" and comp="a",
I get the following error from 'local/cgn_data_prep3.sh $cgn $lang $comp': "Empty list of recording (bad file data/train_t/segments)"
Why does this happen? Could it be because of lines 49-52 in cgn_data_prep.sh?
    grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >train_t.flist
    grep -vF -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >train_s.flist
    grep -F -f $local/nbest-dev-2008.txt temp.flist | grep 'comp-c|comp-d' | sort >dev_t.flist
    grep -F -f $local/nbest-dev-2008.txt temp.flist | grep -v 'comp-c|comp-d' | sort >dev_s.flist
    rm -f temp.flist
Any other idea?
Thanks in advance