Closed RedmondY closed 5 years ago
@vimal @sw005320 @siddalmia any idea how to fix this?
@danpovey I found that U05 is not the cause. After I added 'utils/data/fix_data_dir.sh $data' to egs/chime5/s5b/utils/validate_data_dir.sh, the issue no longer occurs.
    utils/data/fix_data_dir.sh $data
    check_sorted_and_uniq $data/utt2spk

    if ! $no_spk_sort; then
      ! cat $data/utt2spk | sort -k2 | cmp -s - $data/utt2spk && \
        echo "$0: utt2spk is not in sorted order when sorted first on speaker-id " && \
        echo "(fix this by making speaker-ids prefixes of utt-ids)" && exit 1;
    fi
May not have been a bug / not enough details to tell, anyway.
I have a similar bug. Can you help me?
    local/timit_data_prep.sh: TIMIT data preparation succeeded
    steps/make_mfcc.sh --cmd run.pl --nj 8 data/train exp/make_mfcc/train mfcc
    steps/make_mfcc.sh: moving data/train/feats.scp to data/train/.backup
    fix_data_dir.sh: no utterances remained: not proceeding further.
    utils/validate_data_dir.sh: Error: in data/train, utterance-ids extracted from utt2spk and utt2dur file
    utils/validate_data_dir.sh: differ, partial diff is:
    --- /tmp/kaldi.ljtb/utts 2021-10-25 09:36:31.713458090 +0800
    +++ /tmp/kaldi.ljtb/utts.utt2dur 2021-10-25 09:36:31.845456523 +0800
    @@ -1,4620 +1,5607 @@
    -SP0001W00
    -SP0001W01
    -SP0001W02
    ...
    +SP0462W04-0000-0246
    +SP0462W05-0000-0256
    +SP0462W06-0000-0391
    +SP0462W07-0000-0374
    +SP0462W08-0013-0234
    +SP0462W09-0000-0314
    [Lengths are /tmp/kaldi.ljtb/utts=4620 versus /tmp/kaldi.ljtb/utts.utt2dur=5607]
I have a similar bug. The main problem is that utt2spk and utt2dur do not match: utt2spk has 4620 utterance IDs while utt2dur has 5607. Check these two files.
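To see where the two files disagree, something along these lines may help. It is only a rough sketch: `compare_utt_ids` is a hypothetical helper (not part of Kaldi), and it assumes the utterance ID is the first whitespace-separated field of each line, as in the usual utt2spk and utt2dur layout.

```shell
# Hypothetical helper, not a Kaldi script: counts utterance IDs that appear
# in only one of two data files, and the IDs they share.
# Assumption: the utterance ID is the first whitespace-separated field.
compare_utt_ids() (
  export LC_ALL=C                      # same byte-wise collation for sort and comm
  tmp=$(mktemp -d)
  awk '{print $1}' "$1" | sort -u > "$tmp/a"
  awk '{print $1}' "$2" | sort -u > "$tmp/b"
  echo "only_in_first=$(comm -23 "$tmp/a" "$tmp/b" | wc -l | tr -d ' ')"
  echo "only_in_second=$(comm -13 "$tmp/a" "$tmp/b" | wc -l | tr -d ' ')"
  echo "common=$(comm -12 "$tmp/a" "$tmp/b" | wc -l | tr -d ' ')"
  rm -rf "$tmp"
)
```

For example, `compare_utt_ids data/train/utt2spk data/train/utt2dur`. If `common=0`, the two files probably come from different preparation runs, so one of them is likely stale.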
Has your problem been solved? If so, please let me know what you did. I am seeing a similar error.
These look like issues arising from incorrect data preparation (i.e., not bugs). It is likely that you ran some stages multiple times. Try starting from a clean slate and make sure the data is prepared correctly before you move on to the feature generation stage. Remove all previously prepared data dirs, run the timit_data_prep.sh script, and then run utils/data/validate_data_dir.sh to make sure the data dir is correct. If it is not, please show the command line output here.
Thanks for the reply. I checked and found that I was using an old data/train_sp folder containing stale files, so I deleted it; it is recreated when the script runs again.
I am now stuck on a new problem. I am trying to solve it, and if I can't, I will let you know 🙂.
Thanks for helping.
Maybe you should check your utt2spk file and your dataset dir. It seems the file names in your utt2spk do not match the file names in your dataset dir.
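One rough way to run that check is sketched below. `missing_audio` is a hypothetical helper, not part of Kaldi, and it assumes that each utterance ID equals the base name of a lowercase `.wav` file somewhere under the dataset dir. That is often not true (IDs may carry speaker prefixes, and some corpora use `.WAV`), so adjust the pattern to your data.

```shell
# Hypothetical helper, not a Kaldi script: prints utterance IDs from utt2spk
# that have no matching audio file under a dataset directory.
# Assumption: utterance ID == base name of a lowercase .wav file.
missing_audio() (
  utt2spk=$1
  audio_dir=$2
  export LC_ALL=C                      # same byte-wise collation for sort and comm
  tmp=$(mktemp -d)
  awk '{print $1}' "$utt2spk" | sort -u > "$tmp/ids"
  # Strip the directory part and the .wav extension from every file found.
  find "$audio_dir" -type f -name '*.wav' \
    | sed 's|.*/||; s|\.wav$||' | sort -u > "$tmp/files"
  comm -23 "$tmp/ids" "$tmp/files"     # IDs with no matching file
  rm -rf "$tmp"
)
```

Empty output means every ID found a file under that naming assumption. A long list usually points to an ID convention mismatch rather than missing audio.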
When I ran stage 4, I hit the following problem:
Perhaps the deletion of U05 caused the problem? (issue)