kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

kaldi [Lengths are /tmp/kaldi.ljtb/utts=4620 versus /tmp/kaldi.ljtb/utts.utt2dur=5607] #4650

Open shf2020 opened 2 years ago

shf2020 commented 2 years ago

I have a big bug, can you help me?

local/timit_data_prep.sh: TIMIT data preparation succeeded
steps/make_mfcc.sh --cmd run.pl --nj 8 data/train exp/make_mfcc/train mfcc
steps/make_mfcc.sh: moving data/train/feats.scp to data/train/.backup
fix_data_dir.sh: no utterances remained: not proceeding further.
utils/validate_data_dir.sh: Error: in data/train, utterance-ids extracted from utt2spk and utt2dur file
utils/validate_data_dir.sh: differ, partial diff is:
--- /tmp/kaldi.ljtb/utts 2021-10-25 09:36:31.713458090 +0800
+++ /tmp/kaldi.ljtb/utts.utt2dur 2021-10-25 09:36:31.845456523 +0800
@@ -1,4620 +1,5607 @@
-SP0001W00
-SP0001W01
-SP0001W02
...
+SP0462W04-0000-0246
+SP0462W05-0000-0256
+SP0462W06-0000-0391
+SP0462W07-0000-0374
+SP0462W08-0013-0234
+SP0462W09-0000-0314
[Lengths are /tmp/kaldi.ljtb/utts=4620 versus /tmp/kaldi.ljtb/utts.utt2dur=5607]
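The error says the utterance IDs in utt2spk and utt2dur disagree. A rough way to see which IDs are affected is to compare the first column of both files by hand. The following is only a sketch: it assumes the first whitespace-separated field of each line is the utterance ID, and it builds a tiny toy directory with made-up IDs and durations so that it can run anywhere. To use it on real data, point `dir` at something like data/train instead and skip the two `printf` lines.

```shell
#!/usr/bin/env bash
# Sketch: list utterance IDs that appear in only one of utt2spk / utt2dur.
# The toy files below are hypothetical and exist only for illustration.
set -euo pipefail
export LC_ALL=C   # byte-wise sort order, so sort and comm agree

dir=$(mktemp -d)
printf 'SP0001W00 SP0001\nSP0001W01 SP0001\n' > "$dir/utt2spk"
printf 'SP0001W00-0000-0246 2.46\nSP0001W01 3.10\n' > "$dir/utt2dur"

# Extract the utterance-ID column from each file.
awk '{print $1}' "$dir/utt2spk" | sort -u > "$dir/ids.utt2spk"
awk '{print $1}' "$dir/utt2dur" | sort -u > "$dir/ids.utt2dur"

# comm -23: lines unique to the first file; comm -13: unique to the second.
only_in_utt2spk=$(comm -23 "$dir/ids.utt2spk" "$dir/ids.utt2dur")
only_in_utt2dur=$(comm -13 "$dir/ids.utt2spk" "$dir/ids.utt2dur")

echo "only in utt2spk: $only_in_utt2spk"
echo "only in utt2dur: $only_in_utt2dur"
rm -r "$dir"
```

If every ID in one file carries a suffix that the other file lacks, as in the diff above, the two files were most likely written from different versions of the data directory.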

danpovey commented 2 years ago

I don't know what might have happened here; the TIMIT recipe is old and has not been changed in a while. I guess there must be something odd about your TIMIT data. Where did you get it from?

shf2020 commented 2 years ago

I got it from this website: http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3/tech&hit=1&filelist=1

danpovey commented 2 years ago

It's possible that the data was somehow renamed at some point, and does not correspond to the original TIMIT data. Here is a part of the file list of TIMIT:

TIMIT/TEST/DR3/FPKT0/SA1.PHN
TIMIT/TEST/DR3/FPKT0/SA1.TXT
TIMIT/TEST/DR3/FPKT0/SA1.WAV
TIMIT/TEST/DR3/FPKT0/SA1.WRD
TIMIT/TEST/DR3/FPKT0/SA2.PHN
TIMIT/TEST/DR3/FPKT0/SA2.TXT
TIMIT/TEST/DR3/FPKT0/SA2.WAV
TIMIT/TEST/DR3/FPKT0/SA2.WRD
TIMIT/TEST/DR3/FPKT0/SI1538.PHN
TIMIT/TEST/DR3/FPKT0/SI1538.TXT
TIMIT/TEST/DR3/FPKT0/SI1538.WAV
TIMIT/TEST/DR3/FPKT0/SI1538.WRD
TIMIT/TEST/DR3/FPKT0/SI2168.PHN
TIMIT/TEST/DR3/FPKT0/SI2168.TXT
TIMIT/TEST/DR3/FPKT0/SI2168.WAV
TIMIT/TEST/DR3/FPKT0/SI2168.WRD

Does your zip file have those files in it?
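One way to answer that question without browsing the archive by hand is to test for a few of the listed paths. This is a minimal sketch under stated assumptions: `timit_root` is a hypothetical variable, and the temporary directory with two placeholder files stands in for the unpacked download. For a real check, set `timit_root` to the directory that contains TIMIT/ and drop the `mkdir` and `touch` lines.

```shell
#!/usr/bin/env bash
# Sketch: check whether some of the expected TIMIT files exist.
# The temp directory and placeholder files are stand-ins for illustration.
set -euo pipefail

timit_root=$(mktemp -d)
mkdir -p "$timit_root/TIMIT/TEST/DR3/FPKT0"
touch "$timit_root/TIMIT/TEST/DR3/FPKT0/SA1.WAV" \
      "$timit_root/TIMIT/TEST/DR3/FPKT0/SA1.PHN"

missing=0
for f in SA1.PHN SA1.TXT SA1.WAV SA1.WRD; do
  if [ -e "$timit_root/TIMIT/TEST/DR3/FPKT0/$f" ]; then
    echo "found   $f"
  else
    echo "missing $f"
    missing=$((missing + 1))
  fi
done
echo "$missing file(s) missing"
```

Some distributions of the corpus use lower-case file names, so a "missing" result may only mean the case differs; that is an assumption worth checking before concluding the data is incomplete.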

shf2020 commented 2 years ago

Yes, I have.

danpovey commented 2 years ago

I would remove the utt2dur file and see if the next stages run.
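A cautious variant of that suggestion is to move the file aside instead of deleting it, so it can be restored if needed. The sketch below assumes the file in question is data/train/utt2dur, the location named in the error message; the temporary directory and the `utt2dur.stale` name are stand-ins chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: set a possibly stale utt2dur aside rather than deleting it.
# data_dir is a temporary stand-in for data/train; the .stale name is arbitrary.
set -euo pipefail

data_dir=$(mktemp -d)
printf 'SP0462W04-0000-0246 2.46\n' > "$data_dir/utt2dur"

if [ -f "$data_dir/utt2dur" ]; then
  mv "$data_dir/utt2dur" "$data_dir/utt2dur.stale"
fi
ls "$data_dir"
```

After doing the same in the real data directory, the steps/make_mfcc.sh command from the log can be run again to see whether validation passes.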

shf2020 commented 2 years ago

Yes, I have.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

baowenqian2001 commented 1 year ago

I have the same bug!

utils/validate_data_dir.sh: Error: in data/train, utterance-ids extracted from utt2spk and features
utils/validate_data_dir.sh: differ, partial diff is:
--- /tmp/kaldi.rKQt/utts 2023-02-08 08:58:19.449716943 +0000
+++ /tmp/kaldi.rKQt/utts.feats 2023-02-08 08:58:19.569716779 +0000
@@ -683,3 +683,2 @@
 013090041
-013090051
 013090076
...
[Lengths are /tmp/kaldi.rKQt/utts=2500 versus /tmp/kaldi.rKQt/utts.feats=2499]

Did you solve it?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

anwarshome commented 5 months ago

SOLVED: The issue is that when "local/nnet3/run_ivector_common.sh" runs, it creates a copy of "train" and names it "train_sp". All you have to do is run the fix on both train and train_sp: (sudo utils/data/fix_data_dir.sh data/train) and (sudo utils/data/fix_data_dir.sh data/train_sp).

Once you do that, you should not have any problems.

Anwar Tantawy anwar@drtantawy.com
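The two commands in that comment can be wrapped in a small loop. This is a sketch rather than a tested recipe step: it assumes it is run from the recipe directory where utils/ and data/ live, uses the script path exactly as quoted in the comment, and skips any directory or script it cannot find instead of failing. The `fixed` variable is only there to report what was processed.

```shell
#!/usr/bin/env bash
# Sketch: run the fix on the original and the speed-perturbed data directory.
# Paths are taken from the comment above; nothing is run if they are absent.
fix_script=utils/data/fix_data_dir.sh
fixed=""

for d in data/train data/train_sp; do
  if [ -x "$fix_script" ] && [ -d "$d" ]; then
    "$fix_script" "$d"
    fixed="$fixed $d"
  else
    echo "skipping $d: $fix_script or $d not found" >&2
  fi
done
echo "fixed:${fixed:- none}"
```

The sudo from the comment is left out here on the assumption that the data directories are writable by the current user.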