kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Missing utt in making mfcc and pitch feature #3748

Closed: ben-8878 closed this issue 4 years ago

ben-8878 commented 4 years ago

Dear maintainers:

When I first used "steps/make_mfcc_pitch.sh" to extract MFCC and pitch features, it worked well. But when I ran "steps/make_mfcc_pitch.sh" a second time on the same data, feature extraction failed for about 20 percent of the data. I think it may be a bug, but I don't know the cause. Details are as follows:

WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_102_454.26_458.195 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_105_464.77_466.355 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
VLOG[2] (compute-mfcc-feats[5.5]:main():compute-mfcc-feats.cc:173) Processed features for key 20150726_105645_108_486.2_491.045
VLOG[2] (compute-mfcc-feats[5.5]:main():compute-mfcc-feats.cc:173) Processed features for key 20150726_105645_11_31.23_34.045
WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_108_486.2_491.045 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_11_31.23_34.045 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
VLOG[2] (compute-mfcc-feats[5.5]:main():compute-mfcc-feats.cc:173) Processed features for key 20150726_105645_123_569.13_572.085
VLOG[2] (compute-mfcc-feats[5.5]:main():compute-mfcc-feats.cc:173) Processed features for key 20150726_105645_125_581.34_582.845
WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_123_569.13_572.085 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
WARNING (paste-feats[5.5]:main():paste-feats.cc:137) Missing utt 20150726_105645_125_581.34_582.845 from input ark,s,cs:compute-kaldi-pitch-feats --verbose=2 --config=conf/pitch.conf scp,p:data/make_mfcc/train/wav_train.1.scp ark:- | process-kaldi-pitch-feats  ark:- ark:- |
ben-8878 commented 4 years ago

Recently I have run a large number of experiments and found that "steps/make_mfcc_pitch.sh" can drop utterances. That is why I am reopening the issue; maybe it is a bug.
When I used "steps/make_mfcc.sh", this never happened. Can anyone help?

danpovey commented 4 years ago

You'll have to find more details about where exactly those utterances go missing, e.g. are they in those scp files like data/make_mfcc/train/wav_train.1.scp? Are they in (the data-dir)/splitN/1/utt2spk?
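One way to follow this suggestion is to compare the first-column keys of two Kaldi table files, for example the backed-up utt2spk against the final feats.scp, since both are keyed by utterance ID. A minimal sketch, assuming the usual one-entry-per-line "<key> <value>" layout; the script name `missing_utts.py` and both function names are hypothetical, not part of Kaldi.

```python
import sys


def read_keys(lines):
    """Return the first whitespace-separated field (the key) of each non-empty line."""
    keys = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if parts:  # skip blank lines
            keys.append(parts[0])
    return keys


def missing_keys(reference_lines, candidate_lines):
    """Keys present in the reference table but absent from the candidate table."""
    have = set(read_keys(candidate_lines))
    return [key for key in read_keys(reference_lines) if key not in have]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python missing_utts.py REFERENCE_TABLE CANDIDATE_TABLE
    with open(sys.argv[1], encoding="utf-8") as ref, \
            open(sys.argv[2], encoding="utf-8") as cand:
        for utt in missing_keys(ref, cand):
            print(utt)
```

For instance, `python missing_utts.py data/train/.backup/utt2spk data/train/feats.scp` (with `data/train` standing in for your own data directory) would list the utterances that never received features.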

ben-8878 commented 4 years ago

I used "fix_data_dir.sh", so they are not in "data/make_mfcc/train/wav_train.1.scp" or "splitN/1/utt2spk"; they exist only in ".backup/wav.scp" and ".backup/utt2spk".

ben-8878 commented 4 years ago

I have checked and found that the main reason MFCC-and-pitch extraction fails is that pitch feature extraction failed. The first run failed, but a second run may succeed. The missing-utterance problem may happen when the number of wav files is large.

danpovey commented 4 years ago

You'll need to do some more debugging yourself to figure out what went wrong.

ben-8878 commented 4 years ago

Part of the log is as follows:

VLOG[2] (compute-mfcc-feats[5.5.546~1-bf0ee]:main():compute-mfcc-feats.cc:182) Processed features for key T36424G00021S1006
WARNING (paste-feats[5.5.546~1-bf0ee]:AppendFeats():paste-feats.cc:45) Length mismatch 715 vs. 712 for utt T36424G00021S1006 exceeds tolerance 2
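To see how many utterances are affected and why, the two warning types quoted in this thread ("Missing utt ..." and "Length mismatch ... exceeds tolerance ...") can be tallied from the per-job log files. A minimal sketch, assuming the warning wording matches the lines above (it may differ in other Kaldi versions); `summarize` is a hypothetical helper name.

```python
import re

# Patterns copied from the paste-feats warnings shown in this thread.
MISSING_RE = re.compile(r"Missing utt (\S+) from input")
MISMATCH_RE = re.compile(
    r"Length mismatch (\d+) vs\. (\d+) for utt (\S+) exceeds tolerance (\d+)")


def summarize(log_lines):
    """Return (missing utterance IDs, {utt: (len_a, len_b, tolerance)})."""
    missing, mismatched = [], {}
    for line in log_lines:
        m = MISMATCH_RE.search(line)
        if m:
            len_a, len_b, utt, tol = m.groups()
            mismatched[utt] = (int(len_a), int(len_b), int(tol))
            continue
        m = MISSING_RE.search(line)
        if m:
            missing.append(m.group(1))
    return missing, mismatched
```

Passing an open log file, e.g. `summarize(open(path))`, gives the missing IDs and the frame counts of each mismatched utterance; consistently large gaps point at a configuration difference rather than a random failure.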
danpovey commented 4 years ago

Possibly you changed the MFCC options or the pitch options and the difference in frame width was enough to cause a mismatch. You might have to either change the tolerance or reduce the difference in frame width.
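The frame-width explanation can be made concrete with a little arithmetic. A minimal sketch, assuming both extractors use snip_edges=true framing, where N samples with window L and shift S give 1 + floor((N - L) / S) frames, and assuming paste-feats truncates to the shorter stream when the gap is within tolerance. The 60 ms pitch window is a hypothetical misconfiguration, not a value from this thread, and the real pitch extractor is simplified here.

```python
def num_frames(num_samples: int, frame_length: int, frame_shift: int) -> int:
    """Frame count with snip_edges=true: only whole frames that fit."""
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift


def paste_lengths(len_a: int, len_b: int, tolerance: int = 2):
    """Assumed length check: common length if within tolerance, else None (dropped)."""
    if max(len_a, len_b) - min(len_a, len_b) > tolerance:
        return None  # the utterance would be skipped with a warning
    return min(len_a, len_b)  # longer stream truncated to the shorter one


if __name__ == "__main__":
    sample_rate = 16000
    n = 72 * sample_rate // 10               # 7.2 s of audio
    shift = sample_rate * 10 // 1000         # 10 ms shift -> 160 samples
    mfcc = num_frames(n, sample_rate * 25 // 1000, shift)   # 25 ms window
    pitch = num_frames(n, sample_rate * 60 // 1000, shift)  # hypothetical 60 ms window
    print(mfcc, pitch, paste_lengths(mfcc, pitch))
```

Running it prints `718 715 None`: over 7.2 s of 16 kHz audio, a 25 ms versus 60 ms window gives a 3-frame gap, which exceeds a tolerance of 2, so the utterance would be dropped. Matching the window settings, or raising the tolerance, closes the gap.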


ben-8878 commented 4 years ago

Got it, thanks for the reply, danpovey.

maggieezzat commented 3 years ago

@v-yunbin have you figured out the problem?

wangsheng0131 commented 3 years ago

Quoting @v-yunbin: "I have checked and found that the main reason MFCC-and-pitch extraction fails is that pitch feature extraction failed. The first run failed, but a second run may succeed. The missing-utterance problem may happen when the number of wav files is large."

Hello @v-yunbin, are you sure that the extraction failure is caused by the large number of wav files? I encountered this problem recently, so I would like to ask how you solved it. Thank you.

JasonDmuScut commented 2 years ago

If you change the default configuration of MFCC+pitch feature extraction, remember to write your new feature configuration into conf/pitch.conf at the same time. DO NOT write it only in mfcc.conf.
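A quick way to act on this advice is to diff the framing options of the two config files. A minimal sketch, assuming both files use Kaldi's one `--name=value` option per line and that unset options fall back to the defaults assumed in `FRAMING_DEFAULTS` (16000 Hz, 25 ms window, 10 ms shift, snip-edges true); the function names are hypothetical.

```python
# Assumed defaults shared by the MFCC and pitch extractors.
FRAMING_DEFAULTS = {
    "sample-frequency": "16000",
    "frame-length": "25",   # ms
    "frame-shift": "10",    # ms
    "snip-edges": "true",
}


def _norm(value: str):
    """Compare numbers numerically (so 10 == 10.0), other values case-insensitively."""
    try:
        return float(value)
    except ValueError:
        return value.lower()


def parse_conf(text: str) -> dict:
    """Parse '--name=value' lines, ignoring comments and blank lines."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("--") or "=" not in line:
            continue
        name, value = line[2:].split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def framing_mismatches(mfcc_text: str, pitch_text: str) -> dict:
    """Return {option: (mfcc_value, pitch_value)} for framing options that differ."""
    mfcc, pitch = parse_conf(mfcc_text), parse_conf(pitch_text)
    diffs = {}
    for name, default in FRAMING_DEFAULTS.items():
        a = _norm(mfcc.get(name, default))
        b = _norm(pitch.get(name, default))
        if a != b:
            diffs[name] = (a, b)
    return diffs
```

Calling `framing_mismatches(open("conf/mfcc.conf").read(), open("conf/pitch.conf").read())` returns an empty dict when the framing options agree; a non-empty result shows which setting was changed in only one file.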

LijingDK commented 1 year ago

Hello, can you tell me how you solved this problem? I have encountered exactly the same problem, and it has bothered me a lot.