librispeech data validation issue

shadowoom commented 5 years ago

[Screenshot attached: capture]

I was running the librispeech s5 example and followed the steps in run.sh. Has anyone encountered a similar issue? Can anyone tell me how to resolve it?

danpovey commented 5 years ago

I can't see a reference to make_fbank.sh in any of the scripts in the librispeech s5 directory. Are you using an up-to-date version of Kaldi (if not, how recent?) and are you using Kaldi from the official source? Have you changed the scripts, perhaps?

shadowoom commented 5 years ago

Thank you for the reply.

I cloned directly from this repo: https://github.com/tramphero/kaldi. The Alibaba_MIT_Speech_DFSMN patch, which was built on the Kaldi speech recognition toolkit at commit "04b1f7d6658bc035df93d53cb424edc127fab819", was applied.

The run_fsmn_ivector.sh script in the librispeech/s5/local/nnet/ folder can be found here: https://github.com/tramphero/kaldi/blob/master/egs/librispeech/s5/local/nnet/run_fsmn_ivector.sh

I checked the wav.scp and reco2dur files in the train_960_cleaned folder; their lengths already differed before they were copied into the data_fbank folder.
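To see exactly which recordings cause the two files to differ in length, it can help to compare their recording IDs directly. The following is a minimal Python sketch, assuming both files use Kaldi's usual one-entry-per-line "recording-id value..." layout; read_keys and compare_keys are hypothetical helper names, and this is not the code that utils/validate_data_dir.sh itself runs.

```python
# Minimal sketch (assumption): roughly approximates the wav.scp vs. reco2dur
# consistency check. Both inputs are lists of lines from Kaldi-style
# "key value..." text files; the first field of each line is the recording ID.


def read_keys(lines):
    """Return the first whitespace-separated field of each non-empty line."""
    keys = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            keys.append(fields[0])
    return keys


def compare_keys(wav_scp_lines, reco2dur_lines):
    """Report entry counts and the recording IDs present in only one file."""
    wav_keys = read_keys(wav_scp_lines)
    dur_keys = read_keys(reco2dur_lines)
    return {
        "wav_scp_count": len(wav_keys),
        "reco2dur_count": len(dur_keys),
        "only_in_wav_scp": sorted(set(wav_keys) - set(dur_keys)),
        "only_in_reco2dur": sorted(set(dur_keys) - set(wav_keys)),
    }


if __name__ == "__main__":
    # Hypothetical example data: reco2dur is missing the entry for rec2.
    wav_scp = ["rec1 /data/rec1.wav", "rec2 /data/rec2.wav"]
    reco2dur = ["rec1 14.2"]
    print(compare_keys(wav_scp, reco2dur))
```

In practice the line lists would come from reading the real files, e.g. open("data/train_960_cleaned/wav.scp").readlines(); the IDs listed under only_in_wav_scp or only_in_reco2dur are the entries the validation step is complaining about.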

danpovey commented 5 years ago

It may be due to a bug that was present in an older version of Kaldi at some point; merging with master might resolve it (but it might not). I'm not going to provide support for that external repo.

(In reply to shadowoom's comment of Sun, Jul 22, 2018, 8:52 PM.)

shadowoom commented 5 years ago

Okay, thanks. I will try merging with master and see how it goes.

yangxueruivs commented 5 years ago

I also ran into this problem before. I simply skipped the validation of wav.scp against reco2dur (the last check) in utils/validate_data_dir.sh, and then everything worked correctly.

shadowoom commented 5 years ago

Thanks, that's really helpful.

(In reply to YangXuerui's comment of Mon, Jul 23, 2018, 2:34 PM.)

AlexPeng19 commented 5 years ago

@shadowoom Hi, I trained the DFSMN model too and encountered the same issue. Did you resolve it by skipping the validation check? @yangxueruivs, could you give some details on how you skipped it, such as which file and which line numbers? Also, did you export the trained model afterwards, i.e., how did you use it?

Looking forward to your response. Thanks in advance.

danpovey commented 5 years ago

utils/fix_data_dir.sh may help.

This is likely an issue in the branch of Kaldi that you are using, which is obviously not master. You could comment on the repo where it's hosted and ask them to help or to fix it.
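As a rough illustration of the idea behind utils/fix_data_dir.sh (keeping only the entries whose key appears in every file, written out sorted by key), here is a minimal Python sketch. It is a simplification under stated assumptions, not the actual script: parse and keep_common are hypothetical names, and the real tool handles many more files (utt2spk, segments, feats.scp, and so on), so running utils/fix_data_dir.sh on the data directory remains the practical fix.

```python
# Minimal sketch (assumption): filters several Kaldi-style "key value..."
# files down to the keys they all share, sorted by key. Python's default
# string ordering matches C-locale byte order for ASCII keys, which is the
# sort order Kaldi data directories expect.


def parse(lines):
    """Map key -> full line (trailing newline removed); later duplicates win."""
    entries = {}
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            entries[fields[0]] = line.rstrip("\n")
    return entries


def keep_common(*files):
    """Return each file's lines restricted to keys present in every file."""
    parsed = [parse(lines) for lines in files]
    common = set(parsed[0]).intersection(*parsed[1:])
    return [[entries[key] for key in sorted(common)] for entries in parsed]


if __name__ == "__main__":
    # Hypothetical data: rec2 has no duration entry, rec4 has no audio entry.
    wav_scp = ["rec2 /data/rec2.wav", "rec1 /data/rec1.wav", "rec3 /data/rec3.wav"]
    reco2dur = ["rec1 14.2", "rec3 9.8", "rec4 11.0"]
    fixed_wav, fixed_dur = keep_common(wav_scp, reco2dur)
    print(fixed_wav)
    print(fixed_dur)
```

Dropping mismatched entries this way makes the files consistent, but it silently discards data, so it is worth checking which recordings are removed before training.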

(In reply to AlexPeng19's comment of Mon, Sep 3, 2018, 2:54 AM.)

(kaldi-asr/kaldi issue #2566)