kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

[egs] Training amount of ivector in librispeech recipe #4573

Status: Open. kakushawn opened this issue 3 years ago.

kakushawn commented 3 years ago

https://github.com/kaldi-asr/kaldi/blob/9d235864c3105c3b72feb9f19a219dbae08b3a41/egs/librispeech/s5/local/nnet3/run_ivector_common.sh#L86

According to the comment above this data-subset command, 200 hours of data will be extracted. However, the source data, data/${train_set}_sp_hires, is speed-perturbed, so it should contain around 3000 hours. Wouldn't that mean about 600 hours are used to train the i-vector extractor?

desh2608 commented 3 years ago

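The unperturbed cleaned data has about 300k utterances (900k for the speed-perturbed version). The subset is chosen by number of utterances, not by duration, so 60k utterances is 1/5 of the unperturbed utterance count, hence about 200 hours. But yeah, the comment should make that clearer.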
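As a rough check of the arithmetic in this thread, the sketch below estimates the duration of a subset selected by utterance count. The helpers are hypothetical (not part of Kaldi), and the numbers rest on assumptions: the unperturbed set is about 960 hours (the nominal LibriSpeech training size; the cleaned set may differ somewhat), the selected utterances have the same average duration as the full set, and speed perturbation is 3-way with factors 0.9, 1.0 and 1.1, where a copy at speed f lasts 1/f times the original.

```python
# Hypothetical helpers (not part of Kaldi) for estimating how many hours
# a subset chosen by utterance count contains.


def perturbed_hours(base_hours: float, factors: tuple[float, ...]) -> float:
    """Total duration after speed perturbation; a copy at speed f lasts 1/f as long."""
    return sum(base_hours / f for f in factors)


def subset_hours(total_hours: float, total_utts: int, subset_utts: int) -> float:
    """Hours in a subset, assuming its utterances have the set's average duration."""
    return subset_utts * (total_hours / total_utts)


BASE_HOURS = 960.0          # assumption: nominal LibriSpeech training size
FACTORS = (0.9, 1.0, 1.1)   # assumption: 3-way speed perturbation
SUBSET_UTTS = 60_000        # subset size discussed in the thread

sp_hours = perturbed_hours(BASE_HOURS, FACTORS)
print(f"speed-perturbed total: {sp_hours:.0f} h")
print(f"60k of 300k unperturbed utts: {subset_hours(BASE_HOURS, 300_000, SUBSET_UTTS):.0f} h")
print(f"60k of 900k perturbed utts:   {subset_hours(sp_hours, 900_000, SUBSET_UTTS):.0f} h")
```

Under these assumptions, 60k utterances come to roughly 190 to 195 hours whether the average is taken over the unperturbed or the perturbed set, because speed perturbation triples the utterance count along with the total duration. Taking 1/5 of the perturbed hours (roughly 600 hours) would only apply if 1/5 of the perturbed utterances, about 180k, were selected.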

stale[bot] commented 3 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.