kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Training failing for custom egs #4758

Open ExarchD opened 2 years ago

ExarchD commented 2 years ago

Hi, I've been using Kaldi in a way that probably isn't supported. I'm trying to train a DNN that takes a sequence of vectors as input and predicts a label (0, 1, or 2) as the target. Borrowing from the x-vector training code, I tried to train a raw DNN; however, the training doesn't converge. My best guess is that the indexes on my sequences of input vectors might be wrong.

ExarchD commented 2 years ago

Can anyone explain to me how the indexes work when it comes to training a DNN? I see that for x-vector training the n index is set to 0 here

danpovey commented 2 years ago

The n index generally corresponds to the batch index. In that code it is set to 0 for the output just as a convention, because the examples are written individually and not as part of a batch.

Regarding the non-convergence, we would probably need many more details to be able to say anything, e.g. training logs and the model topology.
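To illustrate the convention described above, here is a minimal Python sketch. It is not Kaldi's API: `Index`, `single_example_indexes`, and `merge_examples` are hypothetical names used only to model the (n, t, x) index triple and to show how n would be renumbered when same-sized examples are combined into a minibatch.

```python
from collections import namedtuple

# Illustrative stand-in for an nnet3 index (assumption, not Kaldi code):
# n = which example within the minibatch, t = time step, x = extra index.
Index = namedtuple("Index", ["n", "t", "x"])


def single_example_indexes(num_frames):
    """Indexes for one example written on its own: n is always 0."""
    return [Index(0, t, 0) for t in range(num_frames)]


def merge_examples(examples):
    """Combine same-length examples into one batch by renumbering n."""
    lengths = {len(ex) for ex in examples}
    if len(lengths) != 1:
        raise ValueError("only examples with the same structure can be merged")
    return [idx._replace(n=n) for n, ex in enumerate(examples) for idx in ex]


egs = [single_example_indexes(3), single_example_indexes(3)]
batch = merge_examples(egs)
for idx in batch:
    print(idx)
```

In this sketch each example carries n = 0 until it is merged, at which point n becomes its position in the batch while t is left untouched.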

ExarchD commented 2 years ago

Ah, I see. Thank you

The model topology is directly copied from the x-vector training code at the moment (a few TDNN layers, stats pooling, more TDNN layers, softmax). In my case, it's important that the input is a sequence of vectors rather than just a set, because there is time dependence. That should be the same as in the x-vector training, correct?

I did have to set the minibatch size to 1, and I'm not sure what the issue was there. I get this error if the minibatch size is greater than 1:

LOG (nnet3-merge-egs[5.5.989~9-2b5036]:PrintAggregateStats():nnet-example-utils.cc:1155) Processed 83 egs of avg. size 640.7 into 0 minibatches, discarding 100% of egs. Avg minibatch size was -nan, #distinct types of egs/minibatches was 71/0

Is it alright for the minibatch size to be 1?

I'm happy to attach any training logs that may be interesting; however, I haven't seen much relevant information in them yet. Maybe these stats are helpful?

LOG (nnet3-merge-egs[5.5.989~9-2b5036]:PrintSpecificStats():nnet-example-utils.cc:1189) 95={1->1,d=0},99={1->1,d=0},165={1->1,d=0},167={1->1,d=0},211={1->1,d=0},217={1->1,d=0},235={1->1,d=0},259={1->1,d=0},265={1->1,d=0},273={1->1,d=0},275={1->1,d=0},283={1->1,d=0},287={1->1,d=0},291={1->1,d=0},301={1->1,d=0},313={1->1,d=0},321={1->2,d=0},345={1->1,d=0},349={1->3,d=0},351={1->1,d=0},361={1->1,d=0},365={1->1,d=0},367={1->1,d=0},379={1->2,d=0},381={1->1,d=0},385={1->1,d=0},401={1->1,d=0},407={1->2,d=0},413={1->1,d=0},417={1->1,d=0},421={1->1,d=0},445={1->1,d=0},447={1->1,d=0},459={1->1,d=0},491={1->1,d=0},499={1->1,d=0},519={1->1,d=0},525={1->1,d=0},539={1->1,d=0},543={1->1,d=0},553={1->1,d=0},565={1->1,d=0},587={1->1,d=0},601={1->3,d=0},603={1->1,d=0},621={1->1,d=0},623={1->1,d=0},627={1->1,d=0},629={1->1,d=0},639={1->1,d=0},649={1->1,d=0},653={1->1,d=0},687={1->1,d=0},745={1->1,d=0},751={1->2,d=0},879={1->1,d=0},887={1->1,d=0},913={1->1,d=0},935={1->1,d=0},971={1->1,d=0},1029={1->1,d=0},1061={1->1,d=0},1147={1->1,d=0},1151={1->1,d=0},1199={1->3,d=0},1201={1->2,d=0},1203={1->2,d=0},1477={1->1,d=0},2319={1->1,d=0},2393={1->1,d=0},2457={1->1,d=0}

LOG (nnet3-merge-egs[5.5.989~9-2b5036]:PrintAggregateStats():nnet-example-utils.cc:1155) Processed 83 egs of avg. size 640.7 into 83 minibatches, discarding 0% of egs. Avg minibatch size was 1, #distinct types of egs/minibatches was 71/71
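One way to read these stats is sketched below in Python. It assumes each entry has the form `<eg-size>={<minibatch-size>-><num-minibatches>,d=<num-discarded>}`, and it uses a deliberately simplified merging model in which only egs of the same size can share a minibatch and any group smaller than the requested minibatch size is dropped. The helper names `parse_merge_stats` and `simulate_merge` are hypothetical, and the real `nnet3-merge-egs` logic is more involved. The `sample` string is a three-entry excerpt of the log above.

```python
import re


def parse_merge_stats(stats):
    """Return {eg_size: number_of_egs} from a PrintSpecificStats-style string."""
    counts = {}
    for size, body in re.findall(r"(\d+)=\{([^}]*)\}", stats):
        total = 0
        for field in body.split(","):
            if field.startswith("d="):
                total += int(field[2:])  # egs that were discarded
            else:
                mb_size, num_mb = field.split("->")
                total += int(mb_size) * int(num_mb)  # egs that were merged
        counts[int(size)] = total
    return counts


def simulate_merge(counts, minibatch_size):
    """Simplified model: only full minibatches of same-sized egs are kept."""
    minibatches = discarded = 0
    for num_egs in counts.values():
        full, rest = divmod(num_egs, minibatch_size)
        minibatches += full
        discarded += rest
    return minibatches, discarded


# Three entries copied from the log above.
sample = "95={1->1,d=0},321={1->2,d=0},349={1->3,d=0}"
counts = parse_merge_stats(sample)
print(counts)
print(simulate_merge(counts, 1))
print(simulate_merge(counts, 64))
```

Under this model, a large minibatch size discards every eg whenever almost every eg has a unique size. That would be consistent with the "71 distinct types" among 83 egs reported above, although this is only a plausible reading of the log and not a confirmed diagnosis.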
