jzlianglu / pykaldi2

Yet another speech toolkit based on Kaldi and PyTorch
MIT License

How to use CE regularizer in chain-model training? #11

Open · glynpu opened this issue 4 years ago

glynpu commented 4 years ago

Hi, I wonder whether it is appropriate to add the CE regularizer gradient (`grad_xent`) directly to `grad` in chain-model training, as is currently implemented: `grad.add_mat(chain_opts.xent_regularize, grad_xent)`.

In Kaldi's chain-model recipes, e.g. aishell s5, the network has two branches after layer tdnn6: one for the chain model (the `output` layer) and one for CE (the `output-xent` layer). The derivative matrix `grad` is applied to `output`, while `grad_xent` is applied to `output-xent`.
If `grad_xent` is merged into `grad`, there is no prefinal-xent -> output-xent branch at all.

[image: network diagram showing the separate output and output-xent branches after tdnn6]
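Below is a minimal, self-contained PyTorch sketch of what the two-branch setup from the Kaldi recipe could look like (the module, layer sizes, and tensor names are illustrative assumptions, not pykaldi2's actual code): the chain derivative is backpropagated through the `output` head and the scaled CE derivative through the `output-xent` head, instead of merging the two matrices into a single gradient.

```python
import torch
import torch.nn as nn

class TwoHeadChainModel(nn.Module):
    """Hypothetical shared trunk with a chain head and a CE (xent) head."""
    def __init__(self, feat_dim, hidden_dim, num_pdfs):
        super().__init__()
        self.trunk = nn.Sequential(                 # stands in for tdnn1..tdnn6
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.prefinal_chain = nn.Linear(hidden_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, num_pdfs)       # chain branch
        self.prefinal_xent = nn.Linear(hidden_dim, hidden_dim)
        self.output_xent = nn.Linear(hidden_dim, num_pdfs)  # CE branch

    def forward(self, feats):
        h = self.trunk(feats)
        return (self.output(self.prefinal_chain(h)),
                self.output_xent(self.prefinal_xent(h)))

model = TwoHeadChainModel(feat_dim=40, hidden_dim=256, num_pdfs=500)
feats = torch.randn(8, 40)
out_chain, out_xent = model(feats)

# grad / grad_xent would normally come back from the chain objective
# computation; random tensors here only keep the sketch runnable.
grad = torch.randn_like(out_chain)
grad_xent = torch.randn_like(out_xent)
xent_regularize = 0.1

# Apply each derivative matrix to its own branch, rather than doing
# grad.add_mat(xent_regularize, grad_xent) on a single output.
torch.autograd.backward(
    [out_chain, out_xent],
    [grad, xent_regularize * grad_xent])
```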

jzlianglu commented 4 years ago

That is a good question, I missed that. Indeed, with the current implementation the CE regularization does not work in my experiments. It is a bit unclear to me why a separate branch is needed for CE regularization; for lattice-based sequence training I did not use the second branch and CE regularization worked well. I'll run some comparisons and update the code. Thanks.
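For contrast, a short sketch (names illustrative, not pykaldi2's actual code) of the merged-gradient variant discussed above, where the CE derivative is scaled and added to the chain derivative and the sum is backpropagated through a single output:

```python
import torch

out_chain = torch.randn(8, 500, requires_grad=True)  # single network output
grad = torch.randn_like(out_chain)                    # chain derivative
grad_xent = torch.randn_like(out_chain)               # CE derivative
xent_regularize = 0.1

# Equivalent of grad.add_mat(xent_regularize, grad_xent) on one output.
merged = grad + xent_regularize * grad_xent
out_chain.backward(gradient=merged)
```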