jzlianglu / pykaldi2

Yet another speech toolkit based on Kaldi and PyTorch
MIT License

How to use CE regularizer in chain-model training? #11

Open · glynpu opened this issue 4 years ago

glynpu commented 4 years ago

Hi, I wonder whether it is appropriate to add the CE regularizer gradient (`grad_xent`) directly to `grad` in chain-model training, as is currently implemented: `grad.add_mat(chain_opts.xent_regularize, grad_xent)`.

In Kaldi's chain-model recipes, e.g. aishell s5, the network has two branches after layer tdnn6: one for the chain model (the `output` layer) and one for CE (the `output-xent` layer). The derivative matrix `grad` is applied to `output`, while `grad_xent` is applied to `output-xent`.
If `grad_xent` is merged into `grad`, there is no prefinal-xent -> output-xent branch at all.

[image: network diagram showing the separate output and output-xent branches after tdnn6]
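Below is a minimal, self-contained PyTorch sketch of what the two-branch setup from the Kaldi recipe could look like (the module, layer sizes, and tensor names are illustrative assumptions, not pykaldi2's actual code): the chain derivative is backpropagated through the `output` head and the scaled CE derivative through the `output-xent` head, instead of merging the two matrices into a single gradient.

```python
import torch
import torch.nn as nn

class TwoHeadChainModel(nn.Module):
    """Hypothetical shared trunk with a chain head and a CE (xent) head."""
    def __init__(self, feat_dim, hidden_dim, num_pdfs):
        super().__init__()
        self.trunk = nn.Sequential(                 # stands in for tdnn1..tdnn6
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.prefinal_chain = nn.Linear(hidden_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, num_pdfs)       # chain branch
        self.prefinal_xent = nn.Linear(hidden_dim, hidden_dim)
        self.output_xent = nn.Linear(hidden_dim, num_pdfs)  # CE branch

    def forward(self, feats):
        h = self.trunk(feats)
        return (self.output(self.prefinal_chain(h)),
                self.output_xent(self.prefinal_xent(h)))

model = TwoHeadChainModel(feat_dim=40, hidden_dim=256, num_pdfs=500)
feats = torch.randn(8, 40)
out_chain, out_xent = model(feats)

# grad / grad_xent would normally come back from the chain objective
# computation; random tensors here only keep the sketch runnable.
grad = torch.randn_like(out_chain)
grad_xent = torch.randn_like(out_xent)
xent_regularize = 0.1

# Apply each derivative matrix to its own branch, rather than doing
# grad.add_mat(xent_regularize, grad_xent) on a single output.
torch.autograd.backward(
    [out_chain, out_xent],
    [grad, xent_regularize * grad_xent])
```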

jzlianglu commented 4 years ago

That is a good question, I missed that. Indeed, with the current implementation the CE regularization does not work in my experiments. It is a bit unclear to me why a separate branch is needed for CE regularization; for lattice-based sequence training I did not use the second branch and CE regularization worked well. I'll run some comparisons and update the code. Thanks.
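For contrast, a short sketch (names illustrative, not pykaldi2's actual code) of the merged-gradient variant discussed above, where the CE derivative is scaled and added to the chain derivative and the sum is backpropagated through a single output:

```python
import torch

out_chain = torch.randn(8, 500, requires_grad=True)  # single network output
grad = torch.randn_like(out_chain)                    # chain derivative
grad_xent = torch.randn_like(out_chain)               # CE derivative
xent_regularize = 0.1

# Equivalent of grad.add_mat(xent_regularize, grad_xent) on one output.
merged = grad + xent_regularize * grad_xent
out_chain.backward(gradient=merged)
```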