jzlianglu / pykaldi2

Yet another speech toolkit based on Kaldi and PyTorch
MIT License

Why is the chain loss computation so slow? #18

Closed bliunlpr closed 4 years ago

bliunlpr commented 4 years ago

When I trained the model with ChainObjtiveFunction, I found that the chain loss computation is very slow. For example, the data loading time is 0.2 s and the model forward time is 0.2 s, but the loss computation time is 8.2 s. Why is the chain loss computation so slow, and how can I accelerate it? Thanks!

glynpu commented 4 years ago

Do you enable CUDA like this when invoking ChainObjtiveFunction?
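(The original snippet is not preserved in this thread. A minimal sketch of what "enabling CUDA" looks like with PyKaldi's `cudamatrix` module, which pykaldi2 builds on, is below; the helper name `enable_kaldi_gpu` is illustrative, not pykaldi2 API. Without this call, Kaldi's LF-MMI forward-backward runs on the CPU even if the PyTorch model is on the GPU.)

```python
# Hedged sketch: select a GPU for Kaldi's CUDA matrix library before the
# chain loss is computed. Uses PyKaldi's kaldi.cudamatrix module; the
# wrapper function name is illustrative only.
try:
    from kaldi.cudamatrix import CuDevice, cuda_available
    HAVE_KALDI = True
except ImportError:
    HAVE_KALDI = False  # PyKaldi not installed; fall back to CPU

def enable_kaldi_gpu():
    """Activate a GPU for Kaldi's CuDevice singleton, if one is available.

    Returns True if a GPU was selected, False otherwise (no PyKaldi or
    no usable CUDA device). Call this once, before the chain loss."""
    if HAVE_KALDI and cuda_available():
        # "yes" lets Kaldi pick a free GPU; allow_multithreading avoids
        # crashes when the device is touched from PyTorch worker threads.
        CuDevice.instantiate().select_gpu_id("yes")
        CuDevice.instantiate().allow_multithreading()
        return True
    return False
```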

jzlianglu commented 4 years ago

Thanks for pointing this out. I have not fully tested the chain function yet, as I'm currently working on a few other aspects. At the moment, it only works with a batch size of 1. I think @glynpu made a good point; it is possible that I missed this, which would explain why it is slow now. To fully utilize the chain objective, I need to implement another dataloader that prepares minibatches the way Kaldi does, e.g., 128 sequences of 1.5 seconds of audio with supervisions. I do not have time to do that yet, but will work on it as soon as I get my hands free.
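(For readers unfamiliar with Kaldi's chain training setup, the batching scheme described above can be sketched as follows. This is an illustrative outline, not pykaldi2 code: utterances are cut into fixed-length chunks, here 150 frames, roughly 1.5 s at a 10 ms frame shift, and the chunks are grouped into uniform minibatches so the chain loss sees many sequences at once instead of one utterance at a time. The function names are hypothetical.)

```python
# Hedged sketch of a Kaldi-style chunking dataloader (names illustrative).

def chunk_utterances(utterances, chunk_len=150):
    """Split each (utt_id, frames) pair into fixed-length chunks.

    Yields (utt_id, start_frame, chunk) tuples; a final remainder shorter
    than chunk_len is dropped, as Kaldi-style batching needs uniform shapes."""
    for utt_id, frames in utterances:
        for start in range(0, len(frames) - chunk_len + 1, chunk_len):
            yield utt_id, start, frames[start:start + chunk_len]

def make_minibatches(chunks, batch_size=128):
    """Group chunks into lists of exactly batch_size.

    The last partial batch is dropped so every minibatch has the same
    shape, which keeps the chain loss computation on the GPU efficient."""
    batch = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) == batch_size:
            yield batch
            batch = []
```

In a real implementation each chunk would also carry its LF-MMI supervision (the numerator FST for that span), which is what makes this nontrivial compared with plain feature batching.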

jzlianglu commented 4 years ago

Hi @bliunlpr

I double-checked my setup, and it takes around 1 s to compute the loss for each utterance. Does the 8.2 s you mentioned correspond to the computation of one utterance?

bliunlpr commented 4 years ago

Thanks for your reply. @glynpu made a good point. I have added the lines to enable CUDA, and now the loss computation time is about 0.5 s. Thanks again! @glynpu @jzlianglu

jzlianglu commented 4 years ago

@bliunlpr , great, would you like to push your changes to the main branch?

bliunlpr commented 4 years ago

I have added it; it's in the pull requests. @jzlianglu

jzlianglu commented 4 years ago

@bliunlpr, thanks, I will test it.

jzlianglu commented 4 years ago

Added CuDevice activation for the LF-MMI loss computation, giving a significant speed improvement.