k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

can't reproduce streaming_conformer_ctc result #909

Open BuaaAlban opened 1 year ago

BuaaAlban commented 1 year ago

Hi @glynpu, could you share the config you used to train your model? I can't reproduce the result using the default values in streaming_conformer_ctc/train.py. After training for 46 epochs and decoding with avg=20, the model has almost no accuracy. BTW, there are some small bugs like missing ids in the result tuple and dataloader error when using my trained model to decode with streaming_conformer_ctc/streaming_decode.py.

glynpu commented 1 year ago

Thanks for the report.

there are some small bugs like missing ids in the result tuple and dataloader error

Q1: Can you decode your dataset with these trained models? https://huggingface.co/GuoLiyong/streaming_conformer/tree/main/streaming_models

Q2: Do you have a particular reason to use this CTC recipe? We have already shifted to RNN-T loss, so we have not been working in this direction recently. But if you really need a model trained with CTC loss, we are glad to continue optimizing it.

BuaaAlban commented 1 year ago

Q1: I will try your pretrained model. I don't have my own dataset; I am just trying to reproduce it on LibriSpeech. BTW, the bugs are interface errors and easy to fix, maybe I can create a PR later.
Q2: Actually, I want to try the CTC recipe because its structure is simpler than RNN-T (even the pruned and stateless variants) and maybe easier for deployment, and its accuracy can also be very good. In NeMo I reproduced a 16-layer, dim-176, 4-head conformer CTC model on LibriSpeech, and its WER was 2.6/6.3 on test-clean/test-other.

glynpu commented 1 year ago

maybe I can create a PR later.

Thanks!

and maybe easier for deployment

For RNN-T, our sherpa repo mainly targets deployment of RNN-T models, so it may help you deploy one. For CTC models, currently and in the near future, we are not going to support them as much as RNN-T models.

danpovey commented 1 year ago

Some of the older conformer models could sometimes completely fail to converge. For more recent models we have made various changes that make convergence much more stable. The loss should be (0.0x) or (0.00x); if it's, like, 0.5 or 0.8, it means it completely failed to converge.
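As a rough illustration of the rule of thumb above, here is a minimal Python sketch that sorts a reported loss value into the ranges mentioned. The function name `classify_loss` and the exact cut-offs (below 0.1 for the "0.0x / 0.00x" range, 0.5 and above for "failed") are assumptions made for this example, not part of icefall; values in between are simply reported as unclear, and the loss is assumed to be the normalized value printed in the training log.

```python
import math


def classify_loss(loss: float) -> str:
    """Classify a reported training loss using the rule of thumb above.

    Hypothetical helper, not an icefall API. Thresholds are assumptions:
      - below 0.1 (i.e. 0.0x or 0.00x): the range expected for a converged model
      - 0.5 or above: likely a complete failure to converge
      - anything in between: not covered by the rule of thumb
    """
    if math.isnan(loss) or math.isinf(loss) or loss < 0:
        return "invalid"
    if loss < 0.1:
        return "converged-range"
    if loss >= 0.5:
        return "likely-failed"
    return "unclear"


if __name__ == "__main__":
    # Example values only; they are not taken from any real training run.
    for value in (0.03, 0.004, 0.3, 0.5, 0.8):
        print(value, classify_loss(value))
```

A check like this could be run against the last few loss values in a training log before spending time on decoding.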