k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Export conformer_ctc3 streaming model to jit trace #999

Open pavankumar-ds opened 1 year ago

pavankumar-ds commented 1 year ago

Could you please add support for streaming model export to jit trace format?

uni-manjunath-ke commented 1 year ago

Hi @csukuangfj, Could you please let us know any update on this? Thanks

csukuangfj commented 1 year ago

> Hi @csukuangfj, Could you please let us know any update on this? Thanks

Sorry, conformer_ctc3 does not support streaming recognition. Please switch to the streaming zipformer if possible.

https://github.com/k2-fsa/icefall/pull/941 is for zipformer + ctc
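For context on why the distinction matters: a streaming recognizer consumes audio in fixed-size chunks and carries state across calls, instead of seeing the whole utterance at once. A minimal plain-Python sketch of that pattern (the function names and state layout here are illustrative, not icefall's API):

```python
def run_streaming(frames, chunk_size, step):
    """Process `frames` chunk by chunk, carrying running state
    (here just a count of frames seen) across calls."""
    state = {"frames_seen": 0}
    outputs = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        # A real streaming model would also consume cached left-context here.
        state["frames_seen"] += len(chunk)
        outputs.append(step(chunk, state))
    return outputs, state


def toy_step(chunk, state):
    # Toy "model step": emit the chunk mean as a stand-in for logits.
    return sum(chunk) / len(chunk)


outs, final = run_streaming([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, toy_step)
```

An offline (non-streaming) model like conformer_ctc3 has no such incremental state, which is why it cannot simply be exported as a streaming model.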

uni-manjunath-ke commented 1 year ago

Thanks @csukuangfj. We are using the zipformer_ctc implementation from https://github.com/k2-fsa/icefall/pull/941. Could you confirm whether this is the streaming variant and supports streaming? Thank you.

csukuangfj commented 1 year ago

@uni-manjunath-ke

#941 combines zipformer with CTC; however, it is not streaming.

Please use pruned_transducer_stateless7_streaming, which is a streaming version. If you want to use CTC, please combine pruned_transducer_stateless7_streaming with #941.

uni-manjunath-ke commented 1 year ago

Thanks @csukuangfj.

How do we do that? Should we work out the difference between pruned_transducer_stateless7 and zipformer_ctc and then adapt it to pruned_transducer_stateless7_streaming? Is that correct? Please suggest. Thanks. Tagging @pavankumar-ds

csukuangfj commented 1 year ago

pruned_transducer_stateless7 and zipformer_ctc share the same zipformer.py.

Please first have a look at pruned_transducer_stateless7_streaming. After you read the code, I believe you will know how to do it.

uni-manjunath-ke commented 1 year ago

Hi @csukuangfj and @desh2608, we have gone through the code and feel it may take us some time to understand it thoroughly and implement this ourselves. Meanwhile, we wanted to check whether you have any plan to implement a streaming version of zipformer_ctc. Treating this as a feature request: would it be possible for you to implement it? Thanks.

desh2608 commented 1 year ago

Sorry, I don't have the bandwidth for this at the moment. As Fangjun mentioned, it should be relatively straightforward to create such a recipe based on pruned_transducer_stateless7_streaming and zipformer_ctc. You can basically just copy over the zipformer_ctc files. Then change the following:

  1. Replace zipformer.py with the one from pruned_transducer_stateless7_streaming. This is the streaming variant of Zipformer.
  2. In train.py, add the "chunk" related arguments (see here). Also search for "chunk" in that file and add all those things to train.py in zipformer_ctc.
  3. Similarly, look for "chunk" in decode.py of pruned_transducer_stateless7_streaming, and add those in decode.py of zipformer_ctc.

I think these are all the changes needed.
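The chunk-related arguments in step 2 can be sketched roughly as follows. The flag names and defaults below are modeled on pruned_transducer_stateless7_streaming's train.py, but treat them as assumptions to verify against that file rather than a definitive list:

```python
import argparse


def add_chunk_arguments(parser: argparse.ArgumentParser) -> None:
    """Add chunk-related flags, modeled on pruned_transducer_stateless7_streaming.
    Names and defaults are assumptions; check the actual train.py."""
    parser.add_argument("--short-chunk-size", type=int, default=50,
                        help="Chunk length (in frames) used during training.")
    parser.add_argument("--num-left-chunks", type=int, default=4,
                        help="Number of previous chunks of left context to attend to.")
    parser.add_argument("--decode-chunk-len", type=int, default=32,
                        help="Chunk length (in frames) used at decoding time.")


parser = argparse.ArgumentParser()
add_chunk_arguments(parser)
args = parser.parse_args(["--decode-chunk-len", "64"])
```

The same flags (or at least the decode-time one) would then be threaded through decode.py as described in step 3.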

uni-manjunath-ke commented 1 year ago

Sure, thanks a lot for the detailed steps. We will work on it and update you further.

uni-manjunath-ke commented 1 year ago

Hi @desh2608 and @csukuangfj, thanks for your suggestions. We were able to make the suggested modifications and train a zipformer ctc streaming model. Could you please let us know if we can push this zipformer_ctc_streaming recipe to the repository? Thanks

uni-saurabh-vyas commented 1 year ago

Is there any recipe in sherpa triton for zipformer ctc streaming?

uni-manjunath-ke commented 1 year ago

Hi @csukuangfj and @desh2608, we tried to export zipformer_ctc_streaming to jit format, but we get the errors below. We also tried to export it to onnx format, and the changes we made to export_onnx.py also gave errors. Could you please suggest how to proceed? Thanks.

build/sherpa/./bin/sherpa-online \
  --nn-model=/mnt/efs/manju/if/icefall/egs/librispeech/ASR/zipformer_ctc_streaming/exp/cpu_jit.pt \
  --tokens=/mnt/efs/manju/if/icefall/egs/librispeech/ASR/data/in_en/lang_bpe_500/./tokens.txt \
  --use-gpu=true \
  --decoding-method=greedy_search \
  /mnt/efs/manju/if/tools/16pcm_re_testq_vNeC-nX4X0LWuORXDmp_l_0001.wav

[I] /mnt/efs/manju/if/tools/sherpa/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2023-05-11 08:17:26.563 build/sherpa/./bin/sherpa-online --nn-model=/mnt/efs/manju/if/icefall/egs/librispeech/ASR/zipformer_ctc_streaming/exp/cpu_jit.pt --tokens=/mnt/efs/manju/if/icefall/egs/librispeech/ASR/data/in_en/lang_bpe_500/./tokens.txt --use-gpu=true --decoding-method=greedy_search /mnt/efs/manju/if/tools/16pcm_re_testq_vNeC-nX4X0LWuORXDmp_l_0001.wav

[I] /mnt/efs/manju/if/tools/sherpa/sherpa/cpp_api/bin/online-recognizer.cc:145:int32_t main(int32_t, char**) 2023-05-11 08:17:26.567 decoding method: greedy_search

Aborted (core dumped)
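For reference, greedy_search for a CTC model amounts to taking the arg-max token per frame, collapsing consecutive repeats, and dropping blanks. A plain-Python sketch (the blank id of 0 is an assumption; this is not tied to sherpa's implementation):

```python
def ctc_greedy_decode(log_probs, blank=0):
    """log_probs: list of per-frame score lists; returns collapsed token ids."""
    # Arg-max token for each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out, prev = [], blank
    for tok in best:
        # Emit only when the token is not blank and differs from the previous frame.
        if tok != blank and tok != prev:
            out.append(tok)
        prev = tok
    return out


# Frames whose per-frame arg-max sequence is [1, 1, 0, 2, 2].
frames = [[0.1, 0.8, 0.1],
          [0.1, 0.7, 0.2],
          [0.9, 0.05, 0.05],
          [0.1, 0.2, 0.7],
          [0.2, 0.1, 0.7]]
decoded = ctc_greedy_decode(frames)  # collapses to [1, 2]
```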

csukuangfj commented 1 year ago

> We tried to export zipformer_ctc_streaming to jit format.

> build/sherpa/./bin/sherpa-online

k2-fsa/sherpa supports only streaming transducers. If you could contribute a streaming CTC model, we can add it to k2-fsa/sherpa.

uni-saurabh-vyas commented 1 year ago

Yes, I think an onnx recipe for sherpa with triton for ctc zipformer would be nice to have.

csukuangfj commented 1 year ago

> Yes, I think an onnx recipe for sherpa with triton for ctc zipformer would be nice to have.

We currently don't have such a recipe in icefall. Would you mind contributing one to icefall and making the model public, so that we can use it for testing when adding it to sherpa?

uni-manjunath-ke commented 1 year ago

> k2-fsa/sherpa supports only streaming transducers. If you could contribute a streaming CTC model, we can add that to k2-fsa/sherpa

If we share a zipformer ctc streaming model, will that be fine? Thanks

csukuangfj commented 1 year ago

We want more people to benefit from the code. If we only have a pre-trained model, then users other than you won't have code to train their own models, and the model would mostly be usable only by you.

pavankumar-ds commented 1 year ago

Adding to @uni-manjunath-ke 's points, yes, we'd also like to add the recipe to icefall. We'll take a few days to run it with standard librispeech and include the WER.

uni-manjunath-ke commented 1 year ago

> We want more people to benefit from the code. If we only have a pre-trained model, then users other than you won't have code to train their own models, and the model would mostly be usable only by you.

Sure. Could you please guide us on how we push our zipformer-ctc-streaming code? Thanks

csukuangfj commented 1 year ago

Could you follow https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request to make a pull request to icefall?

uni-manjunath-ke commented 1 year ago

Hi @csukuangfj , I have created a fork and uploaded zipformer_ctc_streaming at https://github.com/uni-manjunath-ke/icefall/tree/zipformer_ctc_streaming/egs/librispeech/ASR/zipformer_ctc_streaming

We have also run this code on Librispeech. These are our results:

  - avg 15: 4.07% WER on test-clean, 10.51% WER on test-other
  - avg 9: 4.0% WER on test-clean, 10.30% WER on test-other

Please let us know further steps. Thanks
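The WER figures reported in this thread are the standard word error rate: edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal sketch, independent of icefall's own scoring code:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate via Levenshtein distance over word lists."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion against four reference words.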

csukuangfj commented 1 year ago

> We have also run this code on Librispeech. These are our results:

  1. How much data did you use to train the model? train-clean-100 or the full librispeech (960 hours)?
  2. How many epochs did you run? Are the posted numbers the best after searching different combinations of --epoch and --avg?
  3. Which decoding method are you using?
  4. Could you make a pull request first?
  5. Could you update RESULTS.md to include your results? You can find the information you need to fill in by following the other folders in RESULTS.md.

Thanks!
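Regarding the --avg question in point 2: averaging means combining the weights of the last N checkpoints element-wise before decoding, which usually lowers WER versus any single checkpoint. A toy sketch of the idea with plain-Python "state dicts" (icefall's real helper works over torch tensors; this stand-in just averages per-key float lists):

```python
def average_checkpoints(state_dicts):
    """Element-wise average of parameter values across checkpoints.
    Each toy 'state dict' maps a parameter name to a list of floats."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }


# Three checkpoints of a model with a single parameter "w".
ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
avg = average_checkpoints(ckpts)  # {"w": [3.0, 4.0]}
```

Searching over `--epoch` and `--avg` then just means decoding with several (epoch, N) combinations and reporting the best.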

uni-manjunath-ke commented 1 year ago

Thanks. Updated RESULTS.md and created a pull request at https://github.com/k2-fsa/icefall/pull/1106