k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
534 stars 107 forks

streaming_pruned_transducer_statelessX removed #409


kolyaflash commented 1 year ago

I'm currently using sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py, together with all the underlying C++ code for modified_beam_search (RnntConformerModel, StreamingModifiedBeamSearch), with some custom changes. In https://github.com/k2-fsa/sherpa/pull/404 a lot of code was removed, including that code.

If I understand it correctly, the latest version on master still has all the implementations, including modified beam search, but all Python bindings except fast beam search have been removed.

Did you remove it because you're focusing on C++ inference and keeping Python mostly as a fast and easy way to try things out? Or are you planning to focus on triton or sherpa-onnx inference in the future, so that we should migrate to it? What strategy would you recommend to me and to users in a similar situation?

csukuangfj commented 1 year ago

If I understand it correctly, the latest version on master still has all the implementations

Yes, you are right.

Did you remove it because you're focusing on C++ inference and keeping Python mostly as a fast and easy way to try things out?

They were removed because we have redesigned the API to make it more modular and easier to extend.

I'm currently using sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py

You can use https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/streaming_server.py to replace sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py.

sherpa/bin/streaming_server.py supports all streaming models from icefall.
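For anyone migrating from the old script, a minimal first step might look like the sketch below. Only `--help` is shown, because the new script's option names are not listed in this thread and are not guaranteed to match the old `streaming_pruned_transducer_statelessX` flags, so inspecting the CLI first is the safest starting point. The repo path is an assumption (a local checkout of k2-fsa/sherpa).

```shell
# Sketch only: assumes a local checkout of https://github.com/k2-fsa/sherpa
# with its Python dependencies installed. List the supported options of the
# replacement server before wiring in your model; flag names may differ
# from the removed streaming_pruned_transducer_statelessX/streaming_server.py.
cd sherpa
python3 ./sherpa/bin/streaming_server.py --help
```

From the `--help` output you can then map your old model, tokens, and port settings onto whatever the new script actually accepts.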

If you want to use a non-streaming model, then please use https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/offline_transducer_server.py or https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/offline_ctc_server.py.

Or are you planning to focus on triton or sherpa-onnx inference in the future, so that we should migrate to it? What strategy would you recommend to me and to users in a similar situation?

We will continue to support everything we have at present, e.g., sherpa-onnx, sherpa, triton, and sherpa-ncnn. You can choose whichever is most suitable for you.