k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

[Feature proposal] Support CTC decoding with graph(s) for streaming models #418

Open csukuangfj opened 1 year ago

csukuangfj commented 1 year ago

We plan to add CTC decoding support for streaming models with graph(s) in C++.

As for the models, they do not have to come from icefall. As long as a TorchScript model is available (or an ONNX model for sherpa-onnx), we should support it.

As for the graph, it can be an H or an HLG (i.e., TLG). We also need to support using a context graph during the search.

To start, I suggest we use a streaming zipformer trained by @yaozengwei with transducer + CTC loss and first implement CTC decoding with graphs in Python, since it is easier to debug there. After that, we can port the implementation to sherpa and sherpa-onnx.
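
To make the idea concrete, here is a rough sketch of what a Python prototype for streaming CTC decoding with an H graph might look like. It is only an illustration under several assumptions: it is pure Python and does not use k2 or any sherpa API, the names `build_ctc_topo` and `StreamingGraphDecoder` are hypothetical, the graph is a plain dict of arcs, blank is assumed to have ID 0, and the search is a simple token-passing Viterbi with beam pruning rather than a lattice-based search. An HLG or a context graph would need a richer graph representation than this.

```python
import math

BLANK = 0  # assumption: the blank token has ID 0


def build_ctc_topo(num_tokens):
    """Build a standard CTC topology (H) as {state: [(next, ilabel, olabel, score)]}.

    State 0 is the start/blank state; state i means "currently inside token i".
    olabel is None when the arc emits nothing.
    """
    arcs = {s: [] for s in range(num_tokens)}
    arcs[0].append((0, BLANK, None, 0.0))          # stay in blank
    for i in range(1, num_tokens):
        arcs[0].append((i, i, i, 0.0))             # enter token i, emit i
        arcs[i].append((i, i, None, 0.0))          # repeat token i, emit nothing
        arcs[i].append((0, BLANK, None, 0.0))      # token i -> blank
        for j in range(1, num_tokens):
            if j != i:
                arcs[i].append((j, j, j, 0.0))     # token i -> token j, emit j
    return arcs


class StreamingGraphDecoder:
    """Token-passing Viterbi search that keeps its state between chunks."""

    def __init__(self, arcs, start_state=0, beam=10.0):
        self.arcs = arcs
        self.beam = beam
        # state -> (best log score so far, emitted output labels)
        self.tokens = {start_state: (0.0, ())}

    def accept_chunk(self, log_probs):
        """Consume one chunk: a list of frames, each a list of log-probs."""
        for frame in log_probs:
            new_tokens = {}
            for state, (score, out) in self.tokens.items():
                for nxt, ilabel, olabel, weight in self.arcs[state]:
                    s = score + weight + frame[ilabel]
                    if nxt not in new_tokens or s > new_tokens[nxt][0]:
                        new_out = out if olabel is None else out + (olabel,)
                        new_tokens[nxt] = (s, new_out)
            # Beam pruning: drop tokens that fall too far behind the best one.
            best = max(s for s, _ in new_tokens.values())
            self.tokens = {
                st: tok for st, tok in new_tokens.items()
                if tok[0] >= best - self.beam
            }

    def result(self):
        """Return the output labels of the current best path."""
        return list(max(self.tokens.values(), key=lambda tok: tok[0])[1])


def make_frame(best_id, num_tokens=3):
    """Toy frame of log-probs with most of the mass on best_id (3 tokens)."""
    return [math.log(0.8 if i == best_id else 0.1) for i in range(num_tokens)]


if __name__ == "__main__":
    frames = [make_frame(i) for i in [1, 1, 0, 1, 2, 2]]
    decoder = StreamingGraphDecoder(build_ctc_topo(3))
    for start in range(0, len(frames), 2):      # feed two frames at a time
        decoder.accept_chunk(frames[start:start + 2])
        print(decoder.result())                 # partial result after each chunk
```

Because the decoder only stores the active tokens between calls to `accept_chunk`, feeding the frames chunk by chunk gives the same best path as feeding them all at once, which is the property a streaming implementation in sherpa would also need.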