How to export the sherpa model to TorchScript?

k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi

https://k2-fsa.github.io/sherpa

Apache License 2.0

How to export the sherpa model to TorchScript? #131

Open lucasjinreal opened 1 year ago

lucasjinreal commented 1 year ago

Also, is there any documentation on training with a Chinese dataset?

csukuangfj commented 1 year ago

Please have a look at https://k2-fsa.github.io/icefall/recipes/index.html

For exporting models, please see https://k2-fsa.github.io/icefall/recipes/librispeech/lstm_pruned_stateless_transducer.html#export-models

Every recipe in icefall has an export.py file that can be used to export models to TorchScript.
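
For readers unfamiliar with TorchScript export in general, here is a minimal sketch of the underlying PyTorch mechanism. It is not the icefall export.py script: the TinyEncoder class, its dimensions, and the output file name are hypothetical stand-ins, and a real recipe would load a trained checkpoint before exporting.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a trained model; not an icefall class."""

    def __init__(self, feat_dim: int = 80, hidden_dim: int = 32) -> None:
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, feat_dim)
        out, _ = self.lstm(x)
        return self.proj(out)


model = TinyEncoder()
model.eval()  # switch to inference mode before exporting

# Compile the module to TorchScript and save it to a single file.
scripted = torch.jit.script(model)
path = os.path.join(tempfile.mkdtemp(), "tiny_encoder_jit.pt")  # assumed name
scripted.save(path)

# The saved file can be loaded without the Python class definition.
loaded = torch.jit.load(path)
features = torch.randn(1, 50, 80)
with torch.no_grad():
    expected = model(features)
    actual = loaded(features)
```

The loaded module should produce the same output as the original one, which is a quick sanity check worth running after any export.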

lucasjinreal commented 1 year ago

@csukuangfj Does sherpa support export to PNNX?

csukuangfj commented 1 year ago

> @csukuangfj Does sherpa support export to PNNX?

Currently, only the LSTM transducer model supports exporting to PNNX.

Please see https://k2-fsa.github.io/icefall/recipes/librispeech/lstm_pruned_stateless_transducer.html#export-model-for-ncnn for usage.

lucasjinreal commented 1 year ago

@csukuangfj Hi, how much does performance drop with the LSTM transducer compared with the transformer architecture?

lucasjinreal commented 1 year ago

Do any of these support export to PNNX? I just tested the default model and it performs very well.

csukuangfj commented 1 year ago

> @csukuangfj Hi, how much does performance drop with the LSTM transducer compared with the transformer architecture?

The LSTM transducer model is for streaming ASR. It performs the best so far for streaming ASR on the test-clean dataset of LibriSpeech among models trained with icefall.

> Do any of these support export to PNNX? I just tested the default model and it performs very well.

No, none of them supports PNNX. They are conformer-based models and are for non-streaming ASR.

lucasjinreal commented 1 year ago

@csukuangfj Hi, may I ask why the LSTM only does well on the streaming task and not on other tasks? Why can't the Conformer beat the LSTM on this task?

csukuangfj commented 1 year ago

The original Conformer needs to see the whole utterance to compute attention, which is not suitable for streaming ASR.

We have tried chunk-based attention for Conformer in icefall and the experiment result is not as good as LSTM-based models.
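
To make the difference concrete, here is a rough sketch of a chunk-based attention mask. It is a generic illustration, not the icefall implementation: the function name and the num_left_chunks parameter are assumptions chosen for this example. With full attention every frame can attend to every other frame, so the whole utterance must be available; with a chunk mask each frame only sees its own chunk and earlier ones.

```python
import torch


def chunk_attention_mask(
    num_frames: int, chunk_size: int, num_left_chunks: int = -1
) -> torch.Tensor:
    """Return a bool mask of shape (num_frames, num_frames).

    mask[i, j] is True if query frame i may attend to key frame j.
    A negative num_left_chunks means unlimited left context.
    """
    chunk_id = torch.arange(num_frames) // chunk_size
    query_chunk = chunk_id.unsqueeze(1)  # (num_frames, 1)
    key_chunk = chunk_id.unsqueeze(0)    # (1, num_frames)

    # No look-ahead beyond the end of the query's own chunk.
    mask = key_chunk <= query_chunk
    if num_left_chunks >= 0:
        # Optionally bound how far back in time attention may reach.
        mask = mask & (key_chunk >= query_chunk - num_left_chunks)
    return mask


full_left = chunk_attention_mask(num_frames=6, chunk_size=2)
limited_left = chunk_attention_mask(num_frames=6, chunk_size=2, num_left_chunks=1)
```

Such a mask would typically be applied to the attention scores before the softmax, so that masked positions receive zero weight. Smaller chunks lower latency but give each frame less context to work with.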