sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k is not a streaming model; you can see that there is no "streaming" in the model filename.
Please use a streaming model from https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models
@csukuangfj thank you very much!
Are there streaming multilingual models that include the same languages as the one I mentioned, sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k?
If I want to try this multilingual NeMo model, I would need to run the SherpaOnnx2Pass example, correct?
What is the difference between offline and streaming? Is streaming for an active microphone, and offline for wav files only?
We have an Android APK for this model.
Please download it from
@csukuangfj thank you!
I was able to build the Android APK locally and run this model:
7 -> {
    val modelDir = "sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k"
    return OfflineModelConfig(
        nemo = OfflineNemoEncDecCtcModelConfig(
            model = "$modelDir/model.onnx",
        ),
        tokens = "$modelDir/tokens.txt",
    )
}
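For context, numbered branches like the one above live inside a when (type) dispatch in the Android example's model-config helper (the "Select model type 14 for ASR" log line comes from this numeric selection). A simplified, self-contained sketch of that dispatch; the data classes here are minimal stand-ins for the real sherpa-onnx Kotlin classes, and the helper name is hypothetical:

```kotlin
// Simplified stand-ins for the sherpa-onnx Kotlin config classes.
data class OfflineNemoEncDecCtcModelConfig(val model: String = "")

data class OfflineModelConfig(
    val nemo: OfflineNemoEncDecCtcModelConfig = OfflineNemoEncDecCtcModelConfig(),
    val tokens: String = "",
    val modelType: String = "",
)

// Hypothetical helper mirroring the example app's numeric model-type dispatch.
fun getOfflineModelConfig(type: Int): OfflineModelConfig? = when (type) {
    7 -> {
        val modelDir = "sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k"
        OfflineModelConfig(
            nemo = OfflineNemoEncDecCtcModelConfig(model = "$modelDir/model.onnx"),
            tokens = "$modelDir/tokens.txt",
        )
    }
    else -> null // unknown model type index
}

fun main() {
    // Selecting type 7 yields the CTC model's configuration.
    println(getOfflineModelConfig(7)?.tokens)
}
```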
I could not find this model in the code: sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k. What is the correct configuration for it?
I tried setting it up as a transducer:
14 -> {
    val modelDir = "sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k"
    return OfflineModelConfig(
        transducer = OfflineTransducerModelConfig(
            encoder = "$modelDir/encoder.onnx",
            decoder = "$modelDir/decoder.onnx",
            joiner = "$modelDir/joiner.onnx",
        ),
        tokens = "$modelDir/tokens.txt",
        modelType = "transducer",
    )
}
And it crashes with these messages:
2024-01-31 14:19:49.380 24163-24163 sherpa-onnx com.k2fsa.sherpa.onnx I Select model type 14 for ASR
2024-01-31 14:19:49.381 24163-24163 sherpa-onnx com.k2fsa.sherpa.onnx W config:
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k/encoder.onnx", decoder_filename="sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k/decoder.onnx", joiner_filename="sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k/joiner.onnx"), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="en", task="transcribe", tail_paddings=1000), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), telespeech_ctc="", tokens="sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k/tokens.txt", nu
2024-01-31 14:19:52.369 24163-24163 sherpa-onnx com.k2fsa.sherpa.onnx W vocab_size does not exist in the metadata
Please change
modelType = "transducer",
to
modelType = "nemo_transducer",
and everything should work as expected.
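Putting the fix together, the working branch is the same configuration quoted earlier in this thread, with only the modelType value changed:

```kotlin
14 -> {
    val modelDir = "sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k"
    return OfflineModelConfig(
        transducer = OfflineTransducerModelConfig(
            encoder = "$modelDir/encoder.onnx",
            decoder = "$modelDir/decoder.onnx",
            joiner = "$modelDir/joiner.onnx",
        ),
        tokens = "$modelDir/tokens.txt",
        modelType = "nemo_transducer", // was "transducer"; NeMo transducers need this value
    )
}
```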
@csukuangfj Thank you! Worked like a charm!
By the way, what is the difference between offline and streaming? Is streaming for an active microphone, and offline for wav files only?
Please refer to
streaming == online, and non-streaming == offline in this context.
Generally speaking, streaming ASR gives you the recognition result as you speak. Non-streaming ASR MUST wait until you have finished speaking before it can start the recognition.
Microphones and wave files are just ways to get audio samples; they are not tied to any algorithms or models. You can consider the two as just different input devices. You can have many other input devices.
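The difference between the two decoding loops can be sketched in code. This toy example is not the sherpa-onnx API; the "recognizers" here are stand-ins that just count samples, to show the shape of each loop:

```kotlin
// Toy stand-in: "recognizes" audio by reporting how many samples it has seen so far.
class ToyStreamingRecognizer {
    private var seen = 0

    // Streaming: accepts a chunk and can return a partial result immediately,
    // while more audio is still arriving.
    fun acceptWaveform(chunk: FloatArray): String {
        seen += chunk.size
        return "partial result after $seen samples"
    }
}

// Non-streaming/offline: decoding starts only once the whole utterance is available.
fun toyOfflineRecognize(allSamples: FloatArray): String =
    "final result after ${allSamples.size} samples"

fun main() {
    val audio = FloatArray(48_000) // 3 s of 16 kHz audio, from a mic or a wav file alike

    // Streaming/online loop: feed fixed-size chunks as they arrive.
    val streaming = ToyStreamingRecognizer()
    for (start in audio.indices step 1600) { // 100 ms chunks at 16 kHz
        val chunk = audio.copyOfRange(start, minOf(start + 1600, audio.size))
        println(streaming.acceptWaveform(chunk))
    }

    // Offline loop: one call with the complete utterance.
    println(toyOfflineRecognize(audio))
}
```

The input device is irrelevant in both cases; only the point at which decoding may begin differs.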
@csukuangfj online vs offline makes sense at runtime, but I want to clarify - when training a model, how are streaming vs non-streaming models different? Is it a different architecture?
Thank you!
@csukuangfj I think this is what I was looking for: https://arxiv.org/pdf/2010.14099
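On the training side, one common way (in that line of work) to make an encoder streamable is chunk-wise attention masking: the model is trained so each frame attends only to a bounded window of context, which is what lets it run incrementally at inference time. A minimal illustration of such a mask (my own sketch, not sherpa-onnx code):

```kotlin
// Sketch: a chunk-wise causal attention mask of the kind used to train
// streaming encoders. Frame i may attend to frame j only if j lies in the
// same chunk as i or in one of the `leftChunks` preceding chunks.
fun chunkMask(numFrames: Int, chunkSize: Int, leftChunks: Int): Array<BooleanArray> =
    Array(numFrames) { i ->
        val chunkI = i / chunkSize
        BooleanArray(numFrames) { j ->
            val chunkJ = j / chunkSize
            chunkJ in (chunkI - leftChunks)..chunkI // no attention to future chunks
        }
    }

fun main() {
    // 4 frames, chunks of 2, no extra left context:
    // frames 0-1 see only chunk 0; frames 2-3 see chunks 0 is excluded? No:
    // with leftChunks = 0, each frame sees only its own chunk.
    val mask = chunkMask(4, 2, 0)
    println(mask[0].toList()) // frame 0 attends to frames 0 and 1 only
    println(mask[3].toList()) // frame 3 attends to frames 2 and 3 only
}
```

A full-context (non-streaming) model is the degenerate case where every frame may attend everywhere, which is why it must wait for the whole utterance.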
I have successfully built and run one of the iOS ASR examples, getBilingualStreamZhEnZipformer20230220. I am now trying to run this model:
sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k
I added a method in Swift:
I am getting this error:
sherpa-onnx/csrc/online-transducer-nemo-model.cc:InitEncoder:313 window_size does not exist in the metadata
Could you please point me to an example of how to run this model?
Thank you!