k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Confusing file names #1569

Open dnhkng opened 3 days ago

dnhkng commented 3 days ago

I really appreciate the work you have done in generating onnx models!

I would like to try a few, based on the rankings from the Huggingface ASR leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

However, I am having trouble identifying which sherpa-onnx model corresponds to each entry on the leaderboard. I know this is mostly an NVIDIA issue, as they have released a lot of models with very similar names.

Could you please help with a table of the top models?

| HF Leaderboard Name | Sherpa-ONNX Name |
|---|---|
| nvidia/canary-1b | x |
| nyrahealth/CrisperWhisper | not available ? |
| nvidia/parakeet-tdt-1.1b | x |
| nvidia/parakeet-rnnt-1.1b | x |
| nvidia/parakeet-ctc-1.1b | x |
| nvidia/parakeet-tdt_ctc-110m | sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000.tar.bz2 |
| nvidia/parakeet-rnnt-0.6b | x |
| distil-whisper/distil-large-v3 | x (but we have distil-large-v2, you can add v3 if you like) |
| nvidia/parakeet-ctc-0.6b | x |
| openai/whisper-large-v2 | sherpa-onnx-whisper-distil-large-v2.tar.bz2 |
| openai/whisper-large-v3-turbo | sherpa-onnx-whisper-turbo.tar.bz2 |
| distil-whisper/distil-large-v2 | sherpa-onnx-whisper-distil-large-v2.tar.bz2 |
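
For anyone who wants to use this mapping programmatically, here is a minimal Python sketch. The entries are copied verbatim from the table above; the names `LEADERBOARD_TO_SHERPA` and `available_models` are made up for illustration, and `None` is my own convention for rows marked "x" or "not available".

```python
# Leaderboard name -> sherpa-onnx release archive, copied from the table above.
# None marks rows the table lists as "x" or "not available" (my own convention).
LEADERBOARD_TO_SHERPA = {
    "nvidia/canary-1b": None,
    "nyrahealth/CrisperWhisper": None,
    "nvidia/parakeet-tdt-1.1b": None,
    "nvidia/parakeet-rnnt-1.1b": None,
    "nvidia/parakeet-ctc-1.1b": None,
    "nvidia/parakeet-tdt_ctc-110m": "sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000.tar.bz2",
    "nvidia/parakeet-rnnt-0.6b": None,
    "distil-whisper/distil-large-v3": None,
    "nvidia/parakeet-ctc-0.6b": None,
    "openai/whisper-large-v2": "sherpa-onnx-whisper-distil-large-v2.tar.bz2",
    "openai/whisper-large-v3-turbo": "sherpa-onnx-whisper-turbo.tar.bz2",
    "distil-whisper/distil-large-v2": "sherpa-onnx-whisper-distil-large-v2.tar.bz2",
}


def available_models(mapping=LEADERBOARD_TO_SHERPA):
    """Return only the leaderboard entries that have a sherpa-onnx archive."""
    return {name: archive for name, archive in mapping.items() if archive is not None}


if __name__ == "__main__":
    for name, archive in available_models().items():
        print(f"{name} -> {archive}")
```

Keeping the unavailable rows as `None` rather than dropping them makes it easy to see which leaderboard entries still lack a converted model.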

The release page has quite a few Nemo models, but I want to use a good one!

sherpa-onnx-nemo-ctc-en-citrinet-512.tar.bz2
sherpa-onnx-nemo-ctc-en-conformer-small.tar.bz2
sherpa-onnx-nemo-ctc-en-conformer-medium.tar.bz2
sherpa-onnx-nemo-ctc-en-conformer-large.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-80ms.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-ctc-en-80ms.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-480ms.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-ctc-en-480ms.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-transducer-en-1040ms.tar.bz2
sherpa-onnx-nemo-streaming-fast-conformer-ctc-en-1040ms.tar.bz2
sherpa-onnx-nemo-parakeet_tdt_transducer_110m-en-36000.tar.bz2
sherpa-onnx-nemo-fast-conformer-transducer-es-1424.tar.bz2
sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000.tar.bz2
sherpa-onnx-nemo-fast-conformer-transducer-en-de-es-fr-14288.tar.bz2
sherpa-onnx-nemo-fast-conformer-ctc-es-1424.tar.bz2
sherpa-onnx-nemo-fast-conformer-transducer-en-24500.tar.bz2
sherpa-onnx-nemo-fast-conformer-ctc-en-de-es-fr-14288.tar.bz2
sherpa-onnx-nemo-fast-conformer-transducer-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
sherpa-onnx-nemo-fast-conformer-ctc-en-24500.tar.bz2
sherpa-onnx-nemo-fast-conformer-ctc-be-de-en-es-fr-hr-it-pl-ru-uk-20k.tar.bz2
sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24.tar.bz2
sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24.tar.bz2
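
One rough way to sort through the archive names listed above is to read the decoder type and streaming mode off the file name. The sketch below is only a naming heuristic based on the substrings visible in those names ("transducer", "ctc", "-streaming-"); it is not something the project documents, and `describe_archive` is a hypothetical helper.

```python
def describe_archive(filename):
    """Guess decoder type and streaming mode from a sherpa-onnx NeMo archive name.

    This only looks at substrings of the file name. It is a heuristic inferred
    from the names in this thread, not a documented naming scheme.
    """
    if "transducer" in filename:
        decoder = "transducer"
    elif "ctc" in filename:
        decoder = "ctc"
    else:
        decoder = "unknown"
    return {"decoder": decoder, "streaming": "-streaming-" in filename}


if __name__ == "__main__":
    for name in (
        "sherpa-onnx-nemo-streaming-fast-conformer-ctc-en-80ms.tar.bz2",
        "sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000.tar.bz2",
        "sherpa-onnx-nemo-fast-conformer-transducer-en-24500.tar.bz2",
    ):
        print(name, describe_archive(name))
```

This says nothing about accuracy; it only narrows the list to the kind of model (streaming or not, CTC or transducer) a given application needs.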

csukuangfj commented 3 days ago

Please see the updated table.

By the way, we don't convert very large models to sherpa-onnx, although converting them is feasible.

csukuangfj commented 3 days ago

Please search in the scripts folder for model conversion/export code.
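
As a minimal sketch of that search, assuming a local clone of the repository and assuming the relevant Python files have "export" in their names (an assumption, not something confirmed in this thread), the hypothetical helper below lists candidate files under a given directory.

```python
from pathlib import Path


def find_export_scripts(scripts_dir):
    """List Python files under scripts_dir whose file name mentions 'export'.

    Assumes conversion scripts carry 'export' in their names; adjust the
    filter if the files you are looking for are named differently.
    """
    root = Path(scripts_dir)
    return sorted(p for p in root.rglob("*.py") if "export" in p.name.lower())


if __name__ == "__main__":
    # "scripts" is the folder mentioned above, relative to a local checkout.
    for path in find_export_scripts("scripts"):
        print(path)
```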

dnhkng commented 3 days ago

Thanks for the clarification! I take it, then, that for English, sherpa-onnx-nemo-parakeet_tdt_ctc_110m-en-36000.tar.bz2 is the best model?

Do you have any recommendations?