Closed solaoi closed 1 month ago
Please have a look at https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/zipformer-transducer-models.html#sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01-japanese
I think sherpa-rs should have already supported it.
@csukuangfj Hi, thank you for your quick response.
I tried to modify the examples/transcribe.rs example to use the reazonspeech-k2-v2 model, but I encountered an error during execution.

Here is the modified transcribe.rs:

```rust
use eyre::{bail, Result};
use sherpa_rs::{read_audio_file, transcribe::whisper::WhisperRecognizer};
use std::time::Instant;

fn main() -> Result<()> {
    let path = std::env::args().nth(1).expect("Missing file path argument");
    let provider = std::env::args().nth(2).unwrap_or("cpu".into());

    let (sample_rate, samples) = read_audio_file(&path)?;

    // Check if the sample rate is 16000
    if sample_rate != 16000 {
        bail!("The sample rate must be 16000.");
    }

    let mut recognizer = WhisperRecognizer::new(
        "reazonspeech-k2-v2/decoder-epoch-99-avg-1.onnx".into(),
        "reazonspeech-k2-v2/encoder-epoch-99-avg-1.onnx".into(),
        "reazonspeech-k2-v2/tokens.txt".into(),
        "ja".into(),
        Some(true),
        Some(provider),
        None,
        None,
    );

    let start_t = Instant::now();
    let result = recognizer.transcribe(sample_rate, samples);
    println!("{:?}", result);
    println!("Time taken for transcription: {:?}", start_t.elapsed());
    Ok(())
}
```
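As an aside, the 16 kHz check above uses the rate that `read_audio_file` returns, but the same value is stored in the WAV file's own header. Below is a minimal sketch of a hypothetical helper (not part of sherpa-rs) that reads the sample rate straight out of a canonical PCM RIFF header:

```rust
// Hypothetical helper: extract the sample rate from a WAV header without
// decoding any audio. Assumes the canonical layout where the "fmt " chunk
// immediately follows the RIFF header, so the sample rate is a little-endian
// u32 at byte offset 24; real parsers should walk the chunk list instead.
fn wav_sample_rate(header: &[u8]) -> Option<u32> {
    if header.len() < 28 || &header[0..4] != b"RIFF" || &header[8..12] != b"WAVE" {
        return None;
    }
    Some(u32::from_le_bytes([
        header[24], header[25], header[26], header[27],
    ]))
}

fn main() {
    // Build a minimal 44-byte PCM WAV header describing a 16 kHz file.
    let mut header = vec![0u8; 44];
    header[0..4].copy_from_slice(b"RIFF");
    header[8..12].copy_from_slice(b"WAVE");
    header[12..16].copy_from_slice(b"fmt ");
    header[24..28].copy_from_slice(&16000u32.to_le_bytes());

    assert_eq!(wav_sample_rate(&header), Some(16000));
    println!("sample rate: {:?}", wav_sample_rate(&header));
}
```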
```shell
cargo run --example transcribe speech-001.wav
```
speech-001.wav is here.
```
/Users/solaoi/Projects/solaoi/sherpa-rs/target/debug/build/sherpa-rs-sys-e83e885fd8f7116f/out/sherpa-onnx/sherpa-onnx/c-api/c-api.cc:convertConfig:434 OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=512, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="reazonspeech-k2-v2/encoder-epoch-99-avg-1.onnx", decoder="reazonspeech-k2-v2/decoder-epoch-99-avg-1.onnx", language="ja", task="transcribe", tail_paddings=-1), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="", use_itn=False), telespeech_ctc="", tokens="reazonspeech-k2-v2/tokens.txt", num_threads=2, debug=True, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
/Users/solaoi/Projects/solaoi/sherpa-rs/target/debug/build/sherpa-rs-sys-e83e885fd8f7116f/out/sherpa-onnx/sherpa-onnx/csrc/offline-whisper-model.cc:InitEncoder:243 ---encoder---
model_author=k2-fsa
model_type=zipformer2
version=1
comment=non-streaming zipformer2
/Users/solaoi/Projects/solaoi/sherpa-rs/target/debug/build/sherpa-rs-sys-e83e885fd8f7116f/out/sherpa-onnx/sherpa-onnx/csrc/offline-whisper-model.cc:InitEncoder:247 n_mels does not exist in the metadata
```
Hey, currently sherpa-rs only supports the Whisper model, which is multilingual. reazonspeech-k2-v2 is a zipformer transducer, so the Whisper loader fails when it cannot find Whisper metadata such as n_mels in the encoder.
@csukuangfj Does sherpa-onnx already support that Japanese model? If there's an example, I can add it to sherpa-rs as well.
Yes, it is just an offline transducer model, and there is a C API for it. Here is the C API example:

https://github.com/k2-fsa/sherpa-onnx/blob/master/c-api-examples%2Fzipformer-c-api.c
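For context, the transducer path through the C API looks roughly as follows. This is only a sketch based on the linked example: the function and field names follow the sherpa-onnx C API header, the joiner filename is a guess patterned on the encoder/decoder names, and it cannot run without the sherpa-onnx library and the downloaded model files.

```c
#include <stdio.h>
#include <string.h>

#include "sherpa-onnx/c-api/c-api.h"

int main(void) {
  // Zero-initialize so the unused model types (whisper, paraformer, ...)
  // stay empty; only the transducer fields are filled in.
  SherpaOnnxOfflineRecognizerConfig config;
  memset(&config, 0, sizeof(config));

  config.model_config.transducer.encoder =
      "reazonspeech-k2-v2/encoder-epoch-99-avg-1.onnx";
  config.model_config.transducer.decoder =
      "reazonspeech-k2-v2/decoder-epoch-99-avg-1.onnx";
  config.model_config.transducer.joiner =  // filename is a guess
      "reazonspeech-k2-v2/joiner-epoch-99-avg-1.onnx";
  config.model_config.tokens = "reazonspeech-k2-v2/tokens.txt";
  config.model_config.num_threads = 2;
  config.decoding_method = "greedy_search";

  const SherpaOnnxOfflineRecognizer *recognizer =
      SherpaOnnxCreateOfflineRecognizer(&config);
  if (!recognizer) {
    fprintf(stderr, "Failed to create recognizer\n");
    return 1;
  }

  // Read the waveform, feed it to an offline stream, and decode.
  const SherpaOnnxWave *wave = SherpaOnnxReadWave("speech-001.wav");
  const SherpaOnnxOfflineStream *stream =
      SherpaOnnxCreateOfflineStream(recognizer);
  SherpaOnnxAcceptWaveformOffline(stream, wave->sample_rate, wave->samples,
                                  wave->num_samples);
  SherpaOnnxDecodeOfflineStream(recognizer, stream);

  const SherpaOnnxOfflineRecognizerResult *result =
      SherpaOnnxGetOfflineStreamResult(stream);
  printf("%s\n", result->text);

  SherpaOnnxDestroyOfflineRecognizerResult(result);
  SherpaOnnxDestroyOfflineStream(stream);
  SherpaOnnxFreeWave(wave);
  SherpaOnnxDestroyOfflineRecognizer(recognizer);
  return 0;
}
```

Note that the transducer needs all three files (encoder, decoder, joiner), whereas the Whisper config in the error log above only has encoder and decoder slots.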
@solaoi
Added in the latest version. See examples/zipformer.rs.
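For readers skimming the thread, the new example presumably looks something like the sketch below. The module, struct, and method names here are assumptions patterned on the example's name and on the rest of the sherpa-rs API, so treat examples/zipformer.rs in the repository as the authoritative version.

```rust
use sherpa_rs::read_audio_file;
// Assumed names; check examples/zipformer.rs for the real ones.
use sherpa_rs::zipformer::{ZipFormer, ZipFormerConfig};

fn main() {
    let (sample_rate, samples) = read_audio_file("speech-001.wav").unwrap();

    // Unlike the Whisper recognizer, the transducer takes three model
    // files: encoder, decoder, and joiner.
    let config = ZipFormerConfig {
        encoder: "reazonspeech-k2-v2/encoder-epoch-99-avg-1.onnx".into(),
        decoder: "reazonspeech-k2-v2/decoder-epoch-99-avg-1.onnx".into(),
        joiner: "reazonspeech-k2-v2/joiner-epoch-99-avg-1.onnx".into(),
        tokens: "reazonspeech-k2-v2/tokens.txt".into(),
        ..Default::default()
    };

    let mut recognizer = ZipFormer::new(config).unwrap();
    let text = recognizer.decode(sample_rate, samples);
    println!("{}", text);
}
```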
@thewh1teagle Thank you for the update! It's working perfectly now.
Request for Adding ReazonSpeech's reazonspeech-k2-v2 Model
Hi, first of all, thank you for your excellent work on sherpa-rs! I would like to request the addition of support for the reazonspeech-k2-v2 model from ReazonSpeech in this project.
Model Information:
This model is built for automatic speech recognition (ASR) and has been fine-tuned for Japanese speech.

Reasons for Addition:
Integrating this model into sherpa-rs would be incredibly helpful for expanding its ASR capabilities, especially for handling Japanese-language tasks more effectively.

Model Integration:
If possible, I would greatly appreciate any guidance on how I could assist with this integration, or whether it could be considered for a future release.
Thank you again for your time and efforts on this project!