k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Whisper Language ID: Output all languages and their probabilities #1475

Open marchellodev opened 2 hours ago

marchellodev commented 2 hours ago

Currently, the Whisper-based language ID outputs only the single language with the highest probability:

fn main() {
    let file_path = std::env::args().nth(1).expect("Missing file path argument");

    // Whisper language ID expects 16 kHz audio.
    let (samples, sample_rate) = sherpa_rs::read_audio_file(&file_path).unwrap();
    assert_eq!(sample_rate, 16000, "The sample rate must be 16000.");

    // Point the config at the Whisper tiny encoder/decoder ONNX models.
    let config = sherpa_rs::language_id::SpokenLanguageIdConfig {
        encoder: "sherpa-onnx-whisper-tiny/tiny-encoder.onnx".into(),
        decoder: "sherpa-onnx-whisper-tiny/tiny-decoder.onnx".into(),
        ..Default::default()
    };
    let mut extractor = sherpa_rs::language_id::SpokenLanguageId::new(config);

    // Returns only the single most likely language, e.g. "en".
    let language = extractor.compute(samples, sample_rate).unwrap();
    println!("Spoken language: {}", language);
}

(from @sherpa-rs)

It would be great if sherpa-onnx could output the full result: all languages with their associated probabilities.

Faster-whisper, for example, can do this: https://github.com/SYSTRAN/faster-whisper/blob/c2a1da1bd94e002c38487c91c2f6b50a048000cf/faster_whisper/transcribe.py#L1764
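Conceptually, the change amounts to returning the full softmax distribution over Whisper's language tokens instead of only the argmax. Here is a minimal Rust sketch of that post-processing step; the logit values and the language list are purely hypothetical, and sherpa-onnx does not currently expose such an API:

```rust
// Turn raw language-token logits into a probability distribution.
// Subtracting the max before exponentiating keeps the computation
// numerically stable.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Hypothetical logits for a few of Whisper's language tokens.
    let langs = ["en", "de", "fr", "es"];
    let logits = [2.1_f32, 0.3, -0.5, 1.0];

    // Pair each language with its probability and sort descending,
    // so the caller gets the full ranked distribution, not just the top-1.
    let probs = softmax(&logits);
    let mut ranked: Vec<(&str, f32)> = langs.iter().copied().zip(probs).collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    for (lang, p) in &ranked {
        println!("{lang}: {p:.3}");
    }
}
```

A real implementation would pull the logits from the Whisper decoder's first step, restricted to the language-token ids, which is essentially what the linked faster-whisper code does.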

csukuangfj commented 2 hours ago

Would you like to contribute?