k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
3.46k stars 409 forks

Transcribe wav files with timestamps #668

Open nanaghartey opened 7 months ago

nanaghartey commented 7 months ago

First, great repo! I checked out the Android ASR demo, and it seems it can only transcribe from the microphone. Can it transcribe WAV files too (for example, a recorded interview or meeting)? Most ASR frameworks support this (e.g. the OpenAI Whisper API, Vosk). It would be very useful if sherpa-onnx for mobile could accept WAV files and output a transcript of the audio with timestamps at the segment level, the word level, or both. Timestamps would make it possible to align the transcript precisely with the audio.

csukuangfj commented 7 months ago

We support that using the C++, Python, and other APIs, but not on Android and iOS.

Is there any reason to use Android to do that task? Where does the wave file come from?

nanaghartey commented 7 months ago

Can you implement it on Android/iOS, on-device? There are many use cases. WAV files come from various sources (third-party apps, built-in apps, etc.). In my case, I'm doing video dubbing with sherpa-onnx, but because it does not accept WAV files I get stuck at the transcription step.

It works with vosk-android, but the accuracy of the Vosk models is poor.

I upload an MP4 and convert it to WAV, then transcribe the WAV file to retrieve all segments with timestamps, translate them, and superimpose the result back onto the video.
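
For the "convert to WAV, then transcribe" step, recognizers generally want raw samples rather than a file path. Below is a minimal sketch of decoding a WAV byte array into float samples in the range [-1, 1] using only `java.nio`, so it does not depend on `javax.sound`. It assumes 16-bit little-endian PCM and does not inspect the `fmt ` chunk; the class and method names (`WavSamples`, `decodePcm16`) are made up for illustration and are not part of sherpa-onnx or Vosk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavSamples {
    /**
     * Walks the RIFF chunks of a WAV file held in memory and decodes the
     * "data" chunk as 16-bit little-endian PCM into floats in [-1, 1].
     * Assumption: the audio really is 16-bit PCM; the "fmt " chunk is skipped, not validated.
     */
    static float[] decodePcm16(byte[] wav) {
        int pos = 12; // skip "RIFF", the 4-byte file size, and "WAVE"
        while (pos + 8 <= wav.length) {
            String id = new String(wav, pos, 4, StandardCharsets.US_ASCII);
            int size = ByteBuffer.wrap(wav, pos + 4, 4)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt();
            if (id.equals("data")) {
                // Clamp to the bytes actually present in case the header overstates the size.
                int len = Math.max(0, Math.min(size, wav.length - pos - 8));
                ByteBuffer pcm = ByteBuffer.wrap(wav, pos + 8, len)
                        .order(ByteOrder.LITTLE_ENDIAN);
                float[] samples = new float[len / 2];
                for (int i = 0; i < samples.length; i++) {
                    samples[i] = pcm.getShort() / 32768.0f;
                }
                return samples;
            }
            if (size < 0 || size > wav.length) {
                break; // malformed chunk header
            }
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("no data chunk found");
    }
}
```

With the samples in hand, the time of sample `i` is `i / sampleRate` seconds, which is the basis for any timestamps a recognizer reports. Stereo files would need to be downmixed or de-interleaved first.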

Other apps use WAV file transcription for audio analysis, among other things.

Adding such a feature would be very beneficial, especially for small developers who can't afford server costs.

csukuangfj commented 7 months ago

> It works with vosk-android

Could you post the URL for vosk-android that can accept wave files?

nanaghartey commented 7 months ago

> > It works with vosk-android
>
> Could you post the URL for vosk-android that can accept wave files?

Here you go https://github.com/alphacep/vosk-android-demo

Call rec.setWords(true) on the recognizer to get the start and end times of each word:

 Recognizer rec = new Recognizer(model, 16000.f);
 rec.setWords(true);
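
Once word-level start and end times are available, they can be grouped into subtitle-style segments. The sketch below assumes the word timings have already been parsed out of the recognizer's result into plain objects. `Word`, `Segment`, `group`, and `srtTime` are hypothetical names for illustration, not part of the Vosk API, and the pause threshold is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class Segmenter {
    /** One recognized word with its start and end time in seconds. */
    static final class Word {
        final String text; final double start; final double end;
        Word(String text, double start, double end) {
            this.text = text; this.start = start; this.end = end;
        }
    }

    /** A run of words separated by pauses no longer than the chosen gap. */
    static final class Segment {
        final double start; double end; String text;
        Segment(Word w) { start = w.start; end = w.end; text = w.text; }
    }

    /** Starts a new segment whenever the pause before a word exceeds maxGapSeconds. */
    static List<Segment> group(List<Word> words, double maxGapSeconds) {
        List<Segment> out = new ArrayList<>();
        Segment cur = null;
        for (Word w : words) {
            if (cur != null && w.start - cur.end <= maxGapSeconds) {
                cur.text += " " + w.text;
                cur.end = w.end;
            } else {
                cur = new Segment(w);
                out.add(cur);
            }
        }
        return out;
    }

    /** Formats seconds as an SRT timestamp, HH:MM:SS,mmm. */
    static String srtTime(double seconds) {
        long ms = Math.round(seconds * 1000);
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d",
                ms / 3_600_000, ms / 60_000 % 60, ms / 1000 % 60, ms % 1000);
    }
}
```

Each segment's `start`, `end`, and `text` can then be translated and written out as subtitle entries for the dubbing step.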

nanaghartey commented 7 months ago

@csukuangfj any updates on this?