KhoaNgo18 opened 5 months ago
You can either set secondType to 2 or 3. Remember to place the corresponding model files in assets.
You can find pre-built ASR APKs with Whisper at https://github.com/k2-fsa/sherpa-onnx/releases/tag/v1.9.14
Similarly, for iOS, please see
You need to place the corresponding model files in your project.
Can I use only Whisper? When I test with 2Pass, it can only detect English. As I understand the 2Pass code, I have to have two models, sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 and sherpa-onnx-whisper-base.en; I cannot use only Whisper.
Whisper is a non-streaming ASR model; you cannot use it for real-time streaming ASR.
We don't provide an APK or an example for using Whisper alone for non-streaming ASR on Android/iOS, but we do provide APIs.
So the answer is yes; you can use Whisper alone in Android/iOS.
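For reference, here is a minimal sketch of what "using the APIs" could look like on iOS: wiring an offline Whisper model config into an offline recognizer and decoding one utterance. It assumes the helper functions and classes from swift-api-examples/SherpaOnnx.swift (sherpaOnnxOfflineModelConfig, sherpaOnnxFeatureConfig, SherpaOnnxOfflineRecognizer); the model filenames are placeholders, so verify everything against your copy of that file before use.

```swift
// A sketch, not a tested implementation: build a Whisper model config,
// wrap it in an offline recognizer config, and decode 16 kHz mono samples.
// Helper names follow sherpa-onnx's swift-api-examples/SherpaOnnx.swift.
func createWhisperRecognizer() -> SherpaOnnxOfflineRecognizer {
  let modelConfig = sherpaOnnxOfflineModelConfig(
    tokens: getResource("tiny.en-tokens", "txt"),  // placeholder filenames
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: getResource("tiny.en-encoder.int8", "onnx"),
      decoder: getResource("tiny.en-decoder.int8", "onnx")
    ),
    numThreads: 1,
    modelType: "whisper"
  )
  var config = sherpaOnnxOfflineRecognizerConfig(
    featConfig: sherpaOnnxFeatureConfig(sampleRate: 16000, featureDim: 80),
    modelConfig: modelConfig
  )
  return SherpaOnnxOfflineRecognizer(config: &config)
}

// Usage: feed a whole utterance at once -- Whisper is non-streaming,
// so it decodes a complete segment, not a live stream.
// let text = createWhisperRecognizer().decode(samples: samples).text
```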
Can you guide me on how to use the APIs, or at least point me to where the APIs are? I'm new to mobile and AI, so I appreciate your help a lot.
You can find all the required APIs in our two-pass example, which I have already posted in the first comment.
If you are new to Android and iOS and are also new to Kotlin and Swift, then it may be difficult for you.
I was able to get the multilingual whisper-small model to work with this 2Pass code for Romanian:
func getNonStreamingWhisperSmall() -> SherpaOnnxOfflineModelConfig {
  let encoder = getResource("small-encoder.int8", "onnx")
  let decoder = getResource("small-decoder.int8", "onnx")
  let tokens = getResource("small-tokens", "txt")

  return sherpaOnnxOfflineModelConfig(
    tokens: tokens,
    whisper: sherpaOnnxOfflineWhisperModelConfig(
      encoder: encoder,
      decoder: decoder,
      language: "ro"
    ),
    numThreads: 1,
    modelType: "whisper"
  )
}
Then, in SherpaOnnxViewModel.initOfflineRecognizer(), change

  let modelConfig = getNonStreamingWhisperTinyEn()

to

  let modelConfig = getNonStreamingWhisperSmall()
I think it would make more sense for my use case to use VAD instead (as in SherpaOnnxVadAsr). I will try that next.
@csukuangfj how hard is it to make the code changes for the whisper model to take the language param at runtime, not while the model is loaded? Could you please point me to the general code area?
Thank you!
@csukuangfj I noticed that if I just set the language to en, whisper will switch from transcribe to translate mode.
@csukuangfj I think this is what I am looking for if I want to pass the language to decoder at runtime:
> how hard is it to make the code changes for the whisper model to take the language param at runtime
I'm sorry; unfortunately, we don't provide an API for users to do that.
@csukuangfj my bad, I misread the code. The Whisper multilingual model config with an empty language is working perfectly. I tested it with VAD in the iOS SherpaOnnx2Pass example. After initializing the VAD, I just do something similar to the Android version:
let array = convertedBuffer.array()
if !array.isEmpty {
  // Feed the audio into the VAD; it buffers until a speech segment is detected.
  self.vad.acceptWaveform(samples: [Float](array))

  // Drain every completed speech segment and decode it with the offline model.
  while !self.vad.isEmpty() {
    let s = self.vad.front()
    self.vad.pop()
    let lastSentence = self.offlineRecognizer.decode(samples: s.samples).text
    self.sentences.append(lastSentence)
    self.updateLabel()
  }
}
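For completeness, here is a hedged sketch of the VAD initialization that the loop above assumes. The class and helper names are taken from sherpa-onnx's swift-api-examples/SherpaOnnx.swift and the threshold values are arbitrary starting points, not recommendations; check both against your local copy.

```swift
// Sketch only: create the silero VAD consumed by the segment loop above.
// silero_vad.onnx must be added to the app bundle; names are assumptions
// based on swift-api-examples/SherpaOnnx.swift.
func createVad() -> SherpaOnnxVoiceActivityDetectorWrapper {
  let sileroVad = sherpaOnnxSileroVadModelConfig(
    model: getResource("silero_vad", "onnx"),
    threshold: 0.5,            // speech-probability threshold (assumed default)
    minSilenceDuration: 0.25,  // seconds of silence that close a segment
    minSpeechDuration: 0.5,    // shortest segment to keep, in seconds
    windowSize: 512            // samples per VAD frame at 16 kHz
  )
  var config = sherpaOnnxVadModelConfig(sileroVad: sileroVad, sampleRate: 16000)
  return SherpaOnnxVoiceActivityDetectorWrapper(
    config: &config, buffer_size_in_seconds: 30)
}
```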
I wanted to use the Whisper model for STT, but as I look into the code written for Android and iOS, I can't find the function needed to init the Whisper model. I can already see that Whisper is supported as an offline model. By the way, I don't understand the concept of 2Pass; it would be great to get to know it better.