Open t41372 opened 4 months ago
Which model will you use on the backend to transcribe it? Does transcribing in chunks like that increase the word error rate?
Bumping, as this is exactly what I need. I already have Whisper instances available for transcription/translation on the backend, but reducing response latency means getting chunks transcribed as they appear. I suspect a reasonable "chunk" is one sentence.
Is there a way to get the audio data while speech is active, before it ends? I want to get the audio data when speech starts, stream it to the backend in real time, and stop streaming when it ends. It seems like the `onFrameProcessed` callback only has a probability property. Thanks
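For what it's worth, here is a minimal sketch of the streaming behavior described above, assuming a per-frame callback that could expose the raw audio frame alongside its speech probability (which, as noted, `onFrameProcessed` does not currently do). The `send` callback, the threshold, and the hangover count are all hypothetical parameters for illustration; in practice `send` would push the frame over a WebSocket to the backend:

```javascript
// Sketch: stream frames while speech is active, with a short "hangover"
// so brief dips below the threshold don't cut the stream mid-word.
// All names here are hypothetical, not part of any library's API.
function createSpeechStreamer({ threshold = 0.5, hangoverFrames = 3, send }) {
  let active = false;  // currently inside a speech segment?
  let silentRun = 0;   // consecutive sub-threshold frames seen

  // Call once per frame with the VAD probability and the raw audio frame.
  return function onFrame(probability, frame) {
    if (probability >= threshold) {
      active = true;
      silentRun = 0;
      send(frame);            // forward audio in real time
    } else if (active) {
      silentRun += 1;
      if (silentRun <= hangoverFrames) {
        send(frame);          // bridge a brief pause
      } else {
        active = false;       // speech ended; stop streaming
      }
    }
  };
}
```

The hangover keeps short intra-sentence pauses from splitting one utterance into many tiny requests to the backend.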