getpaoapps closed 6 days ago
Interesting. It seems the transcription actually finished, but there was nothing to transcribe.
Listening to your audio, it seems to be singing + music rather than clear speech. I am guessing the VAD detects no actual spoken words, or Whisper can't recognize any speech, so nothing gets transcribed, which causes this error (the out of range error happens because it tries to get data from the first segment, which does not exist).
But I'm unsure if this is exactly the case.
Do you have more similar sounding audio files where it does and does not work?
Here is another example. Do you think it makes sense to add a check on the segment count and return empty output?
Yes, good idea! If there are no segments in the output, then it should just return an empty array rather than produce an error.
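A rough sketch of what such a check could look like. The function names and the shape of `result` (a dict with a `"segments"` list whose items carry `start`, `end` and `text`) are assumptions for illustration, not taken from this project's code:

```python
from __future__ import annotations

from typing import Any


def first_segment_start(segments: list[dict[str, Any]]) -> float:
    # Unguarded access: raises IndexError('list index out of range')
    # when nothing was transcribed and the list is empty.
    return segments[0]["start"]


def safe_segments(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the transcribed segments, or an empty list if there are none.

    `result` is a hypothetical transcription output; adjust the keys to
    whatever the real pipeline produces.
    """
    segments = result.get("segments") or []
    if not segments:
        # Nothing to transcribe (e.g. music only, or the VAD found no speech).
        return []
    return [
        {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
        for s in segments
    ]
```

With this in place, audio that yields no segments would give back `[]` and the caller can decide how to report it, instead of the whole run failing.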
I notice infrequent failures for non-English audio files with the error:
Error Running inference with local model', IndexError('list index out of range'
Example of failing audio.
Error log: