Open MaleicAcid opened 1 week ago
There is an open PR for faster-whisper that implements early stopping for non-voice audio: https://github.com/SYSTRAN/faster-whisper/pull/1014. Until it is merged, there seems to be no straightforward way to stop early without adding an extra VAD pass, which is computationally intensive and unnecessary for most users.
As a workaround, you could run voice detection with pyannote on your local machine before sending the audio to openlrc.
openlrc version: 1.5.2
When I try to transcribe a video that has no human voice, I get an exception:
RuntimeError: stack expects a non-empty TensorList
I found the following text in the log:
Is it possible for openlrc to handle this situation and end the transcription task early? Generating an empty subtitle file and returning its path as usual may be a reasonable way to deal with it.
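Until something like this is handled inside openlrc, the requested behavior can be approximated on the caller's side. The following is only a sketch under assumptions: `transcribe_or_empty` is a hypothetical helper, not part of openlrc; `transcribe` stands for whatever callable you use to run openlrc on one file and get back a subtitle path; and the `.lrc` suffix for the empty file is an assumed choice. It matches on the error message quoted above, so it would need updating if that message changes.

```python
from pathlib import Path
from typing import Callable

# Message taken from the exception reported in this issue.
NO_SPEECH_MSG = "stack expects a non-empty TensorList"


def transcribe_or_empty(audio_path: str, transcribe: Callable[[str], str]) -> str:
    """Run `transcribe`; if it fails with the no-speech error, write an
    empty subtitle file next to the input and return that path instead.

    `transcribe` is a hypothetical stand-in for your own openlrc call.
    """
    try:
        return transcribe(audio_path)
    except RuntimeError as exc:
        # Re-raise anything that is not the specific no-speech failure.
        if NO_SPEECH_MSG not in str(exc):
            raise
        empty_path = Path(audio_path).with_suffix(".lrc")  # assumed output suffix
        empty_path.write_text("", encoding="utf-8")
        return str(empty_path)
```

Matching on the message string is brittle, but it keeps unrelated `RuntimeError`s visible instead of silently turning them into empty subtitle files.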