This week I wrote audio.vadwebrtc because I needed a quick way to remove segments without voice from audio files. I transcribe audio files with audio.whisper, and that model hallucinates on segments containing only silence.
I looked at this chunk of code in the av package: https://github.com/ropensci/av/blob/58d702683261d23fa7620a42aabfe776705b50a7/src/video.c#L652-L670 and it only handles a single start_time / total_time pair. My code to extract only the voiced part of an audio file looks like this.
Would it be technically possible to allow multiple start_time / total_time values, so that the matching segments are all combined into 1 file? Then I could write something like this:
av_audio_convert(file, output = "test.wav", start_time = voiced$start, total_time = voiced$end - voiced$start)
and get 1 output file.
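To make the requested behaviour concrete, here is a minimal sketch in plain R of the semantics I have in mind. It works on an in-memory vector of samples, not on a file, and concat_segments is a hypothetical helper for illustration only. It is not part of av and says nothing about how av would do this internally.

```r
# Hypothetical helper (NOT part of av): cut several time ranges out of a
# mono signal and concatenate them in the order given.
concat_segments <- function(samples, sample_rate, start_time, total_time) {
  stopifnot(length(start_time) == length(total_time))
  pieces <- Map(function(start, duration) {
    # Convert seconds to 1-based sample indices, clamped to the signal length
    first <- floor(start * sample_rate) + 1
    last  <- min(floor((start + duration) * sample_rate), length(samples))
    if (last < first) return(samples[0])  # empty segment
    samples[first:last]
  }, start_time, total_time)
  unlist(pieces, use.names = FALSE)
}

# Toy data: 10 samples at 1 Hz, with two "voiced" ranges
samples <- 1:10
voiced  <- data.frame(start = c(1, 6), end = c(3, 8))
out <- concat_segments(samples, sample_rate = 1,
                       start_time = voiced$start,
                       total_time = voiced$end - voiced$start)
print(out)  # 2 3 7 8
```

So start_time and total_time would be vectors of the same length, and the output would contain the selected segments back to back, in the order given.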