Subtitle generation with whisper fails if audio contains no spoken words

opencast / opencast

The free and open source solution for automated video capture and distribution at scale.

Educational Community License v2.0

Describe the bug

When I upload a video whose audio track contains no spoken words and run a workflow to generate subtitles using whisper-ctranslate2, the speech-to-text workflow step fails with the error message "Whisper produced no output".

To Reproduce

Steps to reproduce the behavior:

1. Upload a video with an audio track that contains no spoken words.
2. Start a workflow to generate subtitles.
3. The speech-to-text workflow operation handler (WOH) starts Whisper with the option --vad_filter True.
4. The workflow fails with the error "Whisper produced no output".
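For anyone trying to reproduce this, a test input without speech can be generated rather than recorded. The following is a minimal sketch, assuming that pure digital silence is an acceptable stand-in for "no spoken words" and that the audio is muxed into a video separately before upload. The function name `write_silent_wav` and its defaults are hypothetical and not part of Opencast or whisper-ctranslate2.

```python
import wave
from pathlib import Path


def write_silent_wav(path: Path, seconds: float = 5.0, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit PCM WAV file containing only silence.

    Hypothetical helper for building a test input; returns the number
    of audio frames written.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Every sample is zero, so the track contains no speech at all.
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames
```

The resulting file contains nothing for the voice activity detection enabled by --vad_filter True to pick up, which should correspond to the situation in steps 3 and 4.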

Expected behavior

From the user's point of view, this case should not produce an error, because Whisper worked as expected: if the audio contains no spoken words, no subtitles should be generated. The workflow should not fail; it should report a warning instead.
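The requested behavior could look roughly like the sketch below. It is written in Python purely for illustration; the actual workflow operation handler is part of Opencast and is not shown in this report. The function name `load_subtitles`, the logger name, and the assumption that the engine writes a WebVTT or SRT file (where each cue has a `-->` timing line) are all hypothetical.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("speech_to_text_sketch")  # hypothetical logger name


def load_subtitles(output_file: Path) -> Optional[str]:
    """Return the subtitle text, or None if the engine produced no cues.

    A missing, empty, or cue-less file is treated as "no speech found"
    and reported as a warning instead of raising an error.
    """
    if not output_file.is_file() or output_file.stat().st_size == 0:
        logger.warning("No subtitle output for %s; skipping subtitles", output_file.name)
        return None

    text = output_file.read_text(encoding="utf-8")
    # WebVTT and SRT cues both carry a "start --> end" timing line.
    if not any("-->" in line for line in text.splitlines()):
        logger.warning("Subtitle file %s contains no cues; skipping subtitles", output_file.name)
        return None

    return text
```

A caller would then attach subtitles only when the return value is not None and let the workflow continue otherwise, so that a recording without speech finishes with a warning rather than a failure.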

Server environment:

Version 15
Ubuntu 22.04
whisper-ctranslate2 0.4.7

Subtitle generation with whisper fails if audio contains no spoken words #6145