opencast / opencast

The free and open source solution for automated video capture and distribution at scale.
https://opencast.org
Educational Community License v2.0

Subtitle generation with whisper fails if audio contains no spoken words #6145

Open snoesberger opened 3 months ago

snoesberger commented 3 months ago

Describe the bug

When I upload a video with an audio track that contains no spoken words and run a workflow to generate subtitles using whisper-ctranslate2, the speech-to-text workflow step fails with the error message "Whisper produced no output".

To Reproduce

Steps to reproduce the behavior:

  1. Upload video with audio track that contains no spoken words
  2. Start workflow to generate subtitles
  3. WOH speech-to-text starts whisper with the option --vad_filter True
  4. Workflow fails with error "Whisper produced no output"
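
For step 1, a test audio track without speech can be generated locally. The following is a minimal sketch, assuming that pure digital silence is enough to trigger the case (the original report only says "no spoken words", which could also mean music or noise). The function name `write_silent_wav` and its defaults are made up for illustration; the audio still has to be muxed into a video before upload.

```python
import wave


def write_silent_wav(path, seconds=5.0, sample_rate=16000):
    """Write a mono 16-bit PCM WAV file that contains only silence.

    Hypothetical helper for reproducing the issue; not part of Opencast.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples
    return n_frames


if __name__ == "__main__":
    write_silent_wav("silence.wav")
```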

Expected behavior

From the user's point of view, this case should not generate an error because whisper worked as expected. If there are no spoken words in the audio, no subtitles should be generated. In this case, the workflow should not fail, but a warning should be reported.
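
The expected behavior could look roughly like the sketch below. It is written in Python for brevity and is not Opencast code (the actual workflow operation handler is Java). The function name `ensure_subtitle_file` and the logger name are hypothetical. The idea is only that missing engine output is reported as a warning and replaced by an empty WebVTT file instead of failing the workflow.

```python
import logging
from pathlib import Path

logger = logging.getLogger("speech_to_text_sketch")  # hypothetical logger name


def ensure_subtitle_file(vtt_path):
    """Return True if the engine wrote output, otherwise warn and
    create an empty WebVTT file so later steps still find a file."""
    path = Path(vtt_path)
    if path.is_file() and path.stat().st_size > 0:
        return True
    # No output is treated as "no speech found", not as an error.
    logger.warning("No speech detected, writing empty subtitle file: %s", path)
    path.write_text("WEBVTT\n\n", encoding="utf-8")
    return False
```

Whether an empty subtitle file or no subtitle file at all is the better result for the rest of the workflow is a design question this sketch does not answer.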

Server environment:

snoesberger commented 1 month ago

This is not really an Opencast problem, it comes from whisper-ctranslate2. In any case, a subtitle file should be generated even though there is no recognisable speech in the audio. This issue has already been addressed in the faster-whisper repository (which is a dependency of whisper-ctranslate2): https://github.com/SYSTRAN/faster-whisper/pull/895. So far there is no new version of faster-whisper that includes this PR. However, as the change is quite simple, it can easily be applied manually.
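
To illustrate what "a subtitle file should be generated" means when nothing is recognised, here is a small sketch of a WebVTT writer that returns a valid header-only file for an empty segment list. This is not the change from the linked PR and not code from faster-whisper or whisper-ctranslate2. The `(start, end, text)` tuple layout and both function names are assumptions made for the example.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments):
    """Build WebVTT text from (start, end, text) tuples.

    An empty iterable yields just the "WEBVTT" header, which is
    still a valid subtitle file.
    """
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)
```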