sul-dlss / speech-to-text

Tools for generating transcript and caption files from media files (e.g. a Docker container for running Whisper on video files in AWS ECS? 🤷🏽)

Investigate Whisper.writer parameters #35

Open alundgard opened 1 month ago

alundgard commented 1 month ago

Run different parameter combinations on a test media item and observe the formatting of the VTT output, with attention to subtitle accessibility guidelines and readability.

Whisper.writer parameters

Subtitle formatting parameters are passed to the writer (obtained from whisper.utils.get_writer), not to the model (model.transcribe). NB: For these writer parameters to take effect, word_timestamps must be set to True when calling model.transcribe.
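A minimal sketch of that split, assuming the openai-whisper package (a version with word-timestamp support) and its writer API, where the writer returned by get_writer is called as writer(result, audio_path, options). The helper names subtitle_options and transcribe_to_vtt, the "base" model size, and the 42/2 defaults (mirroring the guideline quoted later in this issue) are illustrative assumptions. Whisper is imported inside the function only so the options helper can be used without Whisper installed.

```python
from typing import Optional


def subtitle_options(
    max_line_width: Optional[int] = 42,
    max_line_count: Optional[int] = 2,
    highlight_words: bool = False,
) -> dict:
    """Build the options dict read by Whisper's subtitle writers (VTT/SRT).

    Hypothetical helper; the defaults follow the 42 chars/line, 2 lines/segment
    accessibility guideline.
    """
    return {
        "max_line_width": max_line_width,
        "max_line_count": max_line_count,
        "highlight_words": highlight_words,
    }


def transcribe_to_vtt(media_path: str, output_dir: str, model_name: str = "base") -> None:
    # Imported here so subtitle_options() can be used without Whisper installed.
    import whisper
    from whisper.utils import get_writer

    model = whisper.load_model(model_name)
    # word_timestamps=True is required: without per-word timings the writer
    # ignores the line-width/line-count options and emits whole segments.
    result = model.transcribe(media_path, word_timestamps=True)

    # Formatting options go to the writer, not to model.transcribe.
    writer = get_writer("vtt", output_dir)
    writer(result, media_path, subtitle_options())
```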

Preliminary parameter testing and vtt output: Pre-pilot parameter testing (Local).
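Because these options only affect the writer, one way to compare combinations is to transcribe the test item once and write a separate VTT file for each combination. The sketch below makes the same assumptions about openai-whisper; the grid values, the function names parameter_grid and sweep, and the directory naming scheme are hypothetical, not the settings used in the pre-pilot testing.

```python
import itertools
import os
from typing import Iterable, Optional


def parameter_grid(
    widths: Iterable[Optional[int]] = (None, 32, 42),
    line_counts: Iterable[Optional[int]] = (None, 1, 2),
    highlight: Iterable[bool] = (False, True),
) -> list:
    """Return every combination of writer options as (label, options) pairs."""
    grid = []
    for width, count, hl in itertools.product(widths, line_counts, highlight):
        label = f"w{width}_c{count}_hl{int(hl)}"
        options = {"max_line_width": width, "max_line_count": count, "highlight_words": hl}
        grid.append((label, options))
    return grid


def sweep(media_path: str, output_root: str, model_name: str = "base") -> None:
    # Imported here so parameter_grid() can be inspected without Whisper installed.
    import whisper
    from whisper.utils import get_writer

    # Writer options do not change the model output, so transcribe only once...
    model = whisper.load_model(model_name)
    result = model.transcribe(media_path, word_timestamps=True)

    # ...then write one VTT per option combination into its own subdirectory.
    for label, options in parameter_grid():
        out_dir = os.path.join(output_root, label)
        os.makedirs(out_dir, exist_ok=True)
        get_writer("vtt", out_dir)(result, media_path, options)
```

For example, sweep("test_item.mp4", "vtt_sweep") (hypothetical paths) would produce directories such as vtt_sweep/w42_c2_hl0/ whose outputs can be compared side by side.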

Questions

Relevant links

alundgard commented 4 days ago

This Whisper output contains caption segments that are too long. Although each segment stays on screen long enough to be read, accessibility guidelines recommend a maximum of 42 characters per line and a maximum of 2 lines per segment.

https://sul-purl-stage.stanford.edu/jr745nr3367