sul-dlss / speech-to-text

Tools for generating transcript and caption files from media files (e.g. a Docker container for running Whisper on video files in AWS ECS? 🤷🏽)

should we automatically update the model files that whisper uses? if so, at what frequency and with what mechanism? #23

Open jmartin-sul opened 2 months ago

jmartin-sul commented 2 months ago

this list has the URLs for retrieving models as setup for building the container: https://github.com/sul-dlss/speech-to-text/blob/main/whisper_models/urls.txt

see also https://github.com/sul-dlss/speech-to-text?tab=readme-ov-file#build

jmartin-sul commented 1 month ago

possible storytime fodder

edsu commented 1 month ago

Perhaps we could add a unit test that compares the list of models:

https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L32

with the ones in https://github.com/sul-dlss/speech-to-text/blob/main/whisper_models/urls.txt

Then, when we update whisper and it ships a new model, the test will start to fail? We will need to remember that fixing the test means updating urls.txt and then rebuilding the Docker container...