sul-dlss / speech-to-text

Tools for generating transcript and caption files from media files (e.g. a Docker container for running Whisper on video files in AWS ECS? 🤷🏽)

Finish skeleton common-accessioning robot and workflow def for... `captionWF`? `speechToTextWF`? [final name TBD] #8

Closed jmartin-sul closed 2 months ago

jmartin-sul commented 2 months ago

The start of the code for this is already there, and assumes the name will be captionWF: https://github.com/sul-dlss/common-accessioning/tree/main/lib/robots/dor_repo/caption

But per standup and Slack discussion, we'll decide on the workflow name and the terminology for audio/video text extraction in the 2024-09-13 post-standup discussion. Discussion seems to be leaning towards "caption" or "speechToText" as the general term.

The workflow XML is not yet in place, but the skeleton code is already present. The latter might need a rename, pending the above decision. I'll mark this ticket blocked on that decision.

The workflow definition would be a new XML file here: https://github.com/sul-dlss/workflow-server-rails/tree/main/config/workflows

The skeleton workflow will have placeholders for the steps we expect, with implementations filled in as supporting services are developed, similar to what we did for ocrWF.

Steps we're likely to have, in order:

Currently, there is placeholder workflow step code for starting the workflow, ending it, and generating captions (this last part will very likely be broken into the multiple steps described above).
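
To make the "placeholder step" idea concrete, here is a rough, self-contained Ruby sketch of three no-op steps run in order. It is illustration only, not the actual common-accessioning code: `PlaceholderRobot` is a hypothetical stand-in (not the lyber-core base class), the step names are made up, `captionWF` is just the working name pending the naming decision, and the druid in the usage example is invented.

```ruby
# Hypothetical stand-in for a robot base class; this is NOT the lyber-core API.
class PlaceholderRobot
  attr_reader :workflow_name, :step_name

  def initialize(workflow_name, step_name)
    @workflow_name = workflow_name
    @step_name = step_name
  end

  # Placeholder steps do no real work yet; they only report completion so the
  # workflow can run end to end while the supporting services are developed.
  def perform_work(druid)
    "#{workflow_name}:#{step_name} completed for #{druid} (placeholder)"
  end
end

# Assumption: the working name from the skeleton code; the final name is TBD.
WORKFLOW_NAME = 'captionWF'

# Hypothetical step names for the three placeholders mentioned above
# (start the workflow, generate captions, end the workflow).
STEP_NAMES = %w[start-caption generate-captions end-caption].freeze

# Runs each placeholder step in order and returns one status message per step.
def run_skeleton_workflow(druid)
  STEP_NAMES.map do |step|
    PlaceholderRobot.new(WORKFLOW_NAME, step).perform_work(druid)
  end
end
```

Calling `run_skeleton_workflow('druid:bb123cd4567')` would return one status string per step. Splitting caption generation into multiple steps later would then only mean adding entries to the step list and filling in the corresponding `perform_work` bodies.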

jmartin-sul commented 2 months ago

closing in favor of an issue in the correct repo: https://github.com/sul-dlss/common-accessioning/issues/1341