File-specific language specification

speechToTextWF currently sends one job per SDR Item to the speech-to-text service. A job may include multiple media files, plus a single set of options that applies to all of them:

{
  "id": "gy983cn1444-v2",
  "media": [
    "snl_tomlin_phone_company_en.mp4",
    "snl_tomlin_phone_company_es.mp4"
  ],
  "options": {
    "language": "en"
  }
}

Whisper output can vary depending on the language option. Furthermore, an SDR Item can have files in more than one language. So we want users to be able to specify which language a specific file is transcribed in.

One suggested way of communicating that would be to turn the list of strings in media into a list of objects, each with a name and an options property that overrides the options for the job as a whole.

In this example, the job includes two files, one transcribed as English and the other as Spanish:

{
  "id": "gy983cn1444-v2",
  "media": [
    {
      "name": "snl_tomlin_phone_company_en.mp4",
      "options": {
        "language": "en"
      }
    },
    {
      "name": "snl_tomlin_phone_company_es.mp4",
      "options": {
        "language": "es"
      }
    }
  ]
}
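
To make the override behavior concrete, here is a minimal sketch of how the service side might resolve the effective options for each file. It is an illustration only, not the existing implementation: the function name resolve_media_options is hypothetical, and two behaviors are assumptions on my part, namely that plain-string media entries stay valid for backward compatibility and that file-level options are merged over job-level options key by key.

```python
def resolve_media_options(job):
    """Return a list of (filename, effective_options) pairs for a job.

    Hypothetical helper: accepts both the current format (media as a list
    of strings) and the proposed format (media as a list of objects).
    """
    job_options = job.get("options", {})
    resolved = []
    for entry in job["media"]:
        if isinstance(entry, str):
            # Current format: a bare filename with no per-file overrides.
            name, overrides = entry, {}
        else:
            # Proposed format: an object with "name" and optional "options".
            name, overrides = entry["name"], entry.get("options", {})
        # Assumption: file-level options win over job-level options.
        resolved.append((name, {**job_options, **overrides}))
    return resolved


# Example job mixing both styles: a job-wide default and one override.
job = {
    "id": "gy983cn1444-v2",
    "media": [
        "snl_tomlin_phone_company_en.mp4",
        {
            "name": "snl_tomlin_phone_company_es.mp4",
            "options": {"language": "es"},
        },
    ],
    "options": {"language": "en"},
}

for name, options in resolve_media_options(job):
    print(name, options)
```

With this merge rule, the first file falls back to the job-wide language and the second uses its own. Whether mixed string and object entries should be allowed at all is an open design question for this issue.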
