sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

`speechToTextWF` `update-cocina`: update cocina structural to reference files generated by speech_to_text_generation_service #1361

Open jmartin-sul opened 2 months ago

jmartin-sul commented 2 months ago

for now, blocked by the implementation of preceding steps in speechToTextWF, but we could probably parallelize this with work on the preceding steps, if we wanted. we might be able to work off of the ocrWF equivalent as an example?

this is the speechToTextWF equivalent of ocrWF update-cocina

stub code in the speechToTextWF: https://github.com/sul-dlss/common-accessioning/blob/main/lib/robots/dor_repo/speech_to_text/update_cocina.rb

part of https://github.com/sul-dlss/common-accessioning/issues/1363

peetucket commented 1 month ago

Note: at the moment, this step uses the exact same algorithm as the ocr update-cocina step: it picks up the new files in the workspace generated by Whisper, finds the resource that contains a file with a matching base filename, and adds each new file to that resource.
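A minimal sketch of that base-filename matching, for discussion only. It uses plain hashes instead of the real Cocina models, and the method name `add_generated_files` and the hash keys are hypothetical, not what the robot actually uses:

```ruby
# Hypothetical, simplified stand-in for the matching step.
# Each resource is a plain hash with a :files array of filenames;
# the real robot operates on Cocina structural objects instead.
def add_generated_files(resources, generated_filenames)
  # Returns the generated filenames that matched no resource.
  generated_filenames.each_with_object([]) do |filename, unmatched|
    # Base filename without its extension, e.g. "video_1.vtt" -> "video_1"
    stem = File.basename(filename, '.*')

    # Find the resource that already holds a file with the same stem.
    resource = resources.find do |res|
      res[:files].any? { |existing| File.basename(existing, '.*') == stem }
    end

    if resource
      # Avoid adding the same file twice if the step is re-run.
      resource[:files] << filename unless resource[:files].include?(filename)
    else
      unmatched << filename
    end
  end
end

resources = [
  { label: 'Video 1', files: ['video_1.mp4'] },
  { label: 'Audio 2', files: ['audio_2.m4a'] }
]
unmatched = add_generated_files(resources, ['video_1.vtt', 'video_1.txt', 'orphan.vtt'])
```

In this sketch, files with no matching resource are returned rather than silently dropped; whether the real step should do that is an assumption.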

We still need to consider:

  1. which files get roles (e.g. transcription)
  2. whether all files should be added, or only those of a certain MIME type

jmartin-sul commented 2 weeks ago

the developer who picks this up should consult with andrew on whether the code we have is doing the right thing, or if it needs tweaking

peetucket commented 2 weeks ago

Roles:

".vtt" = "caption"
".txt" = "transcription"

Publish/preserve/shelve = true for vtt and txt files
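A rough sketch of how that mapping could be expressed, assuming a simple lookup by file extension. The constant, the method name, and the hash keys are hypothetical and do not mirror the Cocina model attribute names; returning `nil` for other extensions is also an assumption, since which files get added is one of the open questions above:

```ruby
# Hypothetical mapping from file extension to role, per the values above.
ROLE_BY_EXTENSION = {
  '.vtt' => 'caption',
  '.txt' => 'transcription'
}.freeze

# Builds illustrative attributes for a generated file.
# Returns nil for extensions that have no role assigned (assumption).
def generated_file_attributes(filename)
  role = ROLE_BY_EXTENSION[File.extname(filename).downcase]
  return nil unless role

  # publish/preserve/shelve are all true for vtt and txt files.
  { filename: filename, role: role, publish: true, preserve: true, shelve: true }
end

vtt_attrs = generated_file_attributes('video_1.vtt')
txt_attrs = generated_file_attributes('video_1.txt')
other_attrs = generated_file_attributes('video_1.json')
```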

jmartin-sul commented 1 week ago

leaving open till we confirm that the captions generated by speechToTextWF actually work in sul-embed when watching a video

jmartin-sul commented 1 week ago

captions confirmed to display correctly

peetucket commented 1 week ago

Here is one to check: https://sul-purl-stage.stanford.edu/bh691ds2057

I see a transcript panel in sul-embed