sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

Auto-detect language from JSON file #1423

Closed: peetucket closed this pull request 6 days ago

peetucket commented 1 week ago

Why was this change made? 🤔

Fixes https://github.com/sul-dlss/speech-to-text/issues/45

Read the JSON file produced by Whisper and use it to set the languageTag attribute in cocina. If the language can't be parsed from the file, languageTag is set to nil.

Note: this PR also sets languageTag to nil for OCR files (previously the attribute was simply omitted).
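The behavior described above can be illustrated with a small sketch. It is written in Python for illustration only; the actual robot code in this repository is not shown here and will differ. It assumes that Whisper's JSON output has a top-level "language" key (as the openai-whisper transcription result does), and it reduces cocina file attributes to a plain dict. The helper names language_tag_from_whisper_json and build_file_attrs are hypothetical.

```python
import json
from typing import Optional


def language_tag_from_whisper_json(raw: Optional[str]) -> Optional[str]:
    """Return the detected language from Whisper JSON output, or None.

    Assumption: Whisper writes a top-level "language" key (e.g. "en").
    Any parse failure or missing/empty value yields None (nil in Ruby).
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON or no file content at all
        return None
    if not isinstance(data, dict):
        return None
    language = data.get("language")
    # Only accept a non-empty string; anything else counts as unparseable
    if isinstance(language, str) and language.strip():
        return language.strip()
    return None


def build_file_attrs(filename: str, is_ocr: bool,
                     whisper_json: Optional[str] = None) -> dict:
    """Hypothetical helper: build simplified cocina-style file attributes.

    languageTag is always present: the detected language for Whisper
    output, and an explicit None for OCR files instead of being omitted.
    """
    if is_ocr or whisper_json is None:
        tag = None
    else:
        tag = language_tag_from_whisper_json(whisper_json)
    return {"filename": filename, "languageTag": tag}


if __name__ == "__main__":
    print(build_file_attrs("audio.vtt", False, '{"text": "hi", "language": "en"}'))
    print(build_file_attrs("page1.xml", True))
```

Setting the key explicitly to None, rather than leaving it out, mirrors the change described in the note above: every file ends up with a languageTag attribute, whether or not a language was detected.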

Example media object with a languageTag of en, as set by Whisper: https://argo-qa.stanford.edu/view/druid:pz039vb9760
Example OCR object with no languageTag set: https://argo-qa.stanford.edu/view/druid:qf961vf0644

How was this change tested? 🤨

New specs and integration tests