A new Action would be added as a plugin, like so: https://github.com/Islandora/islandora/blob/2.x/modules/islandora_text_extraction/src/Plugin/Action/GenerateOCRDerivative.php
The most logical place to put it in our setup is probably an Actions folder under https://github.com/jhu-idc/idc_defaults/tree/main/src/Plugin
Bethany has the right overview: this is fairly straightforward to configure. Unfortunately we don't have any availability in the next six weeks or so to assist with this.
This is not really a bug so much as something outside of what the system is designed to do out of the box.
Right now derivative generation is focused on Original Files, so any other type of file may not get a derivative made.
For example, the oral history records have the mp3 as the Original and Service file along with a PDF transcript marked as Extracted Text. The system will not extract data from the PDF file automatically, as it is only set up to trigger that if the PDF were an Original File. (There should only be one Original File per media set.)
One way this could work is to create more Drupal Contexts and Actions to support this behavior. All the material is there, at least in some form, and with some tweaking and new wiring we could create a Context that triggers on PDF files being added as Extracted Text to Audio/Video/Image Nodes. A new Action would be needed to pull in the right Media Use type to create the derivative from (right now the Action pulls the Original File).
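To make the proposed wiring concrete, here is a rough, language-agnostic model of the two pieces in Python. It is not Drupal or Islandora code: the `Media` class, `should_trigger`, and `pick_source` are hypothetical names invented for illustration, and the Media Use labels and node model names are assumed to match the ones used in this thread. It only shows the logic: a condition that fires on a PDF tagged as Extracted Text on an Audio/Video/Image node, and a source lookup that takes the Media Use as a parameter instead of always using the Original File.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Assumed Media Use labels, taken from the discussion above.
ORIGINAL_FILE = "Original File"
SERVICE_FILE = "Service File"
EXTRACTED_TEXT = "Extracted Text"


@dataclass(frozen=True)
class Media:
    """Hypothetical stand-in for a media entity attached to a node."""
    filename: str
    mimetype: str
    uses: FrozenSet[str]


def should_trigger(new_media: Media, node_model: str) -> bool:
    """Model of the new Context: a PDF added as Extracted Text
    to an Audio, Video, or Image node."""
    return (
        new_media.mimetype == "application/pdf"
        and EXTRACTED_TEXT in new_media.uses
        and node_model in {"Audio", "Video", "Image"}
    )


def pick_source(media_set: Iterable[Media], source_use: str) -> Optional[Media]:
    """Model of the new Action: pick the source by a configurable
    Media Use rather than a hard-wired Original File."""
    for media in media_set:
        if source_use in media.uses:
            return media
    return None


# The oral history case: mp3 is Original + Service, PDF is Extracted Text.
mp3 = Media("interview.mp3", "audio/mpeg", frozenset({ORIGINAL_FILE, SERVICE_FILE}))
pdf = Media("transcript.pdf", "application/pdf", frozenset({EXTRACTED_TEXT}))
oral_history = [mp3, pdf]
```

With the current behavior modeled as `pick_source(oral_history, ORIGINAL_FILE)`, the mp3 is selected and the transcript is never touched; passing `EXTRACTED_TEXT` instead selects the PDF, which is what the new Action would need to do.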