Closed jeremyf closed 1 year ago
What properties do derivatives get saved as? If they have their own, what would it look like to upload them and will OCR still work?
IIIF Print does not presently have any logic regarding OCR of the files. Instead, OCR is called within other derivative services (see https://github.com/samvera/hyrax/blob/64c0bbf0dc0d3e1b49f040b50ea70d177cc9d8f6/app/services/hyrax/file_set_derivatives_service.rb#L123-L127).
If we want to run OCR on an intermediate file (e.g. one that is already a derivative), we will need to revisit how we're making the IIIF Print plugin.
That would mean amending IiifPrint::PluggableDerivativeService.
closed in sprint 2/20/2023
Summary
For conditional derivative generation, I think the best approach will be to:
- Define a module that we prepend to each of the plugins registered in Hyrax::DerivativeService.services
- Implement #valid? in the above module to return false if file_set.rdf_type&.join&.downcase&.include?("intermediate_file")
With the above implementation we’ll still perform some of the derivative logic of the job. Skipping that work entirely would require further adjustments.
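The two steps in the summary could be sketched roughly as follows. This is a minimal illustration, not the IIIF Print implementation: FakeFileSet, FakePluginService and SkipIntermediateFiles are hypothetical stand-ins so the snippet runs without Hyrax, and it assumes rdf_type returns an array-like value (or nil) as the condition in the summary implies.

```ruby
# Hypothetical stand-in for a file set; only rdf_type matters for this sketch.
FakeFileSet = Struct.new(:rdf_type)

# Hypothetical stand-in for a plugin registered in Hyrax::DerivativeService.services.
class FakePluginService
  attr_reader :file_set

  def initialize(file_set)
    @file_set = file_set
  end

  def valid?
    true
  end
end

# Hypothetical module name. Because it is prepended, its #valid? runs first
# and falls through to the plugin's own #valid? via super.
module SkipIntermediateFiles
  def valid?
    return false if file_set.rdf_type&.join&.downcase&.include?("intermediate_file")

    super
  end
end

# In an application this would be repeated for each registered plugin.
FakePluginService.prepend(SkipIntermediateFiles)
```

With this in place, a plugin reports itself invalid for any file set whose RDF types mention "intermediate_file" (case-insensitively) and otherwise keeps its original validity check, including when rdf_type is nil.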
Discussion and Notes
Samvera Gem Versions for UTK:
The Hyrax::DerivativeService defines the interface for derivative services (and is itself a viable, albeit abstract, derivative service). The key method is Hyrax::DerivativeService.for; that is used to find the first valid service, with the fallback being an instance of that class.
The IIIF Print gem builds on the above by further configuring the Hyrax::DerivativeService.services class_attribute, which means the IiifPrint::PluggableDerivativeService is the first service we check, followed by the fallback service.
Further Discussion
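Returning briefly to the lookup described in the notes above, here is a simplified sketch of the "first valid service, else the fallback" behaviour. It is not Hyrax's source: SketchDerivativeService and SketchPluggableService are hypothetical names, a plain Ruby accessor stands in for the class_attribute, and the file set is modelled as a hash with an invented :printable key.

```ruby
# Simplified stand-in for the derivative service interface (assumption, not Hyrax code).
class SketchDerivativeService
  class << self
    # Plain accessor standing in for the services class_attribute.
    attr_accessor :services

    # Return the first registered service that is valid for the file set,
    # falling back to an instance of this class.
    def for(file_set)
      services.map { |service| service.new(file_set) }.find(&:valid?) || new(file_set)
    end
  end
  self.services = []

  attr_reader :file_set

  def initialize(file_set)
    @file_set = file_set
  end

  # The fallback is always valid...
  def valid?
    true
  end

  # ...and does nothing.
  def create_derivatives(_file_path); end
end

# Hypothetical plugin that only handles file sets flagged as printable.
class SketchPluggableService < SketchDerivativeService
  def valid?
    file_set[:printable] == true
  end

  def create_derivatives(file_path)
    "derivatives for #{file_path}"
  end
end

SketchDerivativeService.services = [SketchPluggableService]
```

Calling SketchDerivativeService.for with a file set the plugin rejects returns the fallback, which still answers true to valid? even though its create_derivatives is a no-op.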
The Hyrax::CreateDerivativesJob#perform method (see below) leverages the create_derivatives functionality of the Hyrax::DerivativeService (via file_set.create_derivatives).
Ideally, we would love to configure the application not to spawn the job if we don’t have a #valid? concrete derivative service for the given file_set (see the Hyrax::FileSet::Derivatives module). However, there are a few different ways that we invoke a derivative job, which means we likely need to adjust the #perform method instead.
Further complicating this is that the fallback derivative service (e.g. Hyrax::DerivativeService.new) is always valid. In other words, as implemented, every file_set has a “valid” derivative service; it just so happens that the fallback does nothing.
We’ll also want to consider how to change the custom override for the Hyrax::CreateDerivativesJobDecorator#perform.
In the above implementation, the “ensure a fresh copy” would be wonderful to have as a block for file_set.create_derivatives(filepath); however, most implementations of the derivative work do not accept a block.
In the above implementation, we’d only call the block for non-null derivative functions.
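To illustrate the block idea, here is a minimal sketch in which a concrete service yields after doing its work while a null service never yields. All names (NullDerivatives, ThumbnailDerivatives, perform_sketch) are hypothetical, and the block body is a placeholder for whatever follow-up work the job would do; none of this is the Hyrax or IIIF Print API.

```ruby
# Hypothetical null service: does no work and never yields to the block.
class NullDerivatives
  def create_derivatives(_file_path)
    nil
  end
end

# Hypothetical concrete service: yields once its (pretend) work is done.
class ThumbnailDerivatives
  def create_derivatives(file_path)
    result = "thumbnail for #{file_path}"
    yield if block_given?
    result
  end
end

# Hypothetical job body: the block stands in for the follow-up step,
# recorded here by appending a marker to the log array.
def perform_sketch(service, file_path, log)
  service.create_derivatives(file_path) { log << :follow_up_ran }
  log
end
```

Ruby silently ignores a block passed to a method that never yields, so the follow-up step runs only for services that actually produce derivatives.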
Finally, we also have ValkyrieCreateDerivativesJob#perform to consider (see below).