scientist-softserv / adventist_knapsack

Apache License 2.0

Update derivative generation to use derivative rodeo to skip `*TN.jpg` and `.READER.pdf` files #431

Open jeremyf opened 1 year ago

jeremyf commented 1 year ago

In the conversation with Katharine on Tuesday, 2023-03-14, we identified the following situation:

We can and should skip derivative generation for PD for those secondary PDFs. All secondary PDFs have the suffix `.READER.pdf` (make sure to test in a case-insensitive manner). Example: `32000812.READER.pdf`

For a reference implementation (albeit with different rules):

Related to:

We also do not want to create derivatives for TN.jpg files.
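To make the two skip rules concrete, here is a minimal sketch of the filename check, independent of how the derivative rodeo pipeline ends up wiring it in. The function name `should_skip_derivatives` is hypothetical. The `.READER.pdf` rule is matched case-insensitively as requested above; the issue does not say whether the `TN.jpg` rule should also be case-insensitive, so that is an assumption here.

```python
from pathlib import PurePath


def should_skip_derivatives(filename: str) -> bool:
    """Hypothetical helper: True if no derivatives should be generated.

    Rules from this issue:
    - secondary "reader" PDFs end in `.READER.pdf` (case-insensitive match)
    - thumbnail images end in `TN.jpg` (case-insensitivity is an assumption)
    """
    # Compare only the final path component, lowercased, so directory names
    # never match and `32000812.reader.PDF` is treated like `.READER.pdf`.
    name = PurePath(filename).name.lower()
    return name.endswith(".reader.pdf") or name.endswith("tn.jpg")
```

For example, `should_skip_derivatives("32000812.READER.pdf")` and `should_skip_derivatives("32000812TN.jpg")` are true, while `should_skip_derivatives("32000812.pdf")` is false.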

Testing Instructions

KatharineV commented 1 year ago

Team, can we exclude PDFs ending in `.READER.pdf` and also `.pdf-r.pdf`? I recently found a large batch of material that our digitization center processed, at some point in the past, using the `.pdf-r.pdf` file-naming convention. We'd like to exclude these files from the viewer, as they are Reader files (they just didn't receive the correct name).
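With a third suffix in play, one option is to keep the skip rules as data rather than hard-coding each check. The sketch below assumes shell-style glob patterns compared against the lowercased basename, so every rule, including the new `.pdf-r.pdf` one, is case-insensitive (an assumption for `.pdf-r.pdf` and `TN.jpg`). `SKIP_DERIVATIVE_PATTERNS` and `should_skip_derivatives` are hypothetical names, not part of derivative rodeo.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath

# Hypothetical skip list. Patterns are lowercase and are compared against the
# lowercased basename, which makes every rule case-insensitive.
SKIP_DERIVATIVE_PATTERNS = (
    "*.reader.pdf",  # secondary "reader" PDFs
    "*.pdf-r.pdf",   # older reader PDFs named with the .pdf-r.pdf convention
    "*tn.jpg",       # thumbnail images
)


def should_skip_derivatives(filename: str, patterns=SKIP_DERIVATIVE_PATTERNS) -> bool:
    """Return True if the file's basename matches any skip pattern."""
    name = PurePath(filename).name.lower()
    # fnmatchcase does no OS-dependent case folding; we lowercased above.
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

Keeping the patterns in a tuple means another misnamed-reader convention could be added later without touching the matching function.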

jeremyf commented 1 year ago

Absolutely going to add this bit of logic.

jeremyf commented 1 year ago

I want to deprioritize this, as the derivative work I’m doing this week should resolve or supersede the changes I made in attempting to address this ticket.

(The importer process I'm working through will allow for significant improvements, but it is a complete re-architecture of the approach.)

Duplicated/replaced by:

laritakr commented 1 year ago

We also do not want to create derivatives for TN.jpg files.

jillpe commented 1 year ago

Dependent on the derivative rodeo work.