When we have a PDF Asset with the work_source_pdf role, we want to make a PDF derivative with a smaller file size and scaled-down images. Some of our "born digital" Distillations PDFs have 300 dpi images for printing, and are quite big.
We define a derivative on AssetUploader in the standard way, so it will be stored in the derivatives hash in the shrine file.
We mark it default_create: false, so it won't be created automatically on ingest; it is triggered manually instead. (On ingest, the role isn't set yet, so nothing tells us this is a work_source_pdf we want to create the derivative for.)
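To illustrate what default_create: false means in practice, here is a minimal plain-Ruby model. It is not the real AssetUploader or shrine code: DerivativeRegistry, create_default, create_only, and the thumb derivative are all hypothetical names invented for this sketch. Only the default_create option and the scaled_down_pdf key come from this PR.

```ruby
# Toy model of derivative definitions. NOT the real AssetUploader/shrine API;
# every class and method name here is hypothetical.
class DerivativeRegistry
  Definition = Struct.new(:key, :default_create, :processor)

  def initialize
    @definitions = {}
  end

  # Register a derivative. default_create mirrors the option named in this PR.
  def define_derivative(key, default_create: true, &processor)
    @definitions[key] = Definition.new(key, default_create, processor)
  end

  # What would run on ingest: only definitions with default_create true.
  def create_default(original)
    build(@definitions.values.select(&:default_create), original)
  end

  # What would run on demand: the requested keys, regardless of default_create.
  def create_only(original, *keys)
    build(@definitions.values_at(*keys).compact, original)
  end

  private

  # Call each processor and collect the results keyed by derivative key.
  def build(definitions, original)
    definitions.to_h { |d| [d.key, d.processor.call(original)] }
  end
end

registry = DerivativeRegistry.new
registry.define_derivative(:thumb) { |file| "thumb of #{file}" }
registry.define_derivative(:scaled_down_pdf, default_create: false) do |file|
  "scaled-down #{file}"
end

on_ingest = registry.create_default("source.pdf")
on_demand = registry.create_only("source.pdf", :scaled_down_pdf)
```

In this model the scaled_down_pdf derivative is skipped on ingest and only built when asked for by key, which is the behavior the description relies on.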
The setup_work_from_pdf_source action, which was already extracting page images as Assets, will now also check whether the derivative is already present on the work_source_pdf and, if not, launch a CreateScaledDownPdfDerivativeJob to create one. (Creation can take 120+ seconds.)
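The "create only if missing" check can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the Asset struct, FakeJobQueue, and ensure_scaled_down_pdf are hypothetical stand-ins, and storing the role as the string "work_source_pdf" is also assumed.

```ruby
# Hypothetical stand-ins; the real Asset model and job class live in the app.
Asset = Struct.new(:role, :derivatives)

# Records enqueued jobs in memory instead of using a real job backend.
class FakeJobQueue
  attr_reader :enqueued

  def initialize
    @enqueued = []
  end

  def enqueue(job_name, asset)
    @enqueued << [job_name, asset]
  end
end

# Enqueue the derivative job only for a work_source_pdf that lacks the
# derivative. Returns true if a job was enqueued, false otherwise.
def ensure_scaled_down_pdf(asset, queue)
  return false unless asset.role == "work_source_pdf"
  return false if asset.derivatives.key?(:scaled_down_pdf)

  queue.enqueue("CreateScaledDownPdfDerivativeJob", asset)
  true
end
```

Running the slow work in a background job keeps the action responsive, and the presence check makes repeated runs of the action cheap.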
The actual work of creating the PDF is done by the ScaleDownPdf class, which shells out to the gs (Ghostscript) command line.
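For readers unfamiliar with gs, a shell-out along these lines would produce a downsampled PDF. This is a sketch, not the actual ScaleDownPdf implementation: the method names are hypothetical, and the exact flag set is an assumption. The flags themselves (-sDEVICE=pdfwrite, -dPDFSETTINGS, -dNOPAUSE, -dBATCH, -dQUIET, -sOutputFile) are standard Ghostscript options, and ebook is the preset this PR uses.

```ruby
require "open3"

# Build the argument list for Ghostscript. Hypothetical helper; the flag
# selection is an assumption about what a call like this would need.
def scale_down_pdf_command(input_path, output_path, preset: "ebook")
  [
    "gs",
    "-sDEVICE=pdfwrite",           # write a new PDF
    "-dPDFSETTINGS=/#{preset}",    # quality preset, e.g. ebook or screen
    "-dNOPAUSE",                   # do not prompt between pages
    "-dBATCH",                     # exit when processing is finished
    "-dQUIET",                     # suppress routine console output
    "-sOutputFile=#{output_path}",
    input_path
  ]
end

# Run gs without going through a shell, raising if it exits non-zero.
def scale_down_pdf(input_path, output_path)
  output, status = Open3.capture2e(*scale_down_pdf_command(input_path, output_path))
  raise "gs failed (#{status.exitstatus}): #{output}" unless status.success?

  output_path
end
```

Passing the arguments as separate array elements avoids shell interpolation, so file paths with spaces or special characters need no quoting.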
Then the recently added WorkDownloadOptionsCreator class was enhanced to notice the presence of the derivative (called scaled_down_pdf, with the key kept in a constant) and add it to the whole-work options shown on the Work page and in the Downloads popup menu.
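The shape of that change might look something like the sketch below. It does not reproduce WorkDownloadOptionsCreator: DownloadOption, SCALED_DOWN_PDF_DERIV_KEY, whole_work_download_options, and the "Smaller PDF" label are all hypothetical, chosen only to show an option being added when the derivative exists.

```ruby
# Hypothetical value object for one entry in a downloads menu.
DownloadOption = Struct.new(:label, :url)

# The derivative key kept in one constant, as the description mentions.
# The constant name itself is an assumption.
SCALED_DOWN_PDF_DERIV_KEY = :scaled_down_pdf

# Given the derivatives hash of a work's source PDF (key => URL here),
# return the whole-work options, adding the smaller PDF only if present.
def whole_work_download_options(source_pdf_derivatives)
  options = []
  url = source_pdf_derivatives[SCALED_DOWN_PDF_DERIV_KEY]
  options << DownloadOption.new("Smaller PDF", url) if url
  options
end
```

Works whose derivative has not been created yet simply get no extra option, so the menu never links to a missing file.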
On top of #2747
Ghostscript has a "pseudo-device" preset called ebook that downsamples images to 150 dpi, among other things. We're using that. There is also a screen preset for 72 dpi, but that made images look noticeably bad. 150 dpi was fine on devices I tested, still with significant file size reduction for our sample files. https://ghostscript.readthedocs.io/en/gs10.0.0/VectorDevices.html#the-family-of-pdf-and-postscript-output-devices
Added a check for gs to system_env_spec.
gs was already present on my Mac (probably a brew dependency of vips?) and on heroku. But it was removed from heroku-24. It could be re-added with a custom buildpack, but it also looks like they may be adding it back. https://www.reddit.com/r/Heroku/comments/1fj3cr4/ghostscript_on_heroku24/
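A check that gs is installed, like the one added to system_env_spec, could rest on a helper along these lines. This is only a sketch: command_available? is a hypothetical name, and the real spec may detect the binary differently (for example by running gs itself).

```ruby
# Returns true if an executable file named `command` exists in one of the
# directories of the given PATH-style string (defaults to the real PATH).
def command_available?(command, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end
```

A spec could then assert command_available?("gs"), failing early on a stack such as heroku-24 where the binary is absent.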