scientist-softserv / adventist_knapsack

Apache License 2.0

🦄 Spike: Given IIIF Serverless Implementation, Determine Best Image Derivative Format #404

Closed jeremyf closed 1 year ago

jeremyf commented 1 year ago

The goal of this 🦄 (Research Spike) is to determine the appropriate image format for the pages we extract from a PDF.

Previously we had recommended PNG or JPG because of storage constraints. However, conversations with the team suggest that may not be feasible and/or advisable due to technical constraints.

kirkkwang commented 1 year ago

Findings

PTIFFs seem to only render when the user clicks and drags or resizes the image. PNG, GIF, JP2, JPEG, and regular TIF seem to all render fine.

We used the following as our test images: image_formats.zip

We used the British Library staging as it is currently on a serverless configuration.

Details

Example work from British Library Staging

For speed, Pyramidal TIFF (PTIFF) and JPEG 2000 file types are recommended. However, in our IIIF Serverless implementation, PTIFFs don't seem to work smoothly.

In this example, six copies of the same image, each in a different file type, were added to be viewed in UV. All of them load fine except for the PTIFF. If you click on the PTIFF, you'll be met with the loading icon until you click and drag the image or zoom in/out.

Console logs

image

Lambda link: https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/5e29160ac1cf7938f1d30b903312655fd7606f92/512,0,88,398/88,/0/default.jpg

Error message: Error: extract_area: bad extract area
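
For anyone picking this up: the failing request follows the IIIF Image API 2 path layout, `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The error text reads like a crop rectangle falling outside the image being read, though that is an assumption and not confirmed from the Lambda source. Below is a minimal Python sketch that splits the URL into its parts and checks a pixel region against a given width and height. The 600x398 dimensions are inferred from the tile requests in this thread, and 300x199 is a hypothetical reduced size used only to show how the same region can end up out of bounds. Only the `full` and `x,y,w,h` region forms are handled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# The failing request quoted above.
FAILING_URL = (
    "https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws"
    "/iiif/2/5e29160ac1cf7938f1d30b903312655fd7606f92"
    "/512,0,88,398/88,/0/default.jpg"
)


@dataclass
class IIIFRequest:
    identifier: str
    region: str
    size: str
    rotation: str
    quality: str
    fmt: str


def parse_iiif_url(url: str, prefix: str = "/iiif/2/") -> IIIFRequest:
    """Split an IIIF Image API 2 URL into its path components."""
    path = urlparse(url).path
    if not path.startswith(prefix):
        raise ValueError(f"path does not start with {prefix!r}: {path}")
    identifier, region, size, rotation, last = path[len(prefix):].split("/")
    quality, fmt = last.rsplit(".", 1)
    return IIIFRequest(identifier, region, size, rotation, quality, fmt)


def region_fits(region: str, width: int, height: int) -> bool:
    """True if an 'x,y,w,h' region lies entirely inside a width x height image."""
    if region == "full":
        return True
    x, y, w, h = (int(v) for v in region.split(","))
    return x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= width and y + h <= height


req = parse_iiif_url(FAILING_URL)
print(req.region, req.size)                 # 512,0,88,398 88,
print(region_fits(req.region, 600, 398))    # True  (assumed full-size dimensions)
print(region_fits(req.region, 300, 199))    # False (hypothetical smaller level)
```

If the region fits the full-size image but the crop still fails, one thing worth checking is which dimensions the server is actually reading for the PTIFF.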

Clicking around the other images (avoiding the PTIFF), we observe these messages in the console in the serverless configuration.

image
Lambda links from the screenshot:
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/8523dad0b9efd398ee4571b443c69f2f98600a77/full/300,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/0828a40cb5b7335cc7d79bdb40348e8d30527d94/0,0,512,398/512,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/14662cccabd3ca2613715094571a9d251cb4a3dc/full/300,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/8523dad0b9efd398ee4571b443c69f2f98600a77/0,0,512,398/512,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/14662cccabd3ca2613715094571a9d251cb4a3dc/0,0,512,398/512,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/898ee6a783aa3f64367bd86e195ab3f8ed8acfbb/0,0,512,398/512,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/14662cccabd3ca2613715094571a9d251cb4a3dc/full/300,/0/default.jpg
- https://jwvkwtotuzhmc4olezwhmwtqym0yrftj.lambda-url.eu-west-1.on.aws/iiif/2/14662cccabd3ca2613715094571a9d251cb4a3dc/0,0,512,398/512,/0/default.jpg
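
The `region/size` pairs in these requests look like a viewer walking a 512-pixel tile grid at full scale. As a rough Python sketch of that pattern: the 600x398 image size is an assumption inferred from the `0,0,512,398` and `512,0,88,398` regions seen in this thread, and `tile_regions` is a hypothetical helper, not something UV or the Lambda exposes.

```python
def tile_regions(width: int, height: int, tile: int = 512) -> list[str]:
    """List the IIIF 'region/size' path segments for full-scale tiles.

    Edge tiles are clipped to the image, so their width or height can be
    smaller than the tile size.
    """
    segments = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            w = min(tile, width - x)
            h = min(tile, height - y)
            segments.append(f"{x},{y},{w},{h}/{w},")
    return segments


for segment in tile_regions(600, 398):
    print(segment)
# 0,0,512,398/512,
# 512,0,88,398/88,
```

Under that assumption, the second segment is the same request that fails for the PTIFF, while the first matches the tiles that load for the other formats.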

PTIFF Creation

The PTIFF was created with ImageMagick using the following command:

convert cat.tif -define tiff:tile-geometry=256x256 -compress jpeg 'ptif:cat_ptif.tif'
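
For context, a pyramidal TIFF stores the full-size image plus a series of progressively halved copies in one file. The Python sketch below only illustrates that idea: it halves the dimensions until the image fits inside a single tile. The stopping rule and rounding are assumptions, real encoders (including ImageMagick) may stop at a different size or round differently, and 600x398 is an assumed size for `cat.tif` inferred from the tile requests above.

```python
def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int]]:
    """Return (width, height) per level, halving until one tile covers the image.

    Illustrative only: the stopping rule and the round-up halving are
    assumptions, not the behavior of any specific encoder.
    """
    levels = [(width, height)]
    while width > tile or height > tile:
        width = max(1, (width + 1) // 2)
        height = max(1, (height + 1) // 2)
        levels.append((width, height))
    return levels


print(pyramid_levels(600, 398))
# [(600, 398), (300, 199), (150, 100)]
```

The smaller levels are what make the format fast to serve, and they also mean a single file has more than one set of dimensions a reader could report.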

On macOS using Preview, the PTIFF looks like this:

adl-379

Non-serverless

The same behavior is not seen on Adventist, which is not configured for IIIF Serverless.

Example of work on Adventist Staging

jeremyf commented 1 year ago

@kirkkwang what is the recommended image format we should use for images split off of a PDF? My read is JPG, PNG, and TIF all work fine. Is that correct? And we can proceed with making recommendations based on storage considerations.

kirkkwang commented 1 year ago

JP2 is preferred according to the IIIF community, but since we don't have that set up, I recommend JPG, or getting the Lambda to play nice with PTIFF.

ShanaLMoore commented 1 year ago

Per Jeremy, we will want @orangewolf to review this before closing.

orangewolf commented 1 year ago

PTIFF seems to be the obvious answer. I know that’s what Michael Klein and his team use at Northwestern (originators of IIIF Print). It’s also what Yale is using, and their product lead is very involved in the IIIF spec. The Yale take was that though JP2 had been the go-to, PTIFF was actually better for performance and was becoming the default over time. Can we timebox the PTIFF-not-loading issue to 1 dev day and see if it’s fixable? Specifically, before doing any dev on it, I’d start by posting in the Samvera channels, tagging M. Klein, and seeing if anyone else has had the issue.

cc @jeremyf