internetarchive / iiif

The official Internet Archive IIIF service
GNU General Public License v3.0

Guide for Internet Archive IIIF endpoint #20

Closed sammeltassen closed 3 months ago

sammeltassen commented 11 months ago

For the documentation, would it be possible to specify which upload formats would work best with the current Cantaloupe integration? And to provide some best practices for pre-processing images?

For example, I presume that uploading an image already compressed to jp2 might significantly speed up the IIIF integration compared to uploading the original tif.

Related question: in the presentation it was mentioned that the current workflow uses IA's download link. How does it determine which file format to use, if multiple originals are available? Does it, for example, prefer jp2 over tif?

glenrobson commented 11 months ago

We agree this would be good to include in the documentation. Glen believes it does prioritise jp2s over tiffs but will have to check.

We can borrow some instructions from:

https://blog.archive.org/2012/05/24/uploading-images-for-text-items/

and

https://help.archive.org/help/how-to-upload-scanned-images-to-make-a-book/

glenrobson commented 11 months ago

Note: in the htj2k testing we found that ptiff was the fastest format, so it would be interesting to see whether it works with the IA setup:

https://journal.code4lib.org/articles/17596

benwbrum commented 7 months ago

https://docs.google.com/presentation/d/1EnW99P6J8yiDyA6uDmXSBrOk_MtDFzd9MUS8hfso-cA/edit#slide=id.g2a0d03839fe_0_156 mentions the format needed for "book" style items in IA.

glenrobson commented 6 months ago

Glen to provide Sara with some pTiffs to test with.

glenrobson commented 5 months ago

Note there is also a request on twitter: https://x.com/Mehrandhn/status/1764412753383456972?s=20

saracarl commented 4 months ago

Based on this:

the *_images.zip will be scanned for files it contains, at any directory level, whose names end with .jp2, .jpg, .jpeg, .tif, .tiff, .bmp or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored.

and this:

  1. Use only jpg, jpeg, jp2, tif, tiff, png, gif or bmp files. Any combination of them is acceptable.

ptiffs aren't an option for upload.
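As a rough illustration of the first rule quoted above, here is a minimal Python sketch of a case-insensitive extension filter. The function names are made up for this example, and this is not IA's actual scanning code; it only mirrors the extension list quoted for `*_images.zip` (the second quoted list also allows gif, which is left out here).

```python
from pathlib import PurePosixPath

# Extensions quoted above for *_images.zip scanning.
IMAGE_EXTENSIONS = {".jp2", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".png"}


def is_scannable_image(name: str) -> bool:
    """Return True if the file name ends with a listed extension (any case)."""
    return PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS


def scannable_images(names):
    """Keep only the names that would be picked up, at any directory level."""
    return [n for n in names if is_scannable_image(n)]


if __name__ == "__main__":
    members = ["scans/page1.TIF", "meta.xml", "map.ptiff", "page2.jp2"]
    print(scannable_images(members))
```

Under this rule a file named `map.ptiff` is skipped, while the same file renamed to `map.tif` would be matched.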

saracarl commented 4 months ago

Whether to upload TIFFs or JP2s depends on the situation. Here's a ChatGPT-generated summary of the trade-offs:

Performance Considerations:

Network Bandwidth vs. Server Processing: If network bandwidth is limited, JPEG 2000 may be preferable due to its smaller file sizes. However, if server processing power is limited, TIFF may be more efficient due to simpler decompression requirements.

Server Hardware: Some servers might have specific optimizations or hardware accelerators for JPEG 2000, which can greatly enhance its performance.

Client Requirements: If clients frequently request high-resolution partial images, the region decoding efficiency of JPEG 2000 might offer better performance.

glenrobson commented 4 months ago

Would be great to put this information in this file:

https://github.com/IIIF/guides/blob/main/guides/archive.org/index.md

Note discussion on slack also:

https://iiif.slack.com/archives/C06HUFT147P/p1714053128531299

glenrobson commented 4 months ago

Can you upload a ptiff with a tiff extension?

saracarl commented 3 months ago

https://github.com/IIIF/guides/pull/59

saracarl commented 3 months ago

So I uploaded a ptiff to IA (as a .tif) and it seems to "just work" for the IIIF endpoint, which is great. Here it is in Mirador: https://projectmirador.org/embed/?iiif-content=https://iiif.archive.org/iiif/3/tx-burnet-123835-1909-125000-geo/manifest.json

However, the item details page in Internet Archive, https://archive.org/details/tx-burnet-123835-1909-125000-geo, can't/won't/doesn't show the image, even after the derivative process runs.
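For anyone who wants to repeat this check with another item, here is a small Python sketch that builds the same three URLs from an item identifier. The URL patterns are inferred from the links in this comment only; the helper names are made up, and the Mirador link passes the manifest URL unencoded, as the link here does.

```python
IIIF_BASE = "https://iiif.archive.org/iiif/3"
MIRADOR_EMBED = "https://projectmirador.org/embed/"
DETAILS_BASE = "https://archive.org/details"


def manifest_url(identifier: str) -> str:
    """IIIF manifest URL for an IA item, following the pattern in the link above."""
    return f"{IIIF_BASE}/{identifier}/manifest.json"


def mirador_embed_url(identifier: str) -> str:
    """Mirador embed link that loads the item's manifest."""
    return f"{MIRADOR_EMBED}?iiif-content={manifest_url(identifier)}"


def details_url(identifier: str) -> str:
    """Item details page on archive.org."""
    return f"{DETAILS_BASE}/{identifier}"


if __name__ == "__main__":
    item = "tx-burnet-123835-1909-125000-geo"
    print(manifest_url(item))
    print(mirador_embed_url(item))
    print(details_url(item))
```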

We then tried adding a jpg to the item in addition to the ptiff with a .tif suffix. In that case IA showed the jpg on the item page, but the manifest also referenced the jpg rather than the tif.

We don't currently serve/support ptiff at all; just .tif files.

The tricky bit here is that we want to privilege jpgs over plain tifs, but pyramidal tifs over jpgs. So the ideal might be to upload both a ptiff (with a .tif extension) and a .jpg, and have our code identify and choose the ptiff.

Here's a test that includes both a jpg and a ptiff; we should privilege the ptiff over the jpeg in the IIIF manifest. https://archive.org/details/tx-burnet-123835-1909-125000-geo
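To make the desired ordering concrete, here is a minimal Python sketch of the selection rule described above: pyramidal tif first, then jpg, then plain tif. It is an illustration, not the service's code. `Candidate`, `rank` and `choose_source` are hypothetical names, and the `is_pyramidal` flag is assumed to come from some separate inspection of the TIFF that is not shown here. Other formats such as jp2 are not covered by this comment, so the sketch simply ranks them last.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Optional


@dataclass
class Candidate:
    name: str
    # Hypothetical flag: whether the TIFF is pyramidal. How to detect
    # this is out of scope for the sketch.
    is_pyramidal: bool = False


def rank(candidate: Candidate) -> int:
    """Lower is better: pyramidal tif < jpg < plain tif < anything else."""
    ext = PurePosixPath(candidate.name).suffix.lower()
    if ext in {".tif", ".tiff"}:
        return 0 if candidate.is_pyramidal else 2
    if ext in {".jpg", ".jpeg"}:
        return 1
    return 3  # formats not discussed here, e.g. jp2


def choose_source(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick the preferred file for the manifest, or None if there are none."""
    return min(candidates, key=rank, default=None)


if __name__ == "__main__":
    files = [Candidate("map.jpg"), Candidate("map.tif", is_pyramidal=True)]
    print(choose_source(files).name)
```

With a jpg and a pyramidal tif in the same item, the sketch picks the tif; with a jpg and a plain tif, it picks the jpg.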