yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

GeneratePDFJob - Unsupported Image Type #2470

Open sshetenhelm opened 1 year ago

sshetenhelm commented 1 year ago

EDIT -- Split off to #2597 to work on PTIFFs, which may fix this issue.

Story 32 migrated Garvin objects are not generating PDFs and return the following error information:

PDF Java app returned non zero response code for 16201231: java.io.IOException: Error reading image after convert for (https://yul-dc-prod-images.s3.amazonaws.com/ptiffs/98/16/37/63/98/16376398.tif?response-content-disposition=attachment&X-Amz-Algorithm=redacted&X-Amz-Credential=redacted&X-Amz-Date=redacted&X-Amz-Expires=redacted&X-Amz-SignedHeaders=redacted&X-Amz-Signature=redacted)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.getBufferedImage(JpegPdfConcatImpl.java:462)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.drawImageOnPage(JpegPdfConcatImpl.java:391)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.addJpegPageToDocument(JpegPdfConcatImpl.java:383)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.addJpegPages(JpegPdfConcatImpl.java:90)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.generatePdf(JpegPdfConcatImpl.java:78)
    at edu.yale.library.jpegs2pdf.processor.JsonToPdfProcessorImpl.generatePdf(JsonToPdfProcessorImpl.java:64)
    at edu.yale.library.jpegs2pdf.App.run(App.java:37)
    at edu.yale.library.jpegs2pdf.App.main(App.java:16)
Caused by: javax.imageio.IIOException: Unsupported Image Type
    at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1176)
    at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1145)
    at java.desktop/javax.imageio.ImageIO.read(ImageIO.java:1468)
    at java.desktop/javax.imageio.ImageIO.read(ImageIO.java:1363)
    at edu.yale.library.jpegs2pdf.JpegPdfConcatImpl.getBufferedImage(JpegPdfConcatImpl.java:460)
    ... 7 more

Affected OIDs include: 16191266 16196483 16197557 16197670 16198706 16198710 16198730 16198830 16199001 16199002 16199010 16200443 16201231 16203507 16203930 16204239 16204546 16204646 16207736 16208165 16215293 16215613 16215629 16215780 16215781 16215782 16215784 16215792 16218078 16309963 16309964 16532465

Not sure if this is an error with the images themselves, how they were ingested, etc.

Acceptance

K8Sewell commented 1 year ago

https://collections.library.yale.edu/catalog/16197670 - example object

sshetenhelm commented 1 year ago

More examples:

UAT - https://collections-uat.library.yale.edu/management/parent_objects/16532465 https://collections-uat.library.yale.edu/management/parent_objects/16191266 https://collections-uat.library.yale.edu/management/parent_objects/16199002

K8Sewell commented 1 year ago

Findings: the color space was CMYK for the TIFF image (child object: 16197670) for parent: 16197670. No obvious issues with the image in S3.

MaggieZhaoYale commented 1 year ago

Tried running identify on 16376398.tif and converting the image to PDF. Both commands produced the following messages:

Incompatible type for "RichTIFFIPTC"; tag ignored. `TIFFFetchNormalTag' @ warning/tiff.c/TIFFWarnings/898.

delegate library support not built-in 'yul_dc_store_9/98/16/37/63/98/16376398.tif' (XML) @ warning/profile.c/ValidateXMPProfile/1690.

mikeapp commented 1 year ago

Does this file still exist in the pair tree? I can't download it. https://collections-uat.library.yale.edu/management/child_objects/16539364 ...it is a child of... https://collections-uat.library.yale.edu/management/parent_objects/16532465

sshetenhelm commented 1 year ago

I was able to download it.

mikeapp commented 1 year ago

The PDF processor uses the PTIFF. Testing locally, for a problem image I see:

convert -resize 2000x2000 "900099834.tif[0]" test.jpg
convert: Can not read scanlines from a tiled image. `900099834.tif' @ error/tiff.c/TIFFErrors/599.

For a normal object I run this and see no errors, the jpg is produced:

convert -resize 2000x2000 "1030368.tif[0]" test.jpg

mikeapp commented 1 year ago

Per @K8Sewell above, the color space of the original and the PTIFF are CMYK. But it seems like https://github.com/yalelibrary/yul-dc-management/blob/ad5d34796bebac8be0e1293d08b4b39da920a10b/app/lib/tiff_to_pyramid.bash#L87 should have converted it to sRGB during PTIFF generation?

Original:

bash-3.2$ exiftool 16539363.tif | grep Color
Color Space Data                : CMYK
Device Attributes               : Reflective, Glossy, Positive, Color
Rendering Intent                : Media-Relative Colorimetric
Colorant Count                  : 4
Colorant 1 Name                 : Cyan
Colorant 1 Coordinates          : 36984 23351 19207
Colorant 2 Name                 : Magenta
Colorant 2 Coordinates          : 34083 52626 31991
Colorant 3 Name                 : Yellow
Colorant 3 Coordinates          : 61967 31782 58655

PTIFF:

bash-3.2$ exiftool 16539363-2.tif | grep Color
Color Mode                      : CMYK
Color Space                     : Uncalibrated
Color Space Data                : CMYK
Device Attributes               : Reflective, Glossy, Positive, Color

Mac Preview also reports CMYK for the PTIFF. Also, the thumbnail image appears lighter than the PTIFF, probably because of the sRGB conversion that occurs in Cantaloupe when the IIIF image is generated. I'd suggest working on fixing the CMYK->RGB conversion in the PTIFF shell script.
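To make the exiftool check above repeatable across the affected objects, here is a minimal sketch of a helper that reads exiftool output on stdin and reports whether the ICC "Color Space Data" field is CMYK. The function name is_cmyk is hypothetical, and it assumes exiftool prints the field in the same "Name : Value" layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the exiftool output on stdin
# contains a "Color Space Data" line whose value is CMYK.
# Assumes the default "Name   : Value" layout shown in this thread.
is_cmyk() {
  grep -Eq '^Color Space Data[[:space:]]*:[[:space:]]*CMYK[[:space:]]*$'
}

# Example usage (requires exiftool; file name taken from this thread):
#   if exiftool 16539363-2.tif | is_cmyk; then
#     echo "PTIFF is still CMYK"
#   fi
```

This only inspects the ICC header field, so a file without an embedded profile would be reported as not CMYK even if its pixel data is.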

mikeapp commented 1 year ago

Running the vips icc_transform command in the shell script against 16539363.tif resulted in an error. The script is designed to output the error but continue processing, so that's why we still see CMYK in the PTIFF.

I was able to convert the color space with a two-step operation:

vips icc_transform 16539363.tif "test.tif[compression=none,strip]" sRGB.icc --input-profile=cmyk
vips icc_transform test.tif test2.tif sRGB.icc --embedded --intent perceptual --input-profile sRGB.icc

The first command converts the image from CMYK to sRGB but does not embed the profile. The second command, which is the same one currently in the shell script, successfully embeds the profile, I assume because the source is already sRGB. So we probably need an if [ ${CHANNELS} = "cmyk" ]; then block that runs the first line.
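A rough sketch of what that block could look like, using only the two vips commands quoted above. The function name to_srgb, the temporary file naming, and the VIPS override variable are all assumptions for illustration; how CHANNELS is actually populated in tiff_to_pyramid.bash is not shown in this thread, so it is passed in as an argument here. Unlike the current script's continue-on-error behavior, this version stops when a vips step fails.

```shell
#!/usr/bin/env bash
# Sketch only: to_srgb, the temp-file name, and VIPS are hypothetical.
# Set VIPS=echo to print the commands instead of running them.
VIPS="${VIPS:-vips}"

to_srgb() {
  local channels="$1" input="$2" output="$3" profile="${4:-sRGB.icc}"
  local source="$input"
  local tmp="${output%.tif}.rgb-tmp.tif"

  if [ "${channels}" = "cmyk" ]; then
    # Step 1 (CMYK only): convert to sRGB without embedding the profile.
    "$VIPS" icc_transform "$input" "${tmp}[compression=none,strip]" \
      "$profile" --input-profile=cmyk || return 1
    source="$tmp"
  fi

  # Step 2: the command quoted above as already in the script; embeds the profile.
  "$VIPS" icc_transform "$source" "$output" "$profile" \
    --embedded --intent perceptual --input-profile "$profile" || return 1

  # Remove the intermediate file if one was created.
  [ "$source" = "$input" ] || rm -f "$source"
}

# Example usage (requires vips and sRGB.icc in the working directory):
#   to_srgb cmyk 16539363.tif test2.tif
```

Returning non-zero on failure lets the caller decide whether to abort PTIFF generation or log and continue, rather than silently producing a CMYK PTIFF.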

I am splitting this off to #2597 since I don't know whether this will resolve the PDF generation failure identified in this issue.