Closed EngineersNeedArt closed 1 year ago
Thank you for the report.
I looked at this briefly today with a colleague and we didn't quite figure out why Preview doesn't like it. We also found that iOS PDF rendering has the same issues with this PDF.
I'll try to create a few different versions of the PDF and see if they show the same problems (using CCITT instead of JBIG2 for example). All the PDFs we generate are validated against PDF/A using veraPDF, and I don't think the JBIG2 or JPEG2000 bitstreams are wrong, so I'm a bit puzzled. I'll try to get back to you soon.
@jrochkind and @EngineersNeedArt - I've created various PDFs from the item in question. I was hoping one of you could try them in your PDF viewers and let me know which ones work and which ones don't. They might not all look the same, but that doesn't matter for the purpose of this test (I hope). Testing the files would be very valuable in the blind debugging that Apple Preview requires. The files in question are:
normal-jbig2.pdf
normal-ccitt.pdf
grok-jbig2.pdf
grok-ccitt.pdf
jpeg-ccitt.pdf
You can find them here: https://archive.org/~merlijn/preview-debugging/
Thanks in advance.
The only one that displays the missing content is "jpeg-ccitt.pdf".
The pages (save the first page) however need a white background drawn first to look correct.
So it looks like there is an issue with JP2 on Mac OS. I downloaded the "SINGLE PAGE PROCESSED JP2 ZIP" for the same book and only two of the images opened successfully in Apple's Preview. The rest were met with an error dialog suggesting that the file was damaged.
Console shows logs like:
"PVImageContainer initWithURL:file:///Users/calhoun/Downloads/EntertainPocketCalculator_jp2/EntertainPocketCalculator_0199.jp2 failed, error = Error Domain=NSCocoaErrorDomain Code=259 "The file “EntertainPocketCalculator_0199.jp2” could not be opened." UserInfo={NSURL=file:///Users/calhoun/Downloads/EntertainPocketCalculator_jp2/EntertainPocketCalculator_0199.jp2, NSLocalizedDescription=The file “EntertainPocketCalculator_0199.jp2” could not be opened., NSLocalizedRecoverySuggestion=It may be damaged or use a file format that Preview doesn’t recognize.}"
"initialize:1291: *** invalid JPEG2000 file"
Thanks for testing - this does suggest that the problem lies in the use of JPEG2000 images. To clarify, the normal and grok files both use JPEG2000 for the foreground and background image encoding, while the jbig2 and ccitt part of the name only indicates the 1-bit mask encoding.
I am aware of the JPEG visual artifacts - I only added it recently to experiment with it, it's not usable in any production setting currently.
I guess next up would be for me to extract the JPEG2000 images from the PDFs, and confirm with you (or a colleague) that Preview also cannot open some of the JPEG2000 images that are embedded in the PDF.
Could you share with me how you get to these console logs, so that I can also do some digging?
Regarding this part of your comment:
So it looks like there is an issue with JP2 on Mac OS. I downloaded the "SINGLE PAGE PROCESSED JP2 ZIP" for the same book and only two of the images opened successfully in Apple's Preview. The rest were met with an error dialog suggesting that the file was damaged.
The "SINGLE PAGE PROCESSED JP2 ZIP" files aren't what is in the PDFs - those are further compressed JPEG2000s, but it's good to know that even the "SINGLE PAGE PROCESSED JP2 ZIP" files do not work for you.
I open the app "Console" on the Mac. Filter by "Preview".
Could you confirm which page is the first one that doesn't render correctly? Someone wrote on HN that pages 2-10 do not render, so perhaps we can try with page 5.
I ran this on my laptop:
pdfimages normal-jbig2.pdf -all -f 5 -l 5 jp2test
which produces jp2test-000.jp2, jp2test-001.jp2 and jp2test-002.jb2e.
I have uploaded these files and the files converted to PNG (so that you can visually compare) to this URL: https://archive.org/~merlijn/preview-debugging/page-5/
The JPEG2000 images will just be a mostly white image, and a mostly black image (due to the nature of the content).
The JBIG2 image (.jb2e) will almost certainly not open in Preview regardless, since it lacks the required headers, as it is an "embedded" JBIG2. For the purpose of this test you can disregard the JBIG2 entirely; I've just added it (also in PNG form) so that you can see what the mask looks like.
So please check if you can open these JPEG2000 images at all in Preview, and if not, I'll try to figure out why not.
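For anyone who wants to repeat the extraction for other pages, here is a minimal Python sketch that wraps the same pdfimages call. The function names are made up for illustration, and it assumes poppler's pdfimages is on your PATH; it is not part of any archive.org tooling.

```python
import shutil
import subprocess


def pdfimages_command(pdf_path, page, image_root):
    """Build a pdfimages command that extracts one page's images.

    -all keeps each image in its native format (JPEG2000, JBIG2, ...),
    -f / -l select the first and last page to process.
    """
    return ["pdfimages", "-all", "-f", str(page), "-l", str(page),
            pdf_path, image_root]


def extract_page_images(pdf_path, page, image_root):
    """Run pdfimages for a single page; raises if the tool is missing or fails."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages (poppler) not found on PATH")
    subprocess.run(pdfimages_command(pdf_path, page, image_root), check=True)
```

Calling extract_page_images("normal-jbig2.pdf", 5, "jp2test") should be equivalent to the command quoted above.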
Interesting, I had heard from some colleagues that "MacOS [Preview and Safari] have trouble displaying some jpeg2000", but hadn't reproduced myself and hadn't been sure how widespread this was.
I wonder if we can figure out what aspects of the jpeg2000 are triggering a problem. All the jpeg2000 I have yet created myself, my MacOS Preview has no problem displaying.
I extracted the images from https://archive.org/download/htewypc/EntertainPocketCalculator.pdf with the pdfimages CLI util that comes with poppler, and verified that a number of the images would indeed not even open in my MacOS Preview, with an error message similar to what @EngineersNeedArt reports.
I wanted to see if Chrome could display them -- but it looks like Chrome can't actually display raw .jp2 at all, even though it can display PDFs with embedded jp2, including the embedded images! Chrome seems to have no trouble displaying all pages in the original PDF.
jp2test-000.jp2 16-May-2023 14:54 631
Does not load/display in Safari. Downloaded, does not display in Preview: same, "File may be damaged" message.
jp2test-000.jp2.png 16-May-2023 14:53 11615
(Displays only white rectangle.)
jp2test-001.jp2 16-May-2023 14:54 483
Does not load/display in Safari. Downloaded, does not display in Preview: same, "File may be damaged" message.
jp2test-001.jp2.png 16-May-2023 14:53 2294
(Displays only black rectangle.)
jp2test-002.jb2e 16-May-2023 14:54 9389
Not recognized as an image file at all. Forcing Preview to open it anyway results in nothing being opened/displayed.
jp2test-002.jb2e.png
The only one that displays correctly.
Just to be clear, seeing just a white and black rectangle in this case is entirely expected, thanks for confirming that indeed the JPEG2000s themselves, outside of the PDF, do not open in Preview. That seems to be in line with what @jrochkind just posted.
Chrome seems to have no trouble displaying all pages in the original PDF.
Perhaps they have their own "JPEG2KLib" and are not using the OS's.
ImageIO on the Mac is typically using the open image libraries (like jpeglib, pnglib, etc.) so I am not sure why Macs would exhibit this problem.
Just a random hypothesis that may have nothing to do with anything, but as part of exploring this domain I have been looking into embedded ICC color profiles, and discovering that there is some complexity and disagreement between tools about whether, and which kinds of, embedded ICC color profiles are supported by jp2k.
I am curious to see if/what kind of color profile may be embedded in these images. But I haven't yet figured out how to check that. This tool may do so, but does not offer packages for mac!
This is probably a red herring though.
I am curious to see if/what kind of color profile may be embedded in these images. But I haven't yet figured out how to check that.
Try exiftool: https://exiftool.org
"ExifTool lets you examine ICC profiles, regardless of whether they are embedded in an image or as stand-alone ".icc" files. It also lets you extract ICC profiles from images and embed them into images.
Extract: exiftool -icc_profile -b -w icc photo.jpg
Embed: exiftool "-icc_profile<=profile.icc" photo.jpg
Examine directly in an image: exiftool -icc_profile:* photo.jpg
Examine an ".icc" file: exiftool profile.icc"
I also found some hits online regarding EXIF data or ICC in the EXIF data causing trouble, but the input images do not seem to contain either an ICC profile or EXIF data (see EntertainPocketCalculator_0005.jp2 for example). And the images generated in the PDF also probably should not contain any EXIF data (although the embedded ones seem to have a bit of a weird dpi value).
Does the original EntertainPocketCalculator_0005.jp2 image open in Preview? I think we could learn a lot if we can find some of the more 'normal' images that do not work in Preview as well. I believe @EngineersNeedArt said that only two of the "SINGLE PAGE PROCESSED JP2 ZIP" images opened OK - which ones did open OK?
Decided to look at the XMP metadata using exiftool for the only known good JP2K image in the bunch (the cover image: EntertainPocketCalculator_0000.jp2) and page 5 (known bad image: EntertainPocketCalculator_0005.jp2).
calhoun@Johns-M1 GliderVintage % exiftool -xmp:all /Users/calhoun/Downloads/EntertainPocketCalculator_jp2/EntertainPocketCalculator_0000.jp2
XMP Toolkit : Image::ExifTool 11.88
Bits Per Sample : 8
Photometric Interpretation : RGB Palette
Planar Configuration : Chunky
Samples Per Pixel : 1
calhoun@Johns-M1 GliderVintage % exiftool -xmp:all /Users/calhoun/Downloads/EntertainPocketCalculator_jp2/EntertainPocketCalculator_0005.jp2
XMP Toolkit : Image::ExifTool 11.88
Bits Per Sample : 1
Photometric Interpretation : WhiteIsZero
Planar Configuration : Chunky
Samples Per Pixel : 1
calhoun@Johns-M1 GliderVintage %
My small sample-size test suggests 1-bit-per-sample images may be a problem?
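The exiftool output above comes from XMP metadata, which describes the image but is not the header a decoder actually reads. One way to cross-check the bit depth is to read the ihdr box inside the JP2 container directly. Below is a minimal sketch, assuming a well-formed JP2 box structure; the function names are mine, and it ignores the per-component bpcc box beyond reporting that the depth varies.

```python
import struct

# The 12-byte JP2 signature box that every .jp2 file starts with.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, payload_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        length, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            if pos + 16 > end:
                return
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif length == 0:  # box extends to the end of its container
            length = end - pos
        if length < header or pos + length > end:
            return  # malformed or truncated box: stop rather than guess
        yield box_type, pos + header, pos + length
        pos += length


def read_ihdr(data):
    """Return width, height, component count and bit depth from the ihdr box."""
    if not data.startswith(JP2_SIGNATURE):
        raise ValueError("not a JP2 file (signature box missing)")
    for box_type, s, e in iter_boxes(data):
        if box_type != b"jp2h":
            continue
        for inner_type, s2, e2 in iter_boxes(data, s, e):
            if inner_type == b"ihdr" and e2 - s2 >= 14:
                height, width, nc, bpc = struct.unpack_from(">IIHB", data, s2)
                # BPC stores (depth - 1) in the low 7 bits; 255 means the depth
                # varies per component and is listed in a separate bpcc box.
                depth = None if bpc == 255 else (bpc & 0x7F) + 1
                return {"width": width, "height": height,
                        "components": nc, "bit_depth": depth}
    raise ValueError("no ihdr box found")
```

Usage would be something like read_ihdr(open("EntertainPocketCalculator_0005.jp2", "rb").read()); a bit_depth of 1 there would back up the hypothesis.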
If that is the case, that would be quite disastrous, but I am inclined to agree that this currently seems like a good assumption.
No way to generate 8-bit JPEG2K files? Or does the file size become needlessly too large?
Does the original EntertainPocketCalculator_0005.jp2 image open in Preview?
No, only the first image (EntertainPocketCalculator_0000.jp2) and the odd one (EntertainPocketCalculator_0210.jp2) open.
The latter too is an RGB image:
calhoun@Johns-M1 GliderVintage % exiftool -xmp:all /Users/calhoun/Downloads/EntertainPocketCalculator_jp2/EntertainPocketCalculator_0210.jp2
XMP Toolkit : Image::ExifTool 11.88
Bits Per Sample : 8
Photometric Interpretation : RGB Palette
Planar Configuration : Chunky
Samples Per Pixel : 1
(edit to fix the file name for the first image)
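To check more than three files by hand, the exiftool output could be parsed and sorted by reported bit depth. This is a rough sketch that only assumes the "Name : Value" line layout shown in the output above; the helper names are invented for illustration.

```python
def parse_exiftool_output(text):
    """Turn exiftool's 'Name : Value' lines into a dict, skipping other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # only lines that actually contain the ' : ' separator
            fields[key.strip()] = value.strip()
    return fields


def reports_one_bit(fields):
    """True if the metadata claims one bit per sample."""
    return fields.get("Bits Per Sample") == "1"
```

Feeding it the output for each file in the JP2 ZIP would show whether every file that fails in Preview is also reported as 1 bit per sample.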
No way to generate 8-bit JPEG2K files? Or does the file size become needlessly too large?
I will have to investigate.
In general, for this specific PDF, there is a much better way to handle most of these pages, which is to not do the MRC at all, and make the page only a single JBIG2 image per page, because we can realistically omit the MRC part altogether for monotone images. I just have not gotten round to implementing that (a heuristic based on the image). I will look later today to see what I can do about 8 bit JPEG2000 images from monotone input, but it feels like a bit of a kludge regardless.
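To make the two options concrete, here is a hypothetical sketch of what such a heuristic and an 8-bit expansion could look like. It is not the actual archive.org code: the function names and thresholds are assumptions, and it works on plain lists of sample values rather than a real image library.

```python
def is_effectively_bitonal(gray_pixels, tolerance=16, min_fraction=0.99):
    """Guess whether an 8-bit grayscale page is really black-and-white.

    Counts samples within `tolerance` of pure black (0) or pure white (255).
    Both thresholds are illustrative and would need tuning on real scans.
    """
    total = 0
    extreme = 0
    for value in gray_pixels:
        total += 1
        if value <= tolerance or value >= 255 - tolerance:
            extreme += 1
    return total > 0 and extreme / total >= min_fraction


def expand_1bit_to_8bit(bits, white_is_zero=True):
    """Map 1-bit samples to 8-bit gray values (0 = black, 255 = white).

    white_is_zero mirrors the WhiteIsZero photometric interpretation,
    where a sample value of 0 means white.
    """
    if white_is_zero:
        return [255 if b == 0 else 0 for b in bits]
    return [0 if b == 0 else 255 for b in bits]
```

A page that passes the first check could be written as a single JBIG2 image with no MRC layers; the second function is the kludge route of feeding the JPEG2000 encoder 8-bit samples.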
Once we've confirmed this really is the problem, I wonder if there is a way to get Apple to fix Preview.
Yeah, Apple should have a Radar filed with a few JPEG2000 images attached that show the issue.
I only just realized from my email (which shows full names, whereas the web UI shows only handles) that @EngineersNeedArt is the famous John Calhoun, retired from Apple. Love your work John, thanks for the assistance!
Ha ha, no problem. And I'll reach out to some of the engineers I know within Apple about the JPEG2000 bug.
Oh, if you could reach out that would be great!
If I remember correctly, you also wrote on HN that Adobe had similar problems with some of the pages ("Weirdly, Adobe Reader didn't like it either...."). If you're up for it, I'd like to try to figure out if the problem with Adobe is of a similar nature...
If it is, that will probably sway archive.org to change the software to either always make 8-bit images in the PDFs, or for me to just write the code to skip MRC altogether if we have 1-bit input images.
I would suspect Adobe on Mac OS is using ImageIO for its JPEG2K rendering.
I filed a report with Apple and they are looking into the problem with JPEG2000s on their end.
This PDF:
https://archive.org/download/htewypc/EntertainPocketCalculator.pdf
Tried Preview, Acrobat Reader — all on Mac OS Monterey (an M1 MacBook Pro FWIW).