Closed timvandermeij closed 5 months ago
I wonder which features of those images are actually supported by the pdf standard. The pdf specification says at page 87:
To promote interoperability, the specifications define a subset of JPX called JPX baseline (of which JP2 is also a subset). The complete details of the baseline set of JPX features are contained in ISO/IEC 15444-2, Information Technology—JPEG 2000 Image Coding System: Extensions. Data used in PDF image XObjects should be limited to the JPX baseline set of features, except for enumerated color space 19 (CIEJab).
Unfortunately, the JPEG2000 spec isn't freely available to check the baseline feature set. Some of the images in the first two documents have problems in Sumatra PDF as well, the first one even in Acrobat 8.
By the way, the image corpus of the second file can be found at http://www.gwg.nga.mil/ntb/baseline/software/testfile/Jpeg2000/index.htm, where they also list the jpeg2000 feature sets of the individual images and a copyright notice.
I think this is related to #5727. This behavior is very similar to what I experienced. Boats.pdf and ballon.pdf both contain images with multiple quality layers. Unfortunately, #5727 does not fix this issue.
Everything read after the bug occurs is misaligned, causing the "Out of packets" error. When reading boat.pdf and ballons.pdf with PR #5727, "JPX Error: Invalid tag tree" is thrown, suggesting that the inclusion tree was not built while decoding the first layer.
+1 to fix this issue(s). Is there anyone currently working on improving j2k support in pdf.js?
Not that I'm aware of. Please note that the images I posted might not be supported by the PDF standard, so there might be no need to actually support them.
Thanks. How would I find out what part of the jpeg 2000 standard is supported by PDF? All of the images in the OpenJPEG repo are jpeg 2000 Part I images.
Hard to say because from https://github.com/mozilla/pdf.js/issues/5649#issuecomment-70172046 it appears that the baseline set is not freely available. We might need to check what other PDF readers support.
Thanks. From the standard doc:
"To promote interoperability, the specifications define a subset of JPX called JPX baseline (of which JP2 is also a subset)."
So, since all of the files in the OpenJPEG repo are jp2 files (jp2 == jpeg 2000 part I), pdf.js should support them.
Perhaps you can interest some of the OpenJPEG guys in taking a look at the issues.
Another example:
https://bugzilla.mozilla.org/show_bug.cgi?id=1695361
Warning: Unknown colorspace 12
Someone sent me a PDF with a jpx image (according to pdfimages -list) that renders repeated and pixelated in Thunderbird and Firefox 97.0a1. It also appears the same in the Gwenview and Okular viewers in Fedora 35, and doesn't appear at all in LibreOffice Writer and Inkscape (using either Poppler/Cairo or internal import).
In the Firefox browser console, I see
PDF ca5ebce36b1609665d1737484e8a8020 [1.4 macOS Version 12.0.1 (Build 21A559) Quartz PDFContext / Pages] (PDF.js: 2.12.248) viewer.js:1508:13
Warning: Unsupported header type 1667523942 (cdef). pdf.worker.js:1098:13
suggesting the user made the document in the MacOS Pages app.
The warning is the same as one of the warnings in the second file in this bug. Here's what it looks like in Thunderbird:
Kinda cool :wink:, but not what it looks like if I extract the image with pdfimages -jp2 and convert with ImageMagick. I can't supply the original PDF but if you need the extracted image let me know.
I'm not finding similar bug reports for Okular and Gwenview. Does the similar garbled appearance mean the problem is in a shared library?
Another JPX that fails to be rendered on PDF.js 2.16.75. Works well on Preview.app and all other tested PDF viewers.
wu-89008118317-11-1660897219.pdf.
[Log] Warning: Dependent image isn't ready yet (pdf.js, line 456)
[Log] Warning: Unsupported header type 1970628964 (uuid). (pdf.worker.js, line 1150)
[Log] Warning: JPX: Unsupported COD options (terminationOnEachCodingPass, verticallyStripe, predictableTermination). (pdf.worker.js, line 1150)
[Log] Warning: Unable to decode image "img_p0_1": "JpxError: JPX error: JPX error: Out of packets". (pdf.worker.js, line 1150)
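For anyone reading these warnings: the number in "Unsupported header type" appears to be the four-byte box type read as a big-endian 32-bit integer, with the ASCII form in parentheses. A minimal Python sketch of the conversion, assuming that interpretation (the helper names are mine and are not part of PDF.js):

```python
import struct


def box_type_to_fourcc(value: int) -> str:
    """Convert a 32-bit box type (as printed in the warning) to its 4 ASCII characters."""
    return struct.pack(">I", value).decode("ascii")


def fourcc_to_box_type(name: str) -> int:
    """Inverse direction: 4 ASCII characters to the big-endian 32-bit integer."""
    if len(name) != 4:
        raise ValueError("a box type is exactly four characters")
    return struct.unpack(">I", name.encode("ascii"))[0]


if __name__ == "__main__":
    # The two values reported in this thread.
    for reported in (1667523942, 1970628964):
        print(reported, hex(reported), box_type_to_fourcc(reported))
    # 1667523942 0x63646566 cdef
    # 1970628964 0x75756964 uuid
```

The hex form makes it easy to match a warning against the box constants in a decoder's source.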
Unfortunately none of the PDF links in https://github.com/mozilla/pdf.js/issues/5649#issue-54388774 work now, and given https://github.com/mozilla/pdf.js/issues/5649#issuecomment-70172046 it's not clear if those JPEG2000 images are actually supported when used in PDF files.
@timvandermeij Given the points above, and the age of this issue, should we perhaps close this one now? Please note that we've a number of, maybe slightly more actionable, JPEG2000 issues in the https://github.com/mozilla/pdf.js/labels/image-jpx category.
Unfortunately none of the PDF links in #5649 (comment) work now
Here's an easy way to reproduce the issue:
opj_compress -i test.png -o test.jp2
Because the input file has alpha, OpenJPEG will generate a cdef header. In this case, the transparent pixels are just shown as black, but I've seen cases where the image is missing entirely.
Files involved: samples.zip
given #5649 (comment) it's not clear if those JPEG2000 images are actually supported when used in PDF files.
Regarding whether the cdef header is actually included by the PDF standard, I've done some research. According to the format description of JPEG 2000 Part 1 summarized by the Library of Congress: (emphasis mine)
Full name: ISO/IEC 15444-1:2016. Information technology -- JPEG 2000 image coding system -- Part 1: Core coding system, Annex I: JP2 file format syntax (formal name)
Color maintenance: Rich support, further extended in JPX. In JP2_FF, the color space of the decompressed image data is indicated in the Color Specification box inside the JP2 Header box…… For palettized images, ……the Component Mapping box defines which codestream components map to which palette components or bypass the palette…… Finally, the Channel Definition box maps codestream components (if unpalettized) or channels to color components, allowing them to be permuted if desired and enabling support for alpha channels (opacity) as well as color channels.
History: ISO/IEC 15444-2:2004. Information technology -- JPEG 2000 image coding system: Extensions. Defines a set of lossless (bit-preserving) and lossy compression methods for coding continuous-tone, bi-level, grey-scale, colour digital still images, or multi-component images;
The "Channel Definition box" mentioned in the document is the cdef header, according to the OpenJPEG source code:
#define JP2_JP 0x6a502020 /**< JPEG 2000 signature box */
#define JP2_FTYP 0x66747970 /**< File type box */
#define JP2_JP2H 0x6a703268 /**< JP2 header box (super-box) */
#define JP2_IHDR 0x69686472 /**< Image header box */
#define JP2_COLR 0x636f6c72 /**< Colour specification box */
/* ... */
#define JP2_CMAP 0x636d6170 /**< Component Mapping box */
#define JP2_CDEF 0x63646566 /**< Channel Definition box */
/* ... */
Also, the document mentioned that ISO/IEC 15444-2 supports multi-component images. And in the PDF standard excerpted above:
To promote interoperability, the specifications define a subset of JPX called JPX baseline (of which JP2 is also a subset). The complete details of the baseline set of JPX features are contained in ISO/IEC 15444-2, Information Technology—JPEG 2000 Image Coding System: Extensions.
Considering all of this secondary evidence together, I do feel it makes a strong case for the standardized inclusion of at least some of the unimplemented features reported in this issue.
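To check whether a given .jp2 file actually carries a cdef box, the box structure can be walked directly. Below is a rough Python sketch, based on my reading of the JP2 box layout (4-byte big-endian length, 4-byte type, optional 8-byte extended length when the length field is 1, and a length of 0 meaning "to the end of the enclosing range"). The cdef payload layout (a 16-bit count followed by Cn/Typ/Asoc triples of 16 bits each) is likewise my understanding of Annex I and should be checked against the spec. All function names are hypothetical, and the sample bytes are synthetic, not a real image.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        lbox, tbox = struct.unpack_from(">I4s", data, pos)
        header = 8
        if lbox == 1:  # extended 64-bit length follows the type
            if pos + 16 > end:
                break
            (lbox,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif lbox == 0:  # box extends to the end of the enclosing range
            lbox = end - pos
        if lbox < header or pos + lbox > end:
            break  # malformed length: stop instead of guessing
        yield tbox, pos + header, pos + lbox
        pos += lbox


def parse_cdef(payload):
    """Return a list of (channel index, type, association) triples."""
    (count,) = struct.unpack_from(">H", payload, 0)
    return [struct.unpack_from(">HHH", payload, 2 + 6 * i) for i in range(count)]


def find_cdef(data):
    """Look for a cdef box inside the jp2h superbox; None if there is none."""
    for tbox, s, e in iter_boxes(data):
        if tbox == b"jp2h":
            for inner, s2, e2 in iter_boxes(data, s, e):
                if inner == b"cdef":
                    return parse_cdef(data[s2:e2])
    return None


def make_box(tbox, payload):
    """Build a box with a plain 32-bit length (used only for the demo data)."""
    return struct.pack(">I4s", 8 + len(payload), tbox) + payload


# Synthetic RGBA-style header: three colour channels plus one opacity channel.
channels = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 1, 0)]
cdef_payload = struct.pack(">H", len(channels)) + b"".join(
    struct.pack(">HHH", *c) for c in channels
)
sample = (
    make_box(b"jP  ", b"\r\n\x87\n")
    + make_box(b"ftyp", b"jp2 " + b"\x00" * 4 + b"jp2 ")
    + make_box(b"jp2h", make_box(b"ihdr", b"\x00" * 14) + make_box(b"cdef", cdef_payload))
)

if __name__ == "__main__":
    for cn, typ, asoc in find_cdef(sample) or []:
        kind = {0: "colour", 1: "opacity", 2: "premultiplied opacity"}.get(typ, "other")
        print(f"channel {cn}: {kind}, associated with {asoc}")
```

For a real file, something like find_cdef(open("test.jp2", "rb").read()) should return the channel list, or None when no cdef box is present.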
Closing since most files aren't available anymore and the ones that are render fine now, most likely thanks to #17946. Moreover, given that we now delegate JPX parsing to OpenJPEG, we can also be sure that any spec-compliant images will be handled properly.
While testing various .jp2 files, I have come across numerous images that are not rendered correctly by PDF.js, mainly because of unimplemented header types. I'm listing the broken PDFs along with the source of their images below.
First file
PDF https://pdf.yt/d/fSMs0rOfBIjacwHv
Source Images taken from https://code.google.com/p/openjpeg/source/browse/data/input/conformance/?r=2833, files subsampling_1.jp2 and subsampling_2.jp2
License See https://code.google.com/p/openjpeg/source/browse/data/input/conformance/COPYRIGHT?r=2833. I don't think it's permissive enough to include them in the repository.
Console output
Second file
PDF https://pdf.yt/d/Zd4e1zB9oGGPBO85
Source Images taken from https://code.google.com/p/openjpeg/source/browse/data/input/conformance/?r=2833, files file1.jp2 to file9.jp2
License See https://code.google.com/p/openjpeg/source/browse/data/input/conformance/COPYRIGHT?r=2833. I don't think it's permissive enough to include them in the repository.
Console output
Third file
PDF https://pdf.yt/d/heg0slSRtdEA-vqi
Source Image taken from http://opf-labs.org/format-corpus/jp2k-formats/, file balloon.jp2
License From http://opf-labs.org/format-corpus/jp2k-formats/readme.md: created from https://commons.wikimedia.org/wiki/File:1783_balloonj.jpg, which is in public domain, so we can include this file in the repository.
Console output