Open acdha opened 5 years ago
@acdha , the National Gazette shows three pages, The 'Text | PDF | JP2 (2.5 MB)' JP2 file shows 3 pages with OPENJPEG. Why should the National Gazette be wrong and KDU be right? winfried
@szukw000 That issue had 4 pages, not three, and note that it's displaying the same page 3 times rather than 3 separate pages (here's what an unaffected batch looks like: https://chroniclingamerica.loc.gov/lccn/sn91066782/1919-08-05/ed-1/seq-1/).
This was reported to me as a bug but it's specific to certain batches which apparently slipped through quality-review because the 4 JP2 files in that issue all decode to the expected single page each using Photoshop, Kakadu, and Aware.
@acdha , the file 0004.jp2 has an ICC profile. Excluding to use this ICC profile:
OPJ_DECOMPRESS:1609: icc_profile_len(560)
I got a one image result. Must look next day what is going wrong. winfried
If memory serves, we ran into this with an older Kakadu version as well. I’ll check the history tomorrow.
I thought this seemed familiar: this was also a problem with Kakadu versions prior to 7.10 as well, which involved a different file from a similar batch (https://cdn.loc.gov/service/sgp/sgpbatches/batch_dlc_germanrex_ver01/data/sn83025887/print/1791103101/0001.jp2). At the time the developers provided a little background information:
The file you sent us contains a malformed channel-definition (cdef) box that describes only one channel for a 3 colour image, without specifying that the single channel description should be broadcast to all other channels. As a result, insufficient channels were defined. The effect of this depended upon whether the channels were needed for particular operations.
I have remedied the problem by determining that a single channel description should be interpreted as one that broadcasts to other channels if there are any, unless explicit broadcast information is provided in the CDEF box.
@acdha , file 0004.jp2 contains the following ICC profile:
-------------- ICC Profile[560] ---------------
cmmtype(0x41444245) version(2.16.0.0) class(mntr)
color-space(RGB ) profile-connection-space(XYZ )
date(11.8.2000) time(19:51:59)
signature(acsp) platform-signature(APPL)
profile-flags(0) [0]0 [1]0 [2:15]0
device-attr(0) white(0) negative(0) matte(0) transparent(0)
rendering-intent(0)
CIE: x(0.63190) y(1.0) z(0.54061)
[0]signature(cprt) size[50]
Copyright 2000 Adobe Systems Incorporated
[1]signature(desc) size[107]
DESC(Adobe RGB (1998))
[2]signature(wtpt) size[20]
XYZ(0.81, 0.81, 0.81)
[3]signature(bkpt) size[20]
XYZ(0.0, 0.0, 0.0)
[4]signature(rTRC) size[14]
curv size[1]
curv[0]gamma(2.51)
[5]signature(gTRC) size[14]
curv size[1]
curv[0]gamma(2.51)
[6]signature(bTRC) size[14]
curv size[1]
curv[0]gamma(2.51)
[7]signature(rXYZ) size[20]
XYZ(0.24, 0.24, 0.24)
[8]signature(gXYZ) size[20]
XYZ(0.141, 0.141, 0.141)
[9]signature(bXYZ) size[20]
XYZ(0.49, 0.49, 0.49)
The 'color-space(RGB )' needs image->numcomps > 2 for RGB or RGBA.
ICC handling is done with LCMS2/LCMS1 in 'common/color.c':
color.c:491: max_w(4008) max_h(5265) prec(8) numcomps(1)
color.c:496:cmsSigRgbData nr_comp(1)
color.c:572:color_apply_icc_profile
channels(1) prec(8) w(4008) h(5265)
profile: in(0x1a01260) out(0x1a3c3f0)
render_intent (0)
color_space: in(0x58595a20)(XYZ ) out:(0x52474220)(RGB )
type: in(262169) out:(262169)
'cmsSigRgbData nr_comp(1)' is impossible. I applied the following patch:
--- color.c.orig 2019-08-08 00:45:01.903651387 +0200
+++ color.c 2019-08-08 00:39:24.442674016 +0200
@@ -488,6 +488,10 @@
if (out_space == cmsSigRgbData) { /* enumCS 16 */
unsigned int i, nr_comp = image->numcomps;
+ if (nr_comp < 3) { /* GRAY or GRAYA, not RGB or RGBA */
+ cmsCloseProfile(in_prof);
+ return;
+ }
if (nr_comp > 4) {
nr_comp = 4;
}
winfried
@acdha , any objections against this small patch? If not: I would like to send this patch, pointing to #1207 and the files 0004.jp2 and 0001.jp2 . winfried
I have not tested it yet — I was planning to work on that shortly.
@szukw000 That appears to have fixed the problem — I still need to purge some caches but if you look at e.g. https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-26/ed-1/seq-3/ you can see images from the affected batches are now decoding correctly — e.g. https://chroniclingamerica.loc.gov/iiif/2/dlc_germanrex_ver01%2Fdata%2Fsn83025887%2Fprint%2F1791122201%2F0004.jp2/full/!1024,1024/0/default.jpg
My updated Cantaloupe server build is here: https://github.com/acdha/docker-cantaloupe/commit/c4d7a42935b5435880957fd28f518f0de9a3d354
@rouault , I have sent a PR (#1209) but it has been refused by I-do-not-know. But it is important. winfried
What's the path look like for getting this merged? It's a major parity point for OpenJPEG relative to other codecs and it'd be nice to eventually be able to stop maintaining a fork.
but it has been refused by I-do-not-know.
@szukw000 GitHub indicates that yourself closed the pull request :-) Just reopen it
@ValZapod no argument that the file is malformed. The problem is that the people using these files will see that OpenJPEG doesn’t produce the same result as most of the other tools and say that they can’t use OpenJPEG. So it’d be totally reasonable to display a warning or something but not handling the file limits adoption, and since some old codecs produced these incorrect files under normal usage they’re not uncommon.
While troubleshooting an odd image display on https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-22/ed-1/seq-4/ I noticed some inconsistencies between codecs handling the file, available directly as https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-22/ed-1/seq-4.jp2
If I take the same image and run it through kdu_transcode, it produces the expected image: