uclouvain / openjpeg

Official repository of the OpenJPEG project
Other
976 stars 456 forks source link

Inconsistent decoding causing duplicate image display #1207

Open acdha opened 5 years ago

acdha commented 5 years ago

While troubleshooting an odd image display on https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-22/ed-1/seq-4/ I noticed some inconsistencies between codecs handling the file, available directly as https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-22/ed-1/seq-4.jp2

If I take the same image and run it through kdu_transcode, it produces the expected image:

$ kdu_transcode -i 0004.jp2 -o 0004-transcode.j2k
$ opj_decompress -i 0004-transcode.j2k -o 0004-transcode.png
szukw000 commented 5 years ago

@acdha , the National Gazette shows three pages, The 'Text | PDF | JP2 (2.5 MB)' JP2 file shows 3 pages with OPENJPEG. Why should the National Gazette be wrong and KDU be right? winfried

acdha commented 5 years ago

@szukw000 That issue had 4 pages, not three, and note that it's displaying the same page 3 times rather than 3 separate pages (here's what an unaffected batch looks like: https://chroniclingamerica.loc.gov/lccn/sn91066782/1919-08-05/ed-1/seq-1/).

This was reported to me as a bug but it's specific to certain batches which apparently slipped through quality-review because the 4 JP2 files in that issue all decode to the expected single page each using Photoshop, Kakadu, and Aware.

szukw000 commented 5 years ago

@acdha , the file 0004.jp2 has an ICC profile. Excluding to use this ICC profile:

OPJ_DECOMPRESS:1609: icc_profile_len(560)

I got a one image result. Must look next day what is going wrong. winfried

acdha commented 5 years ago

If memory serves, we ran into this with an older Kakadu version as well. I’ll check the history tomorrow.

acdha commented 5 years ago

I thought this seemed familiar: this was also a problem with Kakadu versions prior to 7.10 as well, which involved a different file from a similar batch (https://cdn.loc.gov/service/sgp/sgpbatches/batch_dlc_germanrex_ver01/data/sn83025887/print/1791103101/0001.jp2). At the time the developers provided a little background information:

The file you sent us contains a malformed channel-definition (cdef) box that describes only one channel for a 3 colour image, without specifying that the single channel description should be broadcast to all other channels. As a result, insufficient channels were defined. The effect of this depended upon whether the channels were needed for particular operations.

I have remedied the problem by determining that a single channel description should be interpreted as one that broadcasts to other channels if there are any, unless explicit broadcast information is provided in the CDEF box.

szukw000 commented 5 years ago

@acdha , file 0004.jp2 contains the following ICC profile:

-------------- ICC Profile[560] ---------------
cmmtype(0x41444245) version(2.16.0.0) class(mntr)
color-space(RGB ) profile-connection-space(XYZ )
date(11.8.2000) time(19:51:59)
signature(acsp) platform-signature(APPL)
profile-flags(0) [0]0 [1]0 [2:15]0
device-attr(0) white(0) negative(0) matte(0) transparent(0)
rendering-intent(0)
CIE: x(0.63190) y(1.0) z(0.54061)
[0]signature(cprt) size[50]
Copyright 2000 Adobe Systems Incorporated
[1]signature(desc) size[107]
    DESC(Adobe RGB (1998))
[2]signature(wtpt) size[20]
  XYZ(0.81, 0.81, 0.81)
[3]signature(bkpt) size[20]
  XYZ(0.0, 0.0, 0.0)
[4]signature(rTRC) size[14]
  curv size[1]
    curv[0]gamma(2.51)

[5]signature(gTRC) size[14]
  curv size[1]
    curv[0]gamma(2.51)

[6]signature(bTRC) size[14]
  curv size[1]
    curv[0]gamma(2.51)

[7]signature(rXYZ) size[20]
  XYZ(0.24, 0.24, 0.24)
[8]signature(gXYZ) size[20]
  XYZ(0.141, 0.141, 0.141)
[9]signature(bXYZ) size[20]
  XYZ(0.49, 0.49, 0.49)

The 'color-space(RGB )' needs image->numcomps > 2 for RGB or RGBA.

ICC handling is done with LCMS2/LCMS1 in 'common/color.c':

color.c:491: max_w(4008) max_h(5265) prec(8) numcomps(1)
color.c:496:cmsSigRgbData nr_comp(1)
color.c:572:color_apply_icc_profile
    channels(1) prec(8) w(4008) h(5265)
    profile: in(0x1a01260) out(0x1a3c3f0)
    render_intent (0)
    color_space: in(0x58595a20)(XYZ )   out:(0x52474220)(RGB )
           type: in(262169)              out:(262169)

'cmsSigRgbData nr_comp(1)' is impossible. I applied the following patch:


--- color.c.orig    2019-08-08 00:45:01.903651387 +0200
+++ color.c 2019-08-08 00:39:24.442674016 +0200
@@ -488,6 +488,10 @@
     if (out_space == cmsSigRgbData) { /* enumCS 16 */
         unsigned int i, nr_comp = image->numcomps;

+        if (nr_comp < 3) { /* GRAY or GRAYA, not RGB or RGBA */
+            cmsCloseProfile(in_prof);
+            return;
+        }
         if (nr_comp > 4) {
             nr_comp = 4;
         }

winfried

szukw000 commented 5 years ago

@acdha , any objections against this small patch? If not: I would like to send this patch, pointing to #1207 and the files 0004.jp2 and 0001.jp2 . winfried

acdha commented 5 years ago

I have not tested it yet — I was planning to work on that shortly.

acdha commented 5 years ago

@szukw000 That appears to have fixed the problem — I still need to purge some caches but if you look at e.g. https://chroniclingamerica.loc.gov/lccn/sn83025887/1791-12-26/ed-1/seq-3/ you can see images from the affected batches are now decoding correctly — e.g. https://chroniclingamerica.loc.gov/iiif/2/dlc_germanrex_ver01%2Fdata%2Fsn83025887%2Fprint%2F1791122201%2F0004.jp2/full/!1024,1024/0/default.jpg

My updated Cantaloupe server build is here: https://github.com/acdha/docker-cantaloupe/commit/c4d7a42935b5435880957fd28f518f0de9a3d354

szukw000 commented 5 years ago

@rouault , I have sent a PR (#1209) but it has been refused by I-do-not-know. But it is important. winfried

acdha commented 4 years ago

What's the path look like for getting this merged? It's a major parity point for OpenJPEG relative to other codecs and it'd be nice to eventually be able to stop maintaining a fork.

rouault commented 4 years ago

but it has been refused by I-do-not-know.

@szukw000 GitHub indicates that yourself closed the pull request :-) Just reopen it

acdha commented 3 years ago

@ValZapod no argument that the file is malformed. The problem is that the people using these files will see that OpenJPEG doesn’t produce the same result as most of the other tools and say that they can’t use OpenJPEG. So it’d be totally reasonable to display a warning or something but not handling the file limits adoption, and since some old codecs produced these incorrect files under normal usage they’re not uncommon.