mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
48.34k stars 9.97k forks source link

'JPX: Out of packet' when reading J2k file with more than one precinct per resolution level #5475

Closed MaMazav closed 9 years ago

MaMazav commented 9 years ago

When I'm trying to read a jpeg2000 file with more than one precinct per level, I have an 'Out of packet' exception and the picture get messed.

Link to j2k file for reproducing (EDIT: This is the wrong file): https://www.dropbox.com/s/fi1bwc73a3ynawv/cabs_1tile_bw_puniform_2levels_lrcp.jp2?dl=0

EDIT: Here is the correct file: https://www.dropbox.com/s/vekbn3m4nkhmv3o/cabs_1tile_bw_puniform_lrcp.jp2?dl=0

(I'm using JPX.js as a standalone J2k reader, so the reproduce example is not a pdf but a jp2 file. Sorry for that):

I'll be happy to contribute if I succeed to solve the bug, but I'm not sure I'll succeed in solving it. However I have some technical information I collected from debugging, which may be useful: I saw that the offsets of the beginning of the packets is wrong. The offset get wrong after parsing the first precinct of the second level. I did it by creating a file which is identical to the one that demonstrating the bug, but with PLT information. Using the PLT information I know that the packet offsets should be as follows (offsets are from the beginning of the bitstream, right after the SOD marker): level 0: 0, 14953, 29404, 43500 level 1: 58852, 85548, 109067, 132132 end of bitstream: 160939 However when debugging JPX I saw the following offsets (by watching the value of position parameter at the first line of the while loop in parseTilePackets function. Can be reproduced by a breakpoint at line 830 in 5.11.2014 version): level 0: 0, 14953, 29404, 43500 level 1: 58852, 78779, ... (the continuation is not important as the parsing is wrong from now because of the wrong offset).

fkaelberer commented 9 years ago

~~Here's a pdf file containing the above image (created with Acrobat 8): https://www.dropbox.com/s/wnkhdtqgpduk5fl/cabs_1tile_bw_puniform_lrcp.pdf?dl=0~~

Using the pdf file, I didn't have any problems viewing the image (using the PDF.js v1.0.940 and PDF.js v1.0.277 Firefox extension). Do you have more instructions to reproduce the problem? EDIT: updated link. EDIT: replaced link by uploaded file: cabs_1tile_bw_puniform_lrcp.pdf

MaMazav commented 9 years ago

I'm using this code for debugging and to show the picture in the browser: https://www.dropbox.com/s/4y3d9qvc4fm14ka/pdfjs.html?dl=0 https://www.dropbox.com/s/b50z8xm80gnh1ka/pdfjs.js?dl=0

This is an ugly code that was ported from the example code of a library called j2k.js. I put these files in the following directory structure: ./ ./jpeg2000tries ./jpeg2000tries/pdfjs.html ./jpeg2000tries/pdfjs.js ./jpeg2000tries/cabs_1tile_bw_puniform_lrcp.jp2 ./pdf.js-master ./pdf.js-master/src ...

Double clicking pdfjs.html within that directory structure enables reproducing and debugging JPX.js as a standalone.

I'll try to investigate the content in the PDF you have uploaded. I'm not really familiar with embedding j2k within pdf (as I said, I'm using JPX.js as a standalone for reading jpeg2000). Maybe the codestream is changed by Adobe when embedded within a PDF?

MaMazav commented 9 years ago

EDIT: Ignore this comment, it's wrong.

I checked the differences between the two files using hex editor. My suggestion was right - the content was changed when embedded within the document. More specific, Adobe added a decomposition level and changed all levels to contain only one precinct - a case in which this bug cannot be reproduced.

MaMazav commented 9 years ago

Sorry, I uploaded the wrong image. I'm really sorry about wasting your time... Actually Adobe leaved the codestream as is.

Here is a link for the correct j2k file in which the bug is reproduced: https://www.dropbox.com/s/vekbn3m4nkhmv3o/cabs_1tile_bw_puniform_lrcp.jp2?dl=0

fkaelberer commented 9 years ago

Ok, now I see the issue. I updated the link to the pdf file in my https://github.com/mozilla/pdf.js/issues/5475#issuecomment-61981216 above.

plroit commented 9 years ago

Hi, I noticed that the calculation of the precinct number for each code block is possibly inverted. In file: jpx.js function buildCodeblocks(context, subband, dimensions) line:

precinctNumber = pj + pi * precinctParameters.numprecinctswide;

Perhaps it should be:

precinctNumber = pi + (pj * precinctParameters.numprecinctswide);

since PJ is the precinct counter in the vertical axis, while PI is in the horizontal axis.

This simple substituion has solved the problem for a toy image with 0 DWT levels (single LL subband), 4 precincts, B/W, 1 quality layer, 4 jpeg2000 packets overall. Will try several more images.

plroit commented 9 years ago

I think I found the problem and have a ready fix.

There are two issues here, first is the PI/PJ inverted counters that I have stated in the comment above.

The second, is the mapping of codeblocks to their precincts for non-degenerate precinct partition. A precinct partition divides a resolution level into rectangles. The precinct consists of codeblock groups of every subband in the resolution level. However the subband size is not the same as the resolution size, for every non-zero resolution level. Therefore we cannot map subbands' codeblocks to their precincts solely upon the sizes of the resolution level and precinct partition.

Jasper library defines these sizes as codeblock group width and height. From the JPEG2000.pdf at Jasper documentation, section K: tier-2 coding:

"Each of the resulting precinct regions is then mapped into its child subbands (if any) at the next lower resolution level. This is accomplished by using the coordinate transformation (u,v) = (ceil(x/2), ceil(y/2)) where (x,y) and (u,v) are the coordinates of a point in the LL band and child subband, respectively.... where PPx’ =(PPx for r = 0 PPx−1 for r > 0, PPy’ =(PPy for r = 0 PPy−1 for r > 0

I suggest the following changes in jpx.js: function: buildPrecincts add codeblock group width and height calculation

    var cblkGroupWidth = resolution.resLevel == 0 ? (1 << dimensions.PPx) : (1 << (dimensions.PPx - 1));
    var cblkGroupHeight = resolution.resLevel == 0 ? (1 << dimensions.PPy) : (1 << (dimensions.PPy - 1));

function: buildPackets add resLevel index to resolution object

resolution.resLevel = r;

function: buildCodeblocks replace precinct number calculation:

        // calculate precinct number
        var pi = Math.floor((codeblock.tbx0 -
                 precinctParameters.precinctXOffset) /
                 precinctParameters.cblkGroupWidth);
        var pj = Math.floor((codeblock.tby0 -
                 precinctParameters.precinctYOffset) /
                 precinctParameters.cblkGroupHeight);

That has worked for me for a real image, RGB, 2 resolution levels, 2 quality layers, and multiple precincts per resolution level.

Paul

CodingFabian commented 9 years ago

do you feel comfortable with submitting a pull request? you did some really good investigation here and deserve the credit for the change. otherwise somebody else could pick it up.

Snuffleupagus commented 9 years ago

@MaMazav Are you the author of the jp2 file, and if so can we include it in our test suite?

MaMazav commented 9 years ago

I generated it according to @plroit's instructions. I'll try to check the copyright issue and maybe will generate another file with dummy content to avoid the copyright issue.

Anyway the bug still occurs on the following file: https://www.dropbox.com/s/7xgxjmknu4by3gb/cabs_4tiles_lrcp_noplm_noedgetiles_puniform.jp2?dl=0

I think that @plroit's solution is good for a situations in which the codeblock is larger than the precinct. If the codeblocks are smaller I still get the exception. The indication for this is that JPX is executed well for a similar file with only 4 decomposition levels (instead of the 5 in the file above). When debugging I found out that with 5 levels, level 5 codeblocks are smaller than its precincts.

I tried to change @plroit's calculation of cblkGroupWidth and cblkGroupHeight to 1 << dimensions.xcb and 1 << dimensions.ycb but it doesn't work. I'm still trying to work on it.

plroit commented 9 years ago

Hi,

Thank you for your input, I have tested the new image from @MaMazav and found several more problems with the precinct number calculation. I will commit a fix shortly.

The tested images are taken from unsplash, which states at the license that:

All photos published on Unsplash are licensed under Creative Commons Zero which means you can can copy, modify, distribute and use the photos, even for commercial purposes, all without asking permission.

The presentation and collection of these images is copyrighted by Crew Labs Inc. (2014).