mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0
47.17k stars 9.82k forks

transposed JBIG2 text segments with non-topleft reference corner don't render correctly #17883

Open nico opened 3 months ago

nico commented 3 months ago

Attach (recommended) or Link to PDF file here:

symbol-texttranspose.pdf

symbol-topright-transposed.pdf

symbol-bottomleft-transposed.pdf

symbol-bottomright-transposed.pdf

Configuration:

Steps to reproduce the problem:

  1. Open each of the four PDFs above

What is the expected behavior? (add screenshot)

They should all look like the first one:

image

What went wrong? (add screenshot)

The ones that have the reference corner not set to topleft are in various states of disarray:

image image image

ITU-T_T_88__08_2018.pdf, section 6.4.5 "Decoding the text region", has two steps that update CURS (cur_s): step vi) "Update CURS as follows:", which runs before drawing the bitmap, and step xi) "Update CURS as follows:", which runs after drawing it. It looks like 25f6a0c13965c5ad9cebe701e4752bde5e8fa811 mixes up these two steps with the "is transposed" check. Depending on the reference corner, the update needs to happen before or after drawing, for both transposed and untransposed images.
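To make the two steps concrete, here is a minimal sketch of how I read steps vi) and xi) of 6.4.5. It is not pdf.js code: the function names, the parameter names and the string values for the reference corner are all hypothetical, and the conditions are my reading of the spec, so they are worth checking against the text before relying on them.

```javascript
// Sketch only: hypothetical names, not taken from pdf.js.
// refCorner is assumed to be one of:
// "TOPLEFT", "TOPRIGHT", "BOTTOMLEFT", "BOTTOMRIGHT".

// Step vi): update CURS *before* the symbol bitmap is drawn.
function curSBeforeDraw(curS, width, height, transposed, refCorner) {
  if (!transposed && (refCorner === "TOPRIGHT" || refCorner === "BOTTOMRIGHT")) {
    return curS + width - 1;
  }
  if (transposed && (refCorner === "BOTTOMLEFT" || refCorner === "BOTTOMRIGHT")) {
    return curS + height - 1;
  }
  return curS; // CURS is not changed in this step
}

// Step xi): update CURS *after* the symbol bitmap is drawn.
function curSAfterDraw(curS, width, height, transposed, refCorner) {
  if (!transposed && (refCorner === "TOPLEFT" || refCorner === "BOTTOMLEFT")) {
    return curS + width - 1;
  }
  if (transposed && (refCorner === "TOPLEFT" || refCorner === "TOPRIGHT")) {
    return curS + height - 1;
  }
  return curS; // CURS is not changed in this step
}
```

Under this reading, each combination of transposed flag and reference corner advances CURS in exactly one of the two steps. The transposed flag only decides whether the advance uses the width or the height, and which pair of corners counts as "before". For example, an untransposed 10x5 symbol with reference corner TOPRIGHT advances from 0 to 9 before drawing and is left unchanged afterwards.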

Like in #17871: I made these files myself while writing a JBIG2 decoder. I'm reasonably confident that the files and Chrome and jbig2dec and my decoder are correct, but it's possible the files are wrong instead.

nico commented 3 months ago

Oh, and this isn't purely theoretical: this slightly-more-real-world PDF looks wonky because of this: transpose2.pdf

It's not fully real-world since it's 042_19.jb2 from https://git.ghostscript.com/?p=tests.git;a=tree;f=jbig2;h=8a7abaf842435e204c1ff1dbeed10826bf24afe6;hb=HEAD wrapped in a PDF, so it's still a bit synthetic. But it's a file made by someone else at least, which maybe gives the bug report more credence.