mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Font rendering issue in viewer demo #10175

Open DanDeMicco opened 5 years ago

DanDeMicco commented 5 years ago

Attach (recommended) or Link to PDF file here: Test_page (3).pdf

Configuration:

Steps to reproduce the problem:

  1. download the file from the link above
  2. open with https://mozilla.github.io/pdf.js/web/viewer.html
  3. Notice that the font rendering is messed up

What is the expected behavior? (add screenshot) PDF renders text correctly

What went wrong? (add screenshot) Fonts render incorrectly

[Screenshot, 2018-10-23 at 11:33:29 AM]

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

DanDeMicco commented 5 years ago

I downloaded a few releases (including the latest and a few older ones) from https://github.com/mozilla/pdf.js/releases, ran the viewer locally, and opened the same PDF. For whatever reason, the fonts rendered fine.

DanDeMicco commented 5 years ago

Following the font debugging steps, I extracted one of the fonts to an .otf file and tried to install it on my machine, but got some warnings.

XeQCUKSY.otf.zip

[Screenshot, 2018-10-23 at 11:53:05 AM]

Any help or recommendations on how to debug this further? Is the font in the PDF simply corrupt?

janpe2 commented 5 years ago

The link to the PDF file doesn't work or it requires a login.

DanDeMicco commented 5 years ago

Hey @janpe2, sorry, it looks like the permissions got changed. Here is a new link to the PDF: https://farmersinsurance.ent.box.com/s/46dz61y6tdjpv3z1l6zysbt59o29hdom

timvandermeij commented 5 years ago

The file is gone again. Could you attach it to the issue instead of linking it to prevent it from breaking again? You can drag-and-drop it in the comments field on GitHub.

janpe2 commented 5 years ago

I have the file issue10175.pdf

Here is my analysis. The font contains some errors in the CFF data. For example:

tx -dump -6 -g 2 XeQCUKSY.otf
## glyph[tag] {name,encoding,path}
glyph[2] {.notdef,U+E002,
  98 718 vmoveto
  98 hlineto
  -271 vlineto
  -98 hlineto
  98 180 271 rmoveto
  98 hlineto
  -271 vlineto
  -98 hlineto
  endchar}

The rmoveto operator should have two arguments but it has three. This is why OpenType Sanitiser rejects the font. There are many glyphs that have the same problem with the rmoveto operator.
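To make this check repeatable across many glyphs, here is a minimal Python sketch that scans a `tx -dump -6` listing in the layout shown above and flags moveto/endchar operators with the wrong operand count. It assumes subroutine calls are already expanded in the dump and follows the Type 2 charstring rule that only the first stack-clearing operator of a glyph may carry one extra leading operand (the advance width). `check_arg_counts` and `FIXED_ARGS` are hypothetical names; this is not a full CFF validator and is far less thorough than OpenType Sanitiser.

```python
import re

# Fixed operand counts for a few Type 2 charstring operators. Only the
# operators relevant to the dump above are covered (the deprecated
# 4-operand "seac" form of endchar is not handled).
FIXED_ARGS = {"rmoveto": 2, "hmoveto": 1, "vmoveto": 1, "endchar": 0}

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def check_arg_counts(dump: str) -> list[str]:
    """Return one message per operator whose operand count looks wrong."""
    problems: list[str] = []
    glyph = None
    first_op = True
    stack: list[str] = []
    for line in dump.splitlines():
        line = line.strip()
        header = re.match(r"glyph\[(\d+)\]", line)
        if header:
            # A new glyph starts: reset the operand stack and width state.
            glyph, first_op, stack = header.group(1), True, []
            continue
        if glyph is None or not line or line.startswith("#"):
            continue
        for tok in line.replace("}", " ").split():
            if NUMBER.fullmatch(tok):
                stack.append(tok)
                continue
            expected = FIXED_ARGS.get(tok)
            if expected is not None:
                # The first operator may have one extra operand: the width.
                allowed = {expected, expected + 1} if first_op else {expected}
                if len(stack) not in allowed:
                    problems.append(
                        f"glyph[{glyph}]: {tok} has {len(stack)} args, "
                        f"expected {expected}"
                    )
            first_op = False
            stack = []  # these operators clear the argument stack
    return problems


SAMPLE = """
## glyph[tag] {name,encoding,path}
glyph[2] {.notdef,U+E002,
  98 718 vmoveto
  98 hlineto
  -271 vlineto
  -98 hlineto
  98 180 271 rmoveto
  98 hlineto
  -271 vlineto
  -98 hlineto
  endchar}
"""

if __name__ == "__main__":
    print(check_arg_counts(SAMPLE))
    # prints ['glyph[2]: rmoveto has 3 args, expected 2']
```

Note that `98 718 vmoveto` is accepted only because it is the first operator of the glyph, where the leading `98` is read as the width.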

THausherr commented 5 years ago

@janpe2 thanks for the analysis. Adobe, Chrome and Edge display the file properly. PDFBox fails, but succeeds when I take the last two arguments of rmoveto instead of the first two.
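The "take the last two arguments" workaround can be illustrated with a minimal Python sketch that traces only the operators appearing in glyph[2] above. The `trace` function and its `lenient` flag are hypothetical; this is not the code of PDFBox, pdf.js, or any browser, just one way to express the heuristic: when a non-first `rmoveto` has surplus operands, keep the last two.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Operand counts for the moveto operators handled in this sketch.
MOVETO_ARGS = {"rmoveto": 2, "hmoveto": 1, "vmoveto": 1}


def trace(program: List[str], lenient: bool = True) -> List[List[Point]]:
    """Trace a tiny subset of Type 2 charstring operators into contours."""
    x = y = 0.0
    contours: List[List[Point]] = []
    stack: List[float] = []
    first_op = True
    for tok in program:
        try:
            stack.append(float(tok))
            continue
        except ValueError:
            op = tok  # not a number, so it is an operator
        if op in MOVETO_ARGS:
            n = MOVETO_ARGS[op]
            if first_op and len(stack) == n + 1:
                stack = stack[1:]  # leading operand is the advance width
            elif len(stack) != n:
                if not lenient or len(stack) < n:
                    raise ValueError(f"{op}: expected {n} operands, got {len(stack)}")
                stack = stack[-n:]  # heuristic: keep only the last n operands
            if op == "rmoveto":
                x, y = x + stack[0], y + stack[1]
            elif op == "hmoveto":
                x += stack[0]
            else:
                y += stack[0]
            contours.append([(x, y)])  # every moveto starts a new contour
        elif op in ("hlineto", "vlineto"):
            if not contours:
                raise ValueError(f"{op} before any moveto")
            # Operands alternate horizontal/vertical, starting per the operator.
            horizontal = op == "hlineto"
            for d in stack:
                if horizontal:
                    x += d
                else:
                    y += d
                contours[-1].append((x, y))
                horizontal = not horizontal
        elif op != "endchar":
            raise ValueError(f"unsupported operator: {op}")
        first_op = False
        stack = []
    return contours


GLYPH = ("98 718 vmoveto 98 hlineto -271 vlineto -98 hlineto "
         "98 180 271 rmoveto 98 hlineto -271 vlineto -98 hlineto endchar").split()

if __name__ == "__main__":
    print(trace(GLYPH))
    # prints [[(0.0, 718.0), (98.0, 718.0), (98.0, 447.0), (0.0, 447.0)],
    #         [(180.0, 718.0), (278.0, 718.0), (278.0, 447.0), (180.0, 447.0)]]
```

With the last-two heuristic, the two contours come out as equal-sized bars at the same height, which suggests (but does not prove) that the extra leading `98` is a stray operand. With `lenient=False` the same glyph raises `ValueError`, which is roughly the strict behavior that makes a sanitizer reject the font.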

DanDeMicco commented 5 years ago

Hi all, I'm attaching the PDF source to the main issue (although it no longer appears to be needed). I didn't have permission to attach the PDF directly until now.

@janpe2 we seem to have a fair number of font issues with PDFs. Do you have any further advice for debugging font issues beyond what is in the debugging FAQ? I ran the tx commands, but I wasn't sure exactly what to look for.

Something is definitely wrong with the font, but for some reason Chromium and Adobe are able to display the PDF correctly. I'm wondering what the plan is: is it possible to display the fonts even if they fail the sanitizer?

Appreciate you all looking into this!

DanDeMicco commented 5 years ago

I did some investigation and I think this might be a regression.

I had an older version, pdfjs-2.0.550-dist, which loads the PDF correctly. The latest version (pdfjs-2.0.943-dist) renders the text incorrectly.