mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Problem displaying PDF file with embedded fonts #9949

Closed. fschutt closed this issue 6 years ago

fschutt commented 6 years ago

I am working on the PDF library printpdf and noticed that Firefox doesn't display the fonts example correctly.

This could be a bug in pdf.js or a bug in my library. However, all other PDF viewers I've tested so far display the file correctly:

- Evince (screenshot)
- Google Chrome (screenshot)
- Adobe Reader (screenshot)
- Foxit Reader (screenshot)
- xpdf (screenshot)
- Sumatra PDF (screenshot)

... but Firefox does not (screenshot).

Here is the corrupt PDF file: test_fonts.pdf

The code that generated the PDF file is here: https://github.com/fschutt/printpdf/blob/master/examples/fonts.rs - if you want to reproduce this from source, clone the repository and run cargo run --example fonts.

I do get an error in the console, but it's not very descriptive:

util.js:29 Warning: Error during font loading: Cannot read property 'length' of undefined

Steps to reproduce the problem:

  1. Set up the build system as in the README
  2. Open the test_fonts.pdf file

I've tried to debug this error. Again, it could be that something is wrong with my code, but I don't know JavaScript well enough to debug the problem. My notes so far:

- message_handler.js:123 - "action[0].call(action[1], data.data);": action[1] is undefined for whatever reason, data.data is null
- worker.js:685 - "RenderPageRequest" - I think this is where the crash happens
- worker.js:686 - "page" variable seems to have an empty font cache - where are the fonts loaded?
- worker.js:686: It seems that page.getOperatorList is not executed correctly
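
For readers following these notes, here is a minimal sketch of the dispatch pattern quoted from message_handler.js, where each handler is stored as a [function, scope] pair. The class and method names (TinyMessageHandler, on, dispatch) are hypothetical and are not pdf.js's actual code. It only illustrates that an undefined action[1] and a null data.data are both valid inputs to such a call, so neither is necessarily the cause of the failure.

```javascript
// Hypothetical miniature of a handler registry; not the pdf.js implementation.
class TinyMessageHandler {
  constructor() {
    // Maps an action name to a [function, scope] pair.
    this.actions = Object.create(null);
  }

  // Register a handler. `scope` is optional and becomes `this` inside `fn`.
  on(name, fn, scope) {
    if (this.actions[name]) {
      throw new Error(`Handler for "${name}" is already registered`);
    }
    this.actions[name] = [fn, scope];
  }

  // Same shape as the quoted line: action[0].call(action[1], data.data)
  dispatch(data) {
    const action = this.actions[data.action];
    if (!action) {
      throw new Error(`Unknown action: ${data.action}`);
    }
    return action[0].call(action[1], data.data);
  }
}

const handler = new TinyMessageHandler();

// Registered without a scope, so action[1] is undefined for this entry.
handler.on("Ping", (payload) => (payload === null ? "no payload" : "payload"));

// Registered with a scope, so `this` is the scope object inside the handler.
handler.on("WhoAmI", function () {
  return this.name;
}, { name: "worker" });
```

With this sketch, handler.dispatch({ action: "Ping", data: null }) runs normally even though the stored scope is undefined and the payload is null.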

The example itself embeds a font + a Unicode dictionary. The respective code for doing so is here; maybe a PDF expert can tell me what's wrong with it. Adobe Acrobat doesn't throw any errors aside from complaining about layers - the PDF validates for PDF/X-3:2002 (except that it uses layers).

I could debug this further, but I find the architecture of pdf.js very complex and I am not very good at debugging JavaScript, so I wanted to ask if anyone could help figure out why this example doesn't work. Thanks in advance.

fschutt commented 6 years ago

@jrmuizel pointed out that the PDF had an incorrect embedded CMap description. I've fixed this, but it still doesn't fix the rendering issue.

Here is the "fixed" PDF (with correct CMap descriptions): test_fonts.pdf

However, I now get a different error message in the console:

Warning: Error during font loading: dict is undefined util.js:29:7

brendandahl commented 6 years ago

Looks like the font is marked as CIDFontType0 in the font dictionary, but the font is really a TrueType font.

PDF.js could handle this more gracefully, either by falling back to a system font or by adding a heuristic that looks into the font data to see whether it is really a CFF font or a TrueType font.
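
As a rough illustration of such a heuristic (a sketch only, not what pdf.js actually does), the first four bytes of the embedded font program are usually enough to tell a TrueType file from a CFF one. The function names sniffFontFormat and resolveCidSubtype are hypothetical. The mapping assumes the usual PDF convention that CIDFontType2 describes TrueType-based CIDFonts and CIDFontType0 describes CFF-based ones.

```javascript
// Hypothetical helpers for illustration; these are not pdf.js functions.

// Guess the embedded font format from its first four bytes.
function sniffFontFormat(bytes) {
  if (!bytes || bytes.length < 4) {
    return "unknown";
  }
  const tag = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (tag === "\x00\x01\x00\x00" || tag === "true") {
    return "truetype"; // sfnt container with TrueType outlines
  }
  if (tag === "OTTO") {
    return "opentype-cff"; // sfnt container with CFF outlines
  }
  // Bare CFF data: major version 1, and offSize (fourth byte) between 1 and 4.
  if (bytes[0] === 1 && bytes[3] >= 1 && bytes[3] <= 4) {
    return "cff";
  }
  return "unknown";
}

// Prefer the subtype implied by the font data; if the data is not
// recognized, keep whatever the font dictionary declared.
function resolveCidSubtype(declaredSubtype, bytes) {
  switch (sniffFontFormat(bytes)) {
    case "truetype":
      return "CIDFontType2";
    case "opentype-cff":
    case "cff":
      return "CIDFontType0";
    default:
      return declaredSubtype;
  }
}
```

For the file in this issue, the dictionary says CIDFontType0 while the data starts with the TrueType version tag, so resolveCidSubtype("CIDFontType0", fontBytes) would pick CIDFontType2 instead. A real viewer would likely want stricter validation than a four-byte check before overriding the dictionary.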

fschutt commented 6 years ago

Okay, marking the font as CIDFontType2 instead of CIDFontType0 resolves this. To be fair, pdf.js is technically correct here: it is a broken PDF. I'll fix this in my library - all other PDF viewers seem to be a bit more lenient about what they accept.

timvandermeij commented 6 years ago

Thank you for fixing it in the library. The two pull requests above also make the bad PDF file render, so we now behave the same as other viewers.