mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

Fonts/text are rendered inconsistently in Santa Cruz County COVID-19 shelter-in-place PDF #11840

Closed: dholbert closed this 1 year ago

dholbert commented 4 years ago

Attach (recommended) or Link to PDF file here:

Configuration:

Steps to reproduce the problem:

  1. View the attached/linked PDF
  2. Look at the font & characters in the page's content (below the header) -- e.g. the section "ORDER OF THE HEALTH OFFICER OF THE COUNTY OF SANTA CRUZ DIRECTING [...]"
  3. Compare to a standalone PDF viewer.

What is the expected behavior? (add screenshot)

What went wrong? (add screenshot)

timvandermeij commented 4 years ago

I can't reproduce this using Arch Linux, Firefox 75 and the latest viewer at https://mozilla.github.io/pdf.js/web/viewer.html. However, this is most likely because I have more fonts installed. The PDF file does not embed all of its fonts, so the difference is most likely caused by a fallback font being used.

dholbert commented 4 years ago

Thanks! For an apples-to-apples comparison: I just checked, and I am able to reproduce this on my system, when using the latest viewer at https://mozilla.github.io/pdf.js/web/viewer.html , in both Firefox 75 and Firefox Nightly. So: I suspect you're right about this being a difference in system fonts.

Having said that: do you know why things would still look better in my native PDF viewer, evince? (I'm assuming it has the same system fonts available as Firefox does, though maybe it has its own special store of backup/PDF-specific fonts...)

Also, as one additional bit of data: Firefox's devtools says that our rendering of the PDF only uses 3 fonts for me (though I'm not sure I trust that this is accurate[1]): DejaVu Sans, DejaVu Sans Mono, and Ubuntu: [image]

[1] (I'm not sure I trust our devtools-reported fonts here because both "AN" and "D" are reported as using DejaVu Sans, when clearly "D" is actually drawn with serifs, as shown in my initial screenshot. It looks like the reported fonts are for PDF.js's transparent "text layer", which is separate from the actual painted characters in the "canvas layer", so I don't know whether these text-layer fonts are relevant here.)

timvandermeij commented 4 years ago

I'm not too familiar with the font conversion code, but the PDF specification describes 14 standard fonts that PDF viewers may use for font substitution. I think that's what Evince (or Okular, in my case) uses. Refer to #11637 for more information. Basically, PDF.js requires fonts to be embedded, or it will use a fallback font.
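
To make the distinction above concrete, here is a minimal sketch of how a viewer might decide which path a font takes. It is not PDF.js's actual code: `classifyFont`, the `{ name, embedded }` descriptor shape, and the three result labels are hypothetical names chosen for illustration. The list of 14 standard font names is the only part taken from the PDF specification.

```javascript
// The 14 standard fonts named by the PDF specification. A viewer may
// substitute its own copies of these when a PDF does not embed them.
const STANDARD_14_FONTS = new Set([
  "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
  "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
  "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
  "Symbol", "ZapfDingbats",
]);

// Hypothetical helper (not a PDF.js API): given a font descriptor of the
// assumed shape { name, embedded }, report which rendering path applies.
function classifyFont({ name, embedded }) {
  if (embedded) {
    // The font program ships inside the PDF, so every viewer draws the same glyphs.
    return "embedded";
  }
  if (STANDARD_14_FONTS.has(name)) {
    // Not embedded, but a viewer that bundles the standard 14 can substitute it.
    return "standard-14-substitute";
  }
  // Otherwise the result depends on whatever fonts the system has installed.
  return "system-fallback";
}

// Example descriptors (made up for illustration, not read from the reported PDF).
for (const font of [
  { name: "Times-Roman", embedded: false },
  { name: "SomeCorporateSerif", embedded: false },
  { name: "SomeCorporateSerif", embedded: true },
]) {
  console.log(font.name, font.embedded, classifyFont(font));
}
```

The last two paths are where viewers can disagree: a viewer that bundles substitutes for the standard 14 renders those fonts consistently across machines, while one that relies on system fonts produces output that varies with what is installed.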

dholbert commented 4 years ago

Makes sense. Thanks!

marco-c commented 1 year ago

#16363 might have fixed this. Can you still reproduce in the latest Firefox Nightly?

dholbert commented 1 year ago

Can't reproduce in latest Firefox Nightly, no. Here's a screenshot of what I see now for the bad sections shown at the end of my initial comment here: [image]

The font looks to be consistently the same serif font now.

dholbert commented 1 year ago

Hmm, I actually also can't reproduce when using Firefox Nightly 2020-04-21 (the version that I was testing when I reported this bug), launched via mozregression.

I'm on Ubuntu 22.04 now, on a different machine from the machine that I used to file this bug. (I don't have the original system anymore.) Not sure offhand what the variable might be that's preventing me from reproducing; maybe I've got different fonts available now, or there was an Ubuntu bug-fix that influenced how we choose fonts, or something else.

In any case, since I can't reproduce, I can't be entirely sure whether the bug is still present or not. I'm happy to have this closed as no-longer-reproducible (and maybe even fixed) if that makes sense.

marco-c commented 1 year ago

Thanks!