mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

[Bug]: Incorrect rendering of texts in a PDF #18801

Closed mathumal07 closed 1 month ago

mathumal07 commented 1 month ago

Attach (recommended) or Link to PDF file

Uploaded in comments

Web browser and its version

Firefox: 130.0.1

Operating system and its version

Any OS; tested on Windows

PDF.js version

3.11

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

Open the sample file provided in pdf.js viewer, we can see the text replaced with some other characters.

What is the expected behavior?

PDF text fields should be displayed with correct text

Screenshot 2024-09-26 at 7 40 03 PM

What went wrong?

Text in the PDF is replaced with different characters.

Screenshot 2024-09-26 at 7 39 16 PM

Link to a viewer

No response

Additional context

No response

lbadri commented 1 month ago

Test_FontIssue 2.pdf

lbadri commented 1 month ago

Problem file uploaded.

Snuffleupagus commented 1 month ago

Attach (recommended) or Link to PDF file

Uploaded in comments

As previously mentioned, please attach the document directly here when opening the issue.

(Keep in mind that this is an open-source project that you're able to use for free, hence please respect other people's time.)

PDF.js version

3.11

That's neither a complete version number, nor a supported version; please find the latest releases at https://mozilla.github.io/pdf.js/getting_started/#download

Open the sample file provided in pdf.js viewer, we can see the text replaced with some other characters.

That's because your PDF document uses non-standard fonts, but doesn't embed them, which is a bug in the PDF document itself. Note the following console log:

PDF 8adbff7927d8fe4c83f82dc4fbf7f512 [1.6 Microsoft® Excel® 2013 / Microsoft® Excel® 2013] (PDF.js: 4.7.18 [9735a840a]) [viewer.mjs:12424:13](resource://pdf.js/web/viewer.mjs)
Warning: Digital signatures validation is not supported [viewer.mjs:12447:15](resource://pdf.js/web/viewer.mjs)
Request for font "Gill Sans MT" blocked at visibility level 2 (requires 3) [pdf.mjs:5365:24](resource://pdf.js/build/pdf.mjs)
Warning: Cannot load system font: GillSansMT-Bold, installing it could help to improve PDF rendering. [pdf.mjs:391:13](resource://pdf.js/build/pdf.mjs)
Warning: Cannot load system font: GillSansMT, installing it could help to improve PDF rendering. [pdf.mjs:391:13](resource://pdf.js/build/pdf.mjs)
Warning: Unimplemented border style: inset 36 [pdf.mjs:391:13](resource://pdf.js/build/pdf.mjs)

calixteman commented 1 month ago

Interesting bug. I don't get the warnings in the console because I have the fonts on my system, but the rendering is still wrong because we're using the glyph IDs (gids) instead of their Unicode counterparts. So I think we shouldn't try a substitution when we only have the gids, and should instead fall back directly to Helvetica, for example.

@Snuffleupagus wdyt ?
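As a rough illustration of the fallback proposed above, here is a minimal sketch of the decision logic. Everything in it is hypothetical: `chooseFont`, the `isEmbedded` and `hasUnicodeMapping` flags, and the returned `source` labels are invented for this example and are not taken from the PDF.js source. It assumes that glyph IDs are only meaningful for the specific font file they were produced from, so a non-embedded font known only by glyph IDs is never matched against a system font.

```javascript
// Hypothetical sketch; none of these names come from the PDF.js code base.
const FALLBACK_FONT = "Helvetica";

/**
 * Decide which font to render a PDF font with.
 * @param {{name: string, isEmbedded: boolean, hasUnicodeMapping: boolean}} font
 *   Assumed description of the font as referenced by the PDF.
 * @param {Set<string>} systemFonts Names of fonts assumed to be installed locally.
 * @returns {{source: string, name: string}}
 */
function chooseFont(font, systemFonts) {
  // An embedded font program is always preferred: its glyph IDs are valid.
  if (font.isEmbedded) {
    return { source: "embedded", name: font.name };
  }
  // Only glyph IDs are known: skip system-font substitution entirely,
  // because the same IDs may select different glyphs in another font file.
  if (!font.hasUnicodeMapping) {
    return { source: "fallback", name: FALLBACK_FONT };
  }
  // Unicode values are known, so a same-named system font is a safe substitute.
  if (systemFonts.has(font.name)) {
    return { source: "system", name: font.name };
  }
  // Nothing suitable is installed: use the generic fallback.
  return { source: "fallback", name: FALLBACK_FONT };
}

// Usage: the font is installed locally, but only glyph IDs are available,
// so the sketch still picks the fallback instead of the system font.
const installed = new Set(["GillSansMT", "GillSansMT-Bold"]);
console.log(
  chooseFont(
    { name: "GillSansMT", isEmbedded: false, hasUnicodeMapping: false },
    installed
  )
);
```

The ordering of the checks is the design choice being illustrated: the "only glyph IDs" test runs before the system-font lookup, which matches the case described here where the fonts are installed but the text still renders incorrectly.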