sumatrapdfreader / sumatrapdf

SumatraPDF reader
http://www.sumatrapdfreader.org
GNU General Public License v3.0

SumatraPDF showing garbled texts in this file #1585

Closed 775405984 closed 4 years ago

775405984 commented 4 years ago

123.pdf

Screenshot_1

This file is written in Chinese; it also contains some English words, which display without any problem. Chrome and Foxit don't have this issue. Interestingly, Microsoft Edge has the same issue. @kjk @GitHubRulesOK

775405984 commented 4 years ago

Screenshot_1

This is what it should look like.

GitHubRulesOK commented 4 years ago

My first test in such cases is: can it be viewed differently in MuPDF? (No.) What about Adobe Acrobat 9, i.e. is it a valid PDF? If not, why not?

image image

Could it have been made a more universally acceptable PDF?

image image

Conclusion: the poor definition of the fonts is beyond the ability of Edge, MuPDF or Acrobat to substitute fonts. The author should have tested the file for font embedding in the editor before release.

1585 123ms.pdf

SumatraPeter commented 4 years ago

Interestingly, Microsoft Edge has the same issue.

Clearly you're referring to the legacy version, because if it looks fine in Chrome then it'll look fine in the new ChrEdge too.

GitHubRulesOK commented 4 years ago

I think you'll find the OP file as supplied also views bad in the new Edge. If it was good in Chrome, perhaps it's Chinese-quality mollybendymum.

My Edge fails, and thus my conclusion is "BAD file", as this bad example can't be seen in this Windows default PDF-enabled browser or Acrobat Reader (or SumatraPDF).

Testing via validator (https://www.pdf-online.com/osa/validate.aspx):

File: 123.pdf
Result: Document does not conform to PDF/A.

Details:
Validating file "123.pdf" for conformance level pdf1.4
The embedded font program 'DY1+ZEKHsO-1' cannot be read.
The embedded font program 'DY2+ZEKHsQ-2' cannot be read.
The embedded font program 'DY3+ZEKHsQ-3' cannot be read.
The document does not conform to the requested standard.
The document contains fonts without embedded font programs or encoding information (CMAPs).
The document does not conform to the PDF 1.4 standard.
Done.

i.e. it's NOT a valid PDF
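As an illustrative aside (not part of the original validator run), the font names in the report above can be pulled out and checked against the PDF convention that a subset font name starts with a tag of exactly six uppercase letters followed by a plus sign. This is a minimal sketch in Python; the helper names are hypothetical, and a non-standard tag is only a hint that the file came from an unusual producer, not proof of why the font programs cannot be read.

```python
import re

# Lines copied from the validator report quoted in this thread.
VALIDATOR_OUTPUT = """\
The embedded font program 'DY1+ZEKHsO-1' cannot be read.
The embedded font program 'DY2+ZEKHsQ-2' cannot be read.
The embedded font program 'DY3+ZEKHsQ-3' cannot be read.
"""

# Hypothetical helper patterns, written for this report format only.
UNREADABLE = re.compile(r"The embedded font program '([^']+)' cannot be read\.")
# Subset font names are expected to look like "ABCDEF+FontName".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def unreadable_fonts(log: str) -> list[str]:
    """Return the font names the validator reported as unreadable."""
    return UNREADABLE.findall(log)


def has_standard_subset_tag(font_name: str) -> bool:
    """True if the name starts with six uppercase letters and a plus sign."""
    return bool(SUBSET_TAG.match(font_name))


if __name__ == "__main__":
    for name in unreadable_fonts(VALIDATOR_OUTPUT):
        print(name, "standard subset tag:", has_standard_subset_tag(name))
```

All three names in this report use a three-character prefix such as `DY1+`, so none of them matches the usual six-letter subset tag.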

SumatraPeter commented 4 years ago

I think you'll find it also views bad in new edge

@GitHubRulesOK: Unless I'm completely mistaken, looks fine here to my admittedly clueless-about-Chinese eyes:

image

Sure the substituted font is too fat and blocky, but at least ChrEdge seems to have selected something appropriate, which is far more than can be said for Sumatra (and even Firefox's pdf.js for that matter).

The file may very well be bad, but clearly some programs (Chrome/ChrEdge, Foxit etc.) are able to handle it way better than others. Perhaps the MuPDF devs can think about improving their font substitution routines for such files. After all, the average end user is only concerned about whether he's able to view a file properly or not rather than the technical aspects of failure (or how to fix the same).

GitHubRulesOK commented 4 years ago

Foxit is global and has good Chinese capability (the mobile version for iPhone was transmitting unencrypted telemetry and other data to remote servers located in China), thus it's the western editor of choice for this type of file. Chrome is globally tested, so I would hope it can handle files well in each locale, but I don't use it because it's constantly monitoring my actions. That's great for some Chrome users; however, for the rest of the world, as far as I know it:

"About: Microsoft Edge is up to date. Version 81.0.416.72 (Official build) (64-bit)"

This browser is made possible by the Chromium open source project and other open source software. Microsoft Edge © 2020 Microsoft Corporation. All rights reserved.

Unless I'm completely mistaken, looks fine here to my admittedly clueless-about-Chinese eyes:

That's odd. I don't have any settings that say to ignore Chinese, yet none of the bad Chinese glyphs show (there is no valid global font to substitute), and even the English ones that do work are not the best choice for readability. It would have been better if MuPDF had also binned the blank characters rather than inserting the pirate's rubbish when the 2001 article was cloned in 2018.

image

Edge or SumatraPDF CAN read current PDF v1.7 Aspose copies of Chin J Clin Obstet Gynecol (here is one from March 2020), but of the few randomly tested only 50% could be downloaded OR READ ONLINE directly without problems.

image

So in SumatraPDF, go to Open File and enter a direct link (for me it takes some time, "like a slow boat from China"); it should work: http://www.obgyncn.com/EN/article/downloadArticleFile.do?attachType=PDF&id=8788 It is just some other copies of unknown or dubious origin that appear to be corrupt, perhaps because the official copies cost about $27.50 via online outlets.

GitHubRulesOK commented 4 years ago

@775405984 Matt, I would send that file back and ask for a refund of your $27.50, since the PDF is clearly not fit for purpose, nor is it the quality one can expect from their current suppliers. PS my fee invoice for £27.50 will follow when you confirm you got your $27.50 back 🙂

775405984 commented 4 years ago

PS my fee invoice for £27.50 will follow when you confirm you got your $27.50 back 🙂

Why would you spend money on that? It was sent to me by a student at the OB/GYN forum in China. I don't think they'll give me money, since I pirated the file. I guess I should consider myself lucky they didn't sue me. Thanks for the good detective work, @GitHubRulesOK and @SumatraPeter.

VictorVG commented 4 years ago

@775405984

Also, Tracker Software PDF Editor finds another error in 123.pdf:

Error - a displaced object was detected in the document.

GitHubRulesOK commented 4 years ago

@VictorVG Interesting, there is much more than

"a" displaced object

I see rubbish (it tries to use Arial WinAnsiEncoding for ËÎÌå) and

image

The only "Chinese" I can see is the "Wanfang Database" watermark on each page
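The garbled font name quoted above looks like a common case of mojibake: bytes of a legacy Chinese encoding shown as Western (Latin-1 / WinAnsi) characters. The following is a minimal sketch, assuming the name was originally stored as GBK bytes; that encoding is an assumption, not something confirmed by inspecting the file, and the function name is hypothetical.

```python
def recover_font_name(garbled: str, legacy_encoding: str = "gbk") -> str:
    """Reinterpret Latin-1 mojibake as text in a legacy multi-byte encoding.

    Assumption: each garbled character stands for exactly one original byte.
    """
    raw_bytes = garbled.encode("latin-1")  # recover the original byte values
    return raw_bytes.decode(legacy_encoding)  # decode them as GBK text


if __name__ == "__main__":
    # The four characters quoted in the comment above, written as escapes.
    garbled_name = "\u00cb\u00ce\u00cc\u00e5"
    print(recover_font_name(garbled_name))
```

Under that assumption the four characters decode to 宋体 (SimSun), a Chinese system font that a viewer cannot map to Arial with WinAnsiEncoding.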

VictorVG commented 4 years ago

I looked at it with a friend at the printing house and reproduced what his package reported. I do not have such tools for verification myself, since I more often use either SumatraPDF under WINE (BSD UNIX) or KPDF (KDE UNIX).

GitHubRulesOK commented 4 years ago

Closing as a duplicate of several other cases, such as "Reader is unable to show cyrillic text in pdf file" #378, where files may only be viewed easily within a locality or on a machine that has a specific font available. Here the PDF is not defined using globally available fonts, or the font glyphs have not been named/embedded in a rational way, thus breaking the "Portable" part of the PDF standard's requirements.

Note: most of these examples can easily be fixed in the editor to make them acceptable to all viewers.