sumatrapdfreader / sumatrapdf

SumatraPDF reader
http://www.sumatrapdfreader.org
GNU General Public License v3.0

Vietnamese characters are not displayed properly when reading epub #2752

Open bloom2406 opened 2 years ago

bloom2406 commented 2 years ago

Hello,

I have an epub in Vietnamese. When I read it on Calibre, the text is displayed properly: image

But when I read it on SumatraPDF, the text is not displayed correctly: image

In the past, this was caused by the default font not being compatible with Vietnamese, and it could be fixed by changing FontName in the EbookUI section to Arial / Times New Roman. I'm not sure what causes the current issue though... Could you please look into this? I really hope I can use Sumatra for all my ebooks, not only the English ones :(

Thanks.

GitHubRulesOK commented 2 years ago

Newer SumatraPDF uses MuPDF (with its font support, which is fairly stringent). It seems not to handle some UTF character glyphs, such as diacritics, consistently, depending on the fonts chosen in the ebook. Samples of such cases therefore need to be tested in MuPDF and, if proven unsupported there, raised as bug reports with MuPDF; otherwise SumatraPDF may not be able to substitute a choice of font.
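To put together such samples, it may help to list exactly which non-ASCII characters a passage contains. The snippet below is only a minimal sketch using Python's standard `unicodedata` module; the function name `list_non_ascii` and the sample string are made up for illustration and do not come from SumatraPDF or MuPDF.

```python
import unicodedata


def list_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    rows = []
    for ch in sorted(set(text)):
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


# Hypothetical sample ("Tiếng Việt"), written with escapes so the
# precomposed code points are unambiguous.
sample = "Ti\u1ebfng Vi\u1ec7t"
for ch, code_point, name in list_non_ascii(sample):
    print(code_point, ch, name)
```

Combining marks are also non-ASCII, so text stored in decomposed form (base letter plus separate accents) shows up as `COMBINING ...` entries, which is worth noting in a report.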

quyleanh commented 12 months ago

With the latest build, 3.5.1, the above bug is fixed, but there is still a problem with rendering, like this one:

image

As you can see, the etc... have incorrect formatting, even when I hard-code the font inside the epub file.

Please refer to this file.

DanGian.zip

GitHubRulesOK commented 12 months ago

This problem is not specific to SumatraPDF.

Here I copied and pasted the section into MS Notepad and applied a single font to all of it, and we can see that the font metrics (the shape of the character tile) get distorted for those glyphs. image

This is the reason PDF files rely so heavily on correctly embedded fonts for styling characters.

Clearly the font used for display does not define the proportions of non-Western characters well. Thus style (serif / sans), weight and scale are not being applied correctly.

Exactly the same text, but in a different font with global definitions, such as MS Segoe UI: image

quyleanh commented 12 months ago

@GitHubRulesOK thank you for checking

If you extract this epub file (it's not a PDF file) with the unzip command, you can see that no font-family is hard-coded inside it. So when you open this file on a system that has fonts supporting the full range of Unicode characters (Times New Roman, Arial...), there should be no problem.
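The same check can be done without unpacking by hand, since an epub is a zip archive. This is a rough sketch, not a proper CSS parser: it does a plain text search for `font-family` declarations in the stylesheet and (X)HTML files. The function name `find_font_families` and the list of file suffixes are my own assumptions.

```python
import re
import zipfile

# Rough pattern: captures whatever follows "font-family:" up to the end of
# the declaration. It is a text search, not real CSS parsing.
FONT_FAMILY = re.compile(r"font-family\s*:\s*([^;}<>\n]+)", re.IGNORECASE)
TEXT_SUFFIXES = (".css", ".html", ".xhtml", ".htm")


def find_font_families(epub_path):
    """Map each stylesheet/markup file in the epub to the font-family values it declares."""
    found = {}
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            text = epub.read(name).decode("utf-8", errors="replace")
            values = [value.strip() for value in FONT_FAMILY.findall(text)]
            if values:
                found[name] = values
    return found
```

Calling `find_font_families("book.epub")` (a placeholder path) returns an empty dict when no file declares a font, which is the case described above where the reader has to choose the font itself.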

I checked with other epub reader apps on the same machine, and they render it without problems. Or you can try this browser-based tool: https://www.flowoss.com/

GitHubRulesOK commented 12 months ago

@quyleanh SumatraPDF uses MuPDF as its "engine", which in turn relies on other libraries for font support. What works for a given combination can vary from day to day, and for Arabic / Hebrew there seem to have been changes (good or bad) today.

Some of those changes may or may not affect other languages, so the issue stays open until significant rendering improvements are seen with your locale's characters.

quyleanh commented 12 months ago

@GitHubRulesOK I see. So let's wait until then.