Some combination glyphs in Khmer return inconsistent width values

kreier commented 5 months ago

Defect Report

I use NotoSansKhmer and uharfbuzz together with fpdf2 to create a PDF document. To right-align text I need the width of a string after the font shaping engine has adjusted it for combined characters. With all Noto fonts (Sans, Serif, SansUI) the returned width is too small for some glyph combinations, so the right alignment is off by several pixels. A demonstration is attached below. Switching to the Google font Khmer-Regular.ttf solves the problem.

Title

Some combination glyphs in Khmer return inconsistent width values

Font

NotoSerifKhmer-Regular.ttf
NotoSansKhmer-Regular.ttf
NotoSansKhmerUI-Regular.ttf

Where the font came from, and when

Site: https://notofonts.github.io/khmer/
Site: https://notofonts.github.io/khmer/fonts/NotoSansKhmer/googlefonts/ttf/NotoSansKhmer-Regular.ttf
Date: 2024-06-06

Font Version

2.004

OS name and version

Windows 11 Pro 23H2

Application name and version

fpdf2 2.7.9 with uharfbuzz 0.39.1

Issue

As written in the introduction, the width returned for a string after font shaping with HarfBuzz is too small for several glyph combinations with Noto fonts. The Python program below renders the same test strings with a variety of fonts: the width is consistently too small for the Noto fonts, while the Google font Khmer-Regular.ttf gives correct widths.

I draw a box with the width of each string as returned by the pdf.get_string_width() function. See the comparison in the screenshots below. The Python program is:

from fpdf import FPDF
fontname = ["NotoKhmer.ttf", "Khmer-Regular.ttf", 
            "NotoSansKhmer-Regular.ttf", "NotoSansKhmer-Regular.otf", "NotoSansKhmerUI-Regular.ttf", 
            "NotoSansKhmerUI-Regular.otf", "NotoSansKhmerUI-Regular1.ttf", "NotoSansKhmerUI-Regular1.otf",
            "NotoSansKhmer-RegularDev.ttf", "NotoSerifKhmer-Regular.ttf"]
# strings:    years,  alone,    1st year, sword,  fork
teststrings = ["ឆ្នាំ", "ម្នាក់ ឯង", "ឆ្នាំទី១", "ដង្កាវ", "ង្គ្រា"]

def render_strings(teststrings):
    pdf.set_font('noto', size=24)
    pdf.set_draw_color(160)
    pdf.set_line_width(0.3)
    for string in teststrings:
        pdf.rect(pdf.get_x(), pdf.get_y()+2, pdf.get_string_width(string), 13, style="D")
        pdf.cell(h=17, text=string + " ")
    pdf.ln()

def info(text):
    pdf.set_font("Helvetica", size=12)
    pdf.cell(text=text)
    pdf.ln()

for typeface in fontname:
    pdf = FPDF(orientation="P", unit="mm", format="A4")
    pdf.add_page()
    pdf.c_margin = 0
    pdf.add_font("noto", style="", fname="../../fonts/" + typeface)
    info("Rendering without shape engine:")
    render_strings(teststrings)
    info("Now activating the shape engine and try this again:")
    pdf.set_text_shaping(use_shaping_engine=True, script="khmr", language="khm")
    render_strings(teststrings)
    render_strings([''.join(teststrings)])     
    pdf.output("fpdf2_stringwidth" + typeface + ".pdf")

Character data

One example is years: ឆ្នាំ or U+1786, U+17D2, U+1793, U+17B6, U+17C6. It is the first string in my example above.
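
To double-check the code points, a short sketch with Python's standard unicodedata module can list them for this string. The string is written with explicit escapes so the snippet does not depend on how an editor stores the Khmer text; the names `years` and `describe` are my own and not part of the test program above.

```python
import unicodedata

# First test string ("years"), spelled out as explicit escapes.
years = "\u1786\u17d2\u1793\u17b6\u17c6"

def describe(text):
    """Return a list of (code point label, Unicode character name) pairs."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, name in describe(years):
    print(label, name)
```

This only inspects the input string. It says nothing about which glyphs the font selects after shaping.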

Screenshot

This is the result of all Noto fonts (glyphs are slightly different, of course) but the box is consistently too small:

Problem solved with using another font: https://fonts.google.com/specimen/Khmer

Tools for reporting bugs

HarfBuzz hb-view and hb-shape

These are part of the HarfBuzz distribution and can help isolate if an issue is in the app/OS, shaping engine, or font.

hb-view renders the text with the exact font (for example, to see how ligatured characters shape) using your installed version of HarfBuzz.

For example:

  hb-view --font-file {path to font} --text-file {path to text file} --output-file '{sample}.png'

hb-shape shows glyph selection and positioning

Fontview

Fontview displays the text.

Fontdiff

Fontdiff displays the text using two versions of the font side by side.

kuth-chi commented 5 months ago

Here is a reference for the Khmer script in Unicode: ISO 15924 Khmer Unicode

Some software needs to define each character in binary and decode it for output, using regex. But I am not good at NLP.

simoncozens commented 5 months ago

I can understand why this seems like an issue with the font, because you tried a different font and it works. However, Danh Hong's Khmer font is constructed differently and does not use mark attachment. And when a problem shows up with a font in one open source PDF library, and nobody has reported it more generally, it is almost always caused by the PDF library.

I believe that the problem here is that FPDF is not correctly accounting for the advance width of mark attached glyphs. I would be looking carefully at the implementation of get_string_width.
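
As a rough illustration of the idea (this is not fpdf2's actual code, and I am not claiming to know exactly where its calculation goes wrong): after shaping, the width of a run is the sum of the per-glyph x advances, scaled from font units to the font size. Mark-attached glyphs usually carry a zero advance and are placed with offsets, so offsets must not be added to the width. The `GlyphPos` class, the glyph values and the 1000 units per em below are all made up for the example; with uharfbuzz the advances would come from `buf.glyph_positions` after `hb.shape()`.

```python
from dataclasses import dataclass

@dataclass
class GlyphPos:
    """Hypothetical stand-in for one shaped glyph position (font units)."""
    x_advance: int      # how far the pen moves after this glyph
    x_offset: int = 0   # where the glyph is drawn relative to the pen

def shaped_width(positions, font_size, units_per_em):
    """Width of a shaped run in the same unit as font_size.

    Only advances contribute: offsets move a glyph without moving the pen.
    """
    total_advance = sum(p.x_advance for p in positions)
    return total_advance * font_size / units_per_em

# Made-up run: base glyph, attached mark (zero advance, negative offset),
# a spacing vowel, and another attached mark.
run = [GlyphPos(600), GlyphPos(0, -300), GlyphPos(300), GlyphPos(0, -150)]
width = shaped_width(run, font_size=24, units_per_em=1000)
print(width)
```

With a font size in points the result is in points, so a caller working in mm would still need to convert it.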

simoncozens commented 5 months ago

Closing as this is now fixed in the PDF library.
