Determining string width before detecting the language of the string and applying this setting to the font shaping engine leads to a wrong get_string_width()

kreier commented 4 months ago

The width returned by pdf.get_string_width(string) depends on the script and language set for the shaping engine (when it is used). But even after explicitly setting the shaping engine to a specific script and language, with something like pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara"), this setting can change, for example when a string with Latin characters is printed: the shaping engine examines the first character, notices the mismatch, and switches to Latin text shaping. When the next string is rendered, its width is first determined with the old (now Latin) setting, and only after that does the shaping engine detect the language (Arabic in this case) and switch to that script and language. The returned width is therefore based on a calculation with the wrong Latin setting.
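To make the ordering problem concrete, here is a toy model of the behavior described above. It is not fpdf2's actual code: the ToyShaper class, its method names, and the per-character widths are all invented for illustration, and the widths are arbitrary numbers chosen only so that Arabic text measures differently under the two settings.

```python
class ToyShaper:
    """Toy stand-in for a shaping engine with a sticky script setting (hypothetical)."""

    def __init__(self, script):
        self.script = script

    @staticmethod
    def detect_script(text):
        # Crude detection from the first character, as the report describes.
        return "arab" if "\u0600" <= text[0] <= "\u06ff" else "latn"

    def _width(self, text):
        # Arbitrary widths: Arabic text is narrower when shaped as Arabic,
        # Latin text is the same width under either setting.
        if self.detect_script(text) == "arab":
            per_char = 4.0 if self.script == "arab" else 6.0
        else:
            per_char = 5.0
        return len(text) * per_char

    def get_string_width(self, text):
        # Measures with whatever script is currently set; no detection here.
        return self._width(text)

    def render(self, text):
        # Rendering detects the script first and leaves it set afterwards.
        self.script = self.detect_script(text)
        return self._width(text)


shaper = ToyShaper("arab")
results = []
for s in ["الملوك", "test", "الملوك"]:
    measured = shaper.get_string_width(s)  # uses the setting left by the previous render
    rendered = shaper.render(s)            # switches the setting, then lays out the text
    results.append((s, measured == rendered))
```

In this model only the Arabic string that follows the Latin one gets a measured width that differs from its rendered width, which mirrors the misplacement seen in the report.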

I discovered this bug in a document where Latin and non-Latin strings are mixed and the non-Latin strings were sometimes misplaced. The example below visualizes this behavior.

Minimal code

from fpdf import FPDF

fontname = ["NotoArabic.ttf"]
teststrings = ["الملوك", "الملوك", "test", "الملوك", "الملوك", "الملوك", "test", "الملوك", "test"]

def render_strings(teststrings):
    pdf.set_font('noto', size=24)
    pdf.set_draw_color(160)
    pdf.set_line_width(0.3)
    for string in teststrings:
        # Workaround: uncomment to re-apply the shaping settings before each measurement.
        # pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara")
        # Right-align the string at x = 110 mm using the measured width.
        pdf.set_x(110 - pdf.get_string_width(string))
        # Draw a box with the measured width; it should enclose the rendered text.
        pdf.rect(pdf.get_x(), pdf.get_y()+2, pdf.get_string_width(string), 13, style="D")
        pdf.cell(h=17, text=string)
        pdf.ln()
    pdf.ln()

for typeface in fontname:
    pdf = FPDF(orientation="P", unit="mm", format="A4")
    pdf.add_page()
    pdf.c_margin = 0
    pdf.add_font("noto", style="", fname="../../fonts/" + typeface)
    # Set the shaping engine to Arabic once, before any string is measured or rendered.
    pdf.set_text_shaping(use_shaping_engine=True, script="arab", language="ara")
    render_strings(teststrings)
    pdf.output("fpdf2_switch_language" + typeface + ".pdf")

The output looks like this:

Environment

Operating System: Mac OSX
Python version: 3.12.3
fpdf2 version used: git+https://github.com/py-pdf/fpdf2.git@fbbb3f701fd35abaff1cf0b04a8576fe45e204e2 (latest master)

kreier commented 4 months ago

Updated test: setting the shaping engine to the desired script and language every time before determining the string width solves this problem, at least for the moment. I added this line to the example code above, commented out.
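The workaround can be wrapped in a small helper so that the measurement never runs with a stale setting. This is only a sketch: shaped_string_width is a hypothetical name, not part of fpdf2, and it assumes the pdf object exposes set_text_shaping() and get_string_width() as used in the example code above.

```python
def shaped_string_width(pdf, text, script="arab", language="ara"):
    """Re-apply the shaping settings, then measure text (hypothetical helper).

    pdf is expected to be an FPDF instance with the font already set.
    """
    # Reset script and language first, since rendering an earlier string
    # may have switched the shaping engine to another script.
    pdf.set_text_shaping(use_shaping_engine=True, script=script, language=language)
    return pdf.get_string_width(text)
```

In the loop of the example code, pdf.set_x(110 - shaped_string_width(pdf, string)) would then replace the direct call to pdf.get_string_width(string). The helper hard-codes nothing about detection; the caller still has to know which script and language each string needs.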

andersonhc commented 4 months ago

Thanks for reporting this issue, @kreier. I will take a look as soon as possible.

py-pdf / fpdf2

Determining string width before detecting the language of the string and applying this setting to the font shaping engine leads to a wrong get_string_width() #1231