py-pdf / fpdf2

Simple PDF generation for Python
https://py-pdf.github.io/fpdf2/
GNU Lesser General Public License v3.0
993 stars 227 forks source link

{nb} breaks if text shaping is turned on with certain fonts #1090

Open catsclaw opened 5 months ago

catsclaw commented 5 months ago

The special {nb} code fails with some fonts when text shaping is turned on.

Minimal code Please include some minimal Python code reproducing your issue:

pdf = FPDF(format='letter')
pdf.add_font('gentium', style='', fname='GenBkBasR.ttf')
pdf.add_page()
pdf.set_font('gentium', '', 24)
pdf.set_text_shaping(True)
pdf.write(text='Pages {nb}')
pdf.ln()
pdf.set_text_shaping(False)
pdf.write(text='Pages {nb}')
pdf.output('test.pdf')

Result image Environment Please provide the following information:

Lucas-C commented 5 months ago

Thank you for the report @catsclaw

You are right, those two features are currently incompatible.

The reason is that with test shaping, each character is rendered individually in the PDF (with a dedicated Tj operator for each letter). But in FPDF._substitute_page_number() we look for the sequence {nb} to be present inside a single "PDF string" (rendered by a single Tj operator).

As a consequence, this is currently a limitation in fpdf2. We should mention it in our documention (in docs/PageBreaks.md). And PRs are welcome to implement this feature also when text shaping is enabled!

Would you be interested to contribute regarding this @catsclaw? (docs improvement and/or implementation)

andersonhc commented 5 months ago

The characters will be rendered as a sequence if they are only moving on the x axis by the character length, but if there is any offset (kerning, etc) we need to adjust the text matrix and make individual Tj. That's why only some fonts will have this problem.

andersonhc commented 5 months ago

Adding to this issue:

Example:

from fpdf import FPDF
text="Lorem ipsum dolor sit amet, {nb} {nb} {nb} {nb} {nb} {nb} consectetur adipiscing elit. {nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}{nb}Mauris sit amet lacus ut ex tincidunt vulputate non nec mauris. Lorem ipsum dolor sit amet, consectetur adipiscing elit."
pdf = FPDF()
pdf.add_page()
pdf.set_font("helvetica", "", 24)
pdf.multi_cell(w=pdf.epw, text=text, align="J", new_x="LEFT")
pdf.output('test_nb.pdf')

Result:

issue_nb

The problem is the replacement is done directly in the page content after all the rendering is done. I don't see an obvious way to fix it and it will probably demand a lot of rework on how output works.

gmischler commented 4 months ago

The underlying problem here is that an otherwise legitimate sequence of text characters is given a special meaning under certain circumstances. This was bound to result in conflicts somewhere down the line.

The clean solution would be to use a reserved Unicode character for this purpose, which can't otherwise appear in renderable text. A practical approach might be to convert self.str_alias_nb_pages into a special Glyph() subtype (say NbGlyph()) during text parsing. When rendering, NbGlyph() then inserts a sequence of three or four of this reserved Unicode character. And before writing the file, the reserved character sequences get replaced with the right sequence of digit glyphs.

Or am I missing some basic obstacle here? Yes, various places in the code need to learn about this special case, but that is kind of inevitable if we want to avoid conflicts.

Lucas-C commented 1 month ago

Just reporting @andersonhc comment there from https://github.com/py-pdf/fpdf2/issues/71#issuecomment-2123516795:

I am going to work on this issue soon and we will hopefully have a final solution before the next release.

In the meantime I can suggest 2 workarounds:

  • Using a different font on your footer
  • Using a different page number alias - i.e. calling self.alias_nb_pages(alias="####") and writing f"{self.page_no()}/####" on your trailer
smitelli commented 2 weeks ago

Would it be possible in the meantime to (loudly) call this out in the docs for both set_text_shaping() and alias_nb_pages()? I just spent quite a while chasing down the reason for {nb} failing to replace in unpredictable ways.

The underlying cause makes perfect sense in retrospect now that I understand what's happening, but in the moment, man that was a weird problem to chase down. :sweat_smile:

Lucas-C commented 2 weeks ago

Would it be possible in the meantime to (loudly) call this out in the docs for both set_text_shaping() and alias_nb_pages()?

Yes, I think that is a good idea 🙂 Would you like to submit a PR?