py-pdf / fpdf2

Simple PDF generation for Python
https://py-pdf.github.io/fpdf2/
GNU Lesser General Public License v3.0

Unicode detection on system fonts #954

Closed punnerud closed 1 year ago

punnerud commented 1 year ago

Error details

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/fpdf/fpdf.py in normalize_text(self, text)
   4258             try:
-> 4259                 return text.encode(self.core_fonts_encoding).decode("latin-1")
   4260             except UnicodeEncodeError as error:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 13-23: ordinal not in range(256)

The above exception was the direct cause of the following exception:

FPDFUnicodeEncodingException              Traceback (most recent call last)
/var/folders/s6/m3khz6k55rs4nqcjndqlwk8r0000gn/T/ipykernel_28743/155699049.py in <module>
      6 #pdf.add_font('Courier2', '', 'Courier2.ttf')
      7 #pdf.set_font('Courier2', '',8)
----> 8 pdf.cell(60, 5, 'hello world2 ┘└┌┐├┼┬┤│─┴', new_x="LMARGIN", new_y="NEXT", align='C')
      9 pdf.output("hello_world2.pdf")

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/fpdf/fpdf.py in wrapper(self, *args, **kwargs)
    220         if not self.page and not (kwargs.get("dry_run") or kwargs.get("split_only")):
    221             raise FPDFException("No page open, you need to call add_page() first")
--> 222         return fn(self, *args, **kwargs)
    223 
    224     return wrapper

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/fpdf/deprecation.py in wrapper(self, *args, **kwargs)
     30                 stacklevel=get_stack_level(),
     31             )
---> 32         return fn(self, *args, **kwargs)
     33 
     34     return wrapper

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/fpdf/fpdf.py in cell(self, w, h, text, border, ln, align, fill, link, center, markdown, new_x, new_y)
   2783             )
   2784         # Font styles preloading must be performed before any call to FPDF.get_string_width:
-> 2785         text = self.normalize_text(text)
   2786         styled_txt_frags = self._preload_font_styles(text, markdown)
   2787         return self._render_styled_text_line(

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/fpdf/fpdf.py in normalize_text(self, text)
   4259                 return text.encode(self.core_fonts_encoding).decode("latin-1")
   4260             except UnicodeEncodeError as error:
-> 4261                 raise FPDFUnicodeEncodingException(
   4262                     text_index=error.start,
   4263                     character=text[error.start],

FPDFUnicodeEncodingException: Character "┘" at index 13 in text is outside the range of characters supported by the font used: "courier". Please consider using a Unicode font.

Minimal code

from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('Courier', size=8)
#pdf.add_font('Courier2', '', 'Courier.ttf')
#pdf.set_font('Courier2', '',8)
pdf.cell(60, 5, 'hello world2 ┘└┌┐├┼┬┤│─┴', new_x="LMARGIN", new_y="NEXT", align='C')
pdf.output("hello_world2.pdf")


The Courier font on Mac contains the characters ┘└┌┐├┼┬┤│─┴. If I add Courier manually, it works:

pdf.add_font('Courier2', '', 'Courier2.ttf')
pdf.set_font('Courier2', '',8)

andersonhc commented 1 year ago

Hi @punnerud. When you don't add a font, you are not using the system font Courier; you are in fact using the standard PDF font Courier. The PDF specification defines 14 standard fonts that all readers must support, and fpdf2 addresses them through a single-byte encoding (latin-1 by default), so at most 256 characters are available.

When you add a TTF font, fpdf2 lets you use any glyph available in that font, and the font is embedded into the resulting PDF file.

This is not a bug in fpdf2, but a limitation of the PDF standard.
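
To check in advance whether a string will trip this limitation, the encoding step shown in the traceback can be reproduced with the standard library alone. This is a minimal sketch, not part of fpdf2: the helper name `first_unsupported_char` is hypothetical, and the `latin-1` default is assumed from the codec named in the error above.

```python
from typing import Optional, Tuple


def first_unsupported_char(
    text: str, encoding: str = "latin-1"
) -> Optional[Tuple[int, str]]:
    """Return (index, character) for the first character that cannot be
    encoded with `encoding`, or None if the whole string is encodable.

    Hypothetical helper: it mirrors the text.encode(...) call visible in
    the traceback, but it is not an fpdf2 function.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as error:
        # error.start is the index of the first offending character
        return error.start, text[error.start]
    return None


sample = "hello world2 ┘└┌┐├┼┬┤│─┴"
print(first_unsupported_char(sample))          # (13, '┘')
print(first_unsupported_char("hello world2"))  # None
```

If the helper returns something other than `None`, the text needs a Unicode font registered with `add_font()` and selected with `set_font()`, as in the workaround posted above.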

Lucas-C commented 1 year ago

Does @andersonhc's answer clarify the situation for you, @punnerud? 🙂

afriedman412 commented 1 year ago

Regardless, the idea that the system font Courier and the standard PDF font Courier are distinct is confusing -- maybe we could add a note in the docs and a link to it in the error message?

gmischler commented 1 year ago

The peculiarities of the PDF format can indeed be confusing... 😉

I can include a note in #975. There's no separate page in the docs that deals specifically with font issues, so that seems like a good place.

gmischler commented 1 year ago

#975 now includes an explanation of how built-in and Unicode fonts differ from each other.