Open sven-oly opened 7 years ago
try checking out the tests for a whole bunch of weird characters, or hello world in many languages.
maybe you could share a snippet of some code?
I'm seeing this as well:
UnicodeEncodeError 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
The exception is raised in the cell function:
pdf.cell(txt=name)
# where name is José
Without the "é" works fine, of course:
pdf.cell(txt=name)
# where name is Jose
The Django debug error page shows the name variable as:
u'Jos\xe9'
I can get around the problem by doing the following:
import unicodedata
def unicode_normalize(s):
    return unicodedata.normalize('NFKD', s).encode('ascii', 'ignore')
pdf.cell(txt=unicode_normalize(name)) # where name is José
# The approximate ascii "Jose" is printed on the PDF
Not ideal, as we're losing accents, but at least (a) it doesn't crash, and (b) we see something resembling the string we want.
What is The Right Way to handle this?
Edit: By the way, if I use name.encode('utf-8'), no exception is raised, but "José" is printed on the pdf.
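For readers following along, here is a minimal Python 3 sketch (standard library only, no pyfpdf involved) of what the two workarounds above do to the string. The variable names are made up for illustration, and the last step is only an assumption about what a Latin-1 based renderer would do with UTF-8 bytes, not a statement about pyfpdf internals.

```python
import unicodedata

name = "Jos\u00e9"  # José

# Workaround 1: decompose é into e + combining accent, then drop non-ASCII.
ascii_only = unicodedata.normalize("NFKD", name).encode("ascii", "ignore")

# Workaround 2: encode to UTF-8; é becomes the two bytes 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")

# Assumption: if those two bytes are later read one byte per character
# (Latin-1 style), each byte turns into its own character.
misread = utf8_bytes.decode("latin-1")

print(ascii_only)
print(utf8_bytes)
print(misread)
```

The first approach loses the accent; the second keeps the information but only displays correctly if the consumer also treats the bytes as UTF-8.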
seems to work fine for me on version 2? lmk if this version works.
Ah I'm using 1.7.2. When will 2.0.0 be available on pypi?
check link
Oh, so I just looked at the file produced and it doesn't look like the character is actually making it through. I'll make some test cases and see how this works. I haven't spent time on this project in a while, so while that means it's easy for me to switch gears from ttf and image management, I'm also a bit busy with other things.
epalm, currently we accept string for the py3 version and string/unicode for the py2 version. It shouldn't 'eat' utf-8 encoded sequences. Can you provide a stripped-down version of this problem?
There is only a roadmap for a massive update. The current policy is full compatibility.
I'm not exactly sure how to reproduce this. When the variable comes from my database (postgres) and application (django), I get the above UnicodeEncodeError.
When I do it in a shell, I get an AttributeError:
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import fpdf
>>> fpdf.FPDF_VERSION
'1.7.2'
>>> from fpdf import FPDF
>>> pdf = FPDF(format='letter')
>>> pdf.add_page()
>>> pdf.cell(0, txt='José') # same result with u'José' if that matters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\project\lib\site-packages\fpdf\fpdf.py", line 150, in wrapper
return fn(self, *args, **kwargs)
File "C:\project\lib\site-packages\fpdf\fpdf.py", line 685, in cell
txt = self.normalize_text(txt)
File "C:\project\lib\site-packages\fpdf\fpdf.py", line 1099, in normalize_text
if self.unifontsubset and isinstance(txt, str) and not PY3K:
AttributeError: 'FPDF' object has no attribute 'unifontsubset'
>>>
Well, this is a programming error. We should add a more descriptive error, though. unifontsubset is not assigned until set_font is used.
Would it be a good idea to initialize some defaults as part of constructor?
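The two ideas raised here (a constructor default and a more descriptive error) could look roughly like the sketch below. Both classes are hypothetical stand-ins written for illustration; this is not pyfpdf's actual code, and only the attribute name unifontsubset is taken from the traceback above.

```python
class LazyDoc:
    """Hypothetical stand-in: the attribute only exists after set_font()."""

    def set_font(self, family):
        self.unifontsubset = False

    def cell(self, txt):
        # Raises a bare AttributeError if set_font() was never called.
        return self.unifontsubset


class SaferDoc(LazyDoc):
    """Same stand-in with a constructor default and a descriptive error."""

    def __init__(self):
        self.unifontsubset = None  # default: no font selected yet

    def cell(self, txt):
        if self.unifontsubset is None:
            raise RuntimeError("No font set: call set_font() before cell()")
        return self.unifontsubset
```

With the second variant, forgetting set_font still fails, but the message tells the caller what to do.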
Oops, sorry, forgot to initialize a font.
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import fpdf
>>> fpdf.FPDF_VERSION
'1.7.2'
>>> from fpdf import FPDF
>>> pdf = FPDF(format='letter')
>>> pdf.add_page()
>>> pdf.set_font('Arial', 'B', 14)
>>> pdf.cell(0, txt=u'José')
>>> pdf.output(name='file.pdf')
produces (this is expected):
However, if I use 'José' instead of u'José' I get:
I'm still not sure how to trigger UnicodeEncodeError from a shell.
OK, note to all: this is an example of a good report. In the meantime, let me reread pdf_reference_1-7.pdf.
Eric, it seems to be due to an encoding issue. On Linux, Python 2.7, UTF-8:
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> repr(u'José')
"u'Jos\\xe9'"
>>> repr('José')
"'Jos\\xc3\\xa9'"
I have no Windows access for now, but can you test the same?
Sure:
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> repr(u'José')
"u'Jos\\xe9'"
>>> repr('José')
"'Jos\\x82'"
>>>
That is, you are sending bytes in cp437 (é is 0x82 there; its Unicode code point is U+00E9), but pyfpdf interprets them as win-1251 (the default), where 0x82 is '‚' (U+201A). We already have a setting for this case: pdf.set_doc_option("core_fonts_encoding", 'cp437'). But it is really simpler to use u"" literals (sounds lazy, but it's 2017, so maybe py3). And do not forget that the final code page for a non-Unicode font is "WinAnsiEncoding", i.e. it lacks some diacritics.
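The byte values quoted in the two console sessions can be checked with Python's built-in codecs. This is a minimal Python 3 sketch that is independent of pyfpdf; the variable names are illustrative only.

```python
# Python 3 sketch using only built-in codecs; no pyfpdf involved.
e_acute = "\u00e9"  # é

cp437_bytes = e_acute.encode("cp437")  # what a cp437 Windows console sends
utf8_bytes = e_acute.encode("utf-8")   # what a UTF-8 Linux terminal sends

# Reading the cp437 byte with a Windows "ANSI" code page yields a
# different character (the low quotation mark U+201A).
as_cp1251 = cp437_bytes.decode("cp1251")
as_cp1252 = cp437_bytes.decode("cp1252")

# Decoding with the encoding the bytes were written in recovers é.
recovered = cp437_bytes.decode("cp437")

print(cp437_bytes, utf8_bytes, as_cp1251, as_cp1252, recovered)
```

This is why a byte string typed in one console can come out as a different glyph in the PDF, while a unicode literal (u'José' in Python 2) does not depend on the console encoding.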
Sorry, I'm not sure what your conclusion is. I'm using python 2 on ubuntu 14.04 in production. Should I be calling pdf.set_doc_option in my code, with params that depend on my environment?
We found why the code behaves this way from the console, but we still have too few clues about your production environment. Currently I can recreate this variant: Django or some middleware returns 'José' as utf-8 bytes, then .encode("latin-1") in pyfpdf gives this error. Is this correct?
I am not really sure whether this is related. I started facing an error in this line:
p = self.pages[n].encode("latin1") if PY3K else self.pages[n]
The error was because I was trying to insert a € sign. I didn't search through the code to figure out why utf-8 was not used here. My solution was to replace latin1 with windows-1252.
Latin-1 only covers code points up to 255, and its 128-159 range holds control characters, so it cannot encode € (U+20AC). windows-1252 is a little better: it fills most of 128-159 with printable characters, including €. There was no other workaround in my source code to support this character. Let me know if there is one.
I do not think this would break anything in fPDF. So can anyone add it to the source?
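For reference, the difference between the two codecs for the € sign can be checked directly in Python 3. This is a small standard-library sketch; try_encode is a hypothetical helper written for this illustration, not part of fpdf.

```python
euro = "\u20ac"  # €


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


latin1_result = try_encode(euro, "latin-1")      # € is not in Latin-1
cp1252_result = try_encode(euro, "cp1252")       # € is byte 0x80 in windows-1252
accent_result = try_encode("\u00e9", "latin-1")  # é is fine in Latin-1

print(latin1_result, cp1252_result, accent_result)
```

Accented Western European letters such as é encode fine in both; the two codecs only differ in the 0x80-0x9F range.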
@openskullbox I've tried your solution but I've encountered some issues. Simply substituting latin1 with windows-1252 produces a decoding error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 28: character maps to <undefined>
Does someone have a solution to this problem? I'd like to print some Greek characters.
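A likely reason for this decoding error, sketched with the standard library only: Python's cp1252 codec leaves a few byte values (0x9d among them) unassigned, whereas latin-1 maps every byte to some character. The sketch also shows that Greek letters are outside both single-byte code pages. try_decode is a hypothetical helper for illustration.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if a byte has no mapping in the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


undefined_in_cp1252 = try_decode(b"\x9d", "cp1252")  # no character assigned
defined_in_latin1 = try_decode(b"\x9d", "latin-1")   # a C1 control character

greek = "\u03b1\u03b2\u03b3"  # αβγ
greek_fits = {}
for codec in ("latin-1", "cp1252", "utf-8"):
    try:
        greek.encode(codec)
        greek_fits[codec] = True
    except UnicodeEncodeError:
        greek_fits[codec] = False

print(undefined_in_cp1252, repr(defined_in_latin1), greek_fits)
```

So swapping one single-byte codec for another cannot help with Greek text; that needs a Unicode-capable font path.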
Call set_font after adding the font:
pdf.add_font('kalpurush', '', r"C:\Windows\Fonts\kalpurush.ttf", uni=True)
pdf.set_font('kalpurush', '', 14)
Hello, have the encoding errors been fixed for python 3.7+ ? I am using the most recent, fpdf 2.0.3. At least I think that is the most recent. And I keep getting errors with characters like the unicode dash (\u2013)
So I have not really been able to get any Unicode tests written, because it is unclear and undocumented (aside from official font standards and the Adobe PDF documentation) what the cmap is and what it should be assigned to. Nor have I been able to find any font libraries for Python which have intuitive docs or address this use case.
Solved it as shown by @sajjadafridi
Downloaded the Arial Unicode Regular font and declared this:
pdf.add_font('ArialUnicode',fname='Arial-Unicode-Regular.ttf',uni=True)
pdf.set_font('ArialUnicode', '', 11)
@ohidurbappy where did you find the file for Arial Unicode Regular? I can only find the font for the MS version and it doesn't seem to work.
@Zawszy I can't remember the link. Just attaching the file, in case you need it.
Hi! There has been a recent change on using set_doc_option. It was deprecated a few days ago. You can check the release notes here: https://github.com/PyFPDF/fpdf2/releases
Now, without the set_doc_option, it says "the FPDF.set_doc_option() method is deprecated in favour of just setting the core_fonts_encoding property on an instance of FPDF."
I'm not sure what it means by "an instance of FPDF", but when I did the following: pdf = FPDF('P', 'mm', 'Legal').core_fonts_encoding('utf-8')
it produced an error: AttributeError: 'FPDF' object has no attribute 'core_fonts_encoding'
Am I doing it wrong? Thanks, I'd appreciate any help.
Have the same issue
file "pdf_report.py", line 13, in <module>
pdf = FPDF(orientation = 'L', unit = 'mm', format = 'A4').core_fonts_encoding('utf-8')
AttributeError: 'FPDF' object has no attribute 'core_fonts_encoding'
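As a note on the wording of that deprecation message: "setting a property on an instance" normally means creating the object first and then assigning to the attribute, rather than calling it like a method. The sketch below uses a hypothetical stand-in class, not the real FPDF class, so it runs without any PDF library. Whether a given installed fpdf version actually defines core_fonts_encoding is a separate question; the AttributeError above suggests the installed one did not.

```python
class StandInPDF:
    """Hypothetical stand-in for illustration only; this is not the real FPDF class."""

    def __init__(self):
        # Assumed default, chosen for this sketch.
        self.core_fonts_encoding = "latin-1"


# Create the instance first and keep a reference to it...
pdf = StandInPDF()
# ...then assign to the attribute; it is a plain value, not a method to call.
pdf.core_fonts_encoding = "cp437"

print(pdf.core_fonts_encoding)
```

Note also that chaining a call directly after the constructor would store that call's return value in pdf instead of the document object.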
@pampam07 @vade PyFPDF is not maintained anymore, you may want to check PyFPDF/fpdf2 as a successor, with a 99%-compatible API
Ah, good to know. Thanks. FWIW I realized I wasn't defining / setting a font. Doing that appears to have solved my issue(s).
Thank you
@Lucas-C does fpdf2 solve the encoding issue mentioned in this thread?
I'm pretty sure yes 😊
The following works well with a source code file encoded as utf8:
#!/usr/bin/env python3
import fpdf
pdf = fpdf.FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=15)
pdf.cell(txt="José")
pdf.output("issue_86.pdf")
Same with the following, put in a source code file encoded as latin-1 (ISO 8859-1):
#!/usr/bin/env python3
# -*- coding: latin-1 -*-
import fpdf
pdf = fpdf.FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=15)
pdf.cell(txt="José")
pdf.output("issue_86.pdf")
Lucas-C's code works great with fpdf2 until you try different Unicode characters. If you replace "José" with "Joséō" you get the error below.
import fpdf
pdf = fpdf.FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=15)
pdf.cell(txt="Joséō")
pdf.output("issue_86.pdf")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u014d' in position 4: ordinal not in range(256)
Is there a solution that works for arbitrary Unicode characters?
Adding the Arial Unicode font works in the meantime.
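Until a Unicode font is in place, one way to see in advance which characters will trigger this error is to test them against the latin-1 codec named in the message. This is a small standard-library sketch; unencodable_chars is a hypothetical helper, not an fpdf or fpdf2 function.

```python
def unencodable_chars(text, encoding="latin-1"):
    """Return the distinct characters of text that the codec cannot represent,
    in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


ok_name = unencodable_chars("Jos\u00e9")          # José
bad_name = unencodable_chars("Jos\u00e9\u014d")   # Joséō
punctuation = unencodable_chars("a\u2013b\u201d") # en dash, right double quote

print(ok_name, bad_name, punctuation)
```

An empty list means the text fits the single-byte path; anything else needs a TTF font added with uni=True as described earlier in the thread, or a substitution chosen by the caller.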
OK, thank you @cseberino. I reported the issue here for fpdf2: https://github.com/PyFPDF/fpdf2/issues/330
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201d' in position 10: ordinal not in range(256)