Latin-1 encoding errors

jwaldorf commented 7 years ago

I'm having an odd issue related to encoding. Note this is not related to issue #86, as I'm not getting any error messages. When I use the cell function, it doesn't print latin-1 characters properly on the pdf. I'm using this for my job and they're using python 2.6. The only way I can get the characters to print properly is if I set the python script to utf-8 and send the text as a unicode string to the function. Here are a few examples I've tried:

Using latin1 encoding with a regular string

    # -*- coding: latin-1 -*-
    # USING LATIN-1
    from fpdf import FPDF
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('arial')
    text = 'Ô'  # REGULAR NON-UNICODE STRING
    pdf.cell(10, 10, text, 0, 1)
    pdf.output('out.pdf', 'F')

This displays: Ã”

Using latin1 encoding with a unicode string

    # -*- coding: latin-1 -*-
    # USING LATIN-1
    from fpdf import FPDF
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('arial')
    text = u'Ô'  # UNICODE STRING
    pdf.cell(10, 10, text, 0, 1)
    pdf.output('out.pdf', 'F')

This also displays: Ã”

Using utf-8 encoding with a regular string

    # -*- coding: utf-8 -*-
    # USING UTF-8
    from fpdf import FPDF
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('arial')
    text = 'Ô'  # REGULAR NON-UNICODE STRING
    pdf.cell(10, 10, text, 0, 1)
    pdf.output('out.pdf', 'F')

This also displays: Ã”

Using utf-8 encoding with a unicode string

    # -*- coding: utf-8 -*-
    # USING UTF-8
    from fpdf import FPDF
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('arial')
    text = u'Ô'  # UNICODE STRING
    pdf.cell(10, 10, text, 0, 1)
    pdf.output('out.pdf', 'F')

This one works and finally displays: Ô

I have not installed any additional fonts to support utf-8 as part of the fpdf library, but setting my python script to that encoding is the only way to get the characters to print correctly. Can anyone help me figure out why?
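One plausible reading of the "Ã”" output, sketched here in Python 3 rather than the Python 2.6 used above: the two characters are what the UTF-8 bytes of "Ô" look like when each byte is rendered as a separate single-byte character. The choice of cp1252 as the single-byte code page is an assumption for illustration; the thread does not say which encoding the PDF font actually uses.

```python
# Python 3 sketch (the original report used Python 2.6).
# Assumption: the PDF viewer renders bytes with a Windows-1252-like code page.
original = "\u00d4"                     # the character 'Ô'
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC3 0x94
misread = utf8_bytes.decode("cp1252")    # each byte shown as its own character

print(repr(utf8_bytes))  # b'\xc3\x94'
print(misread)           # Ã”
```

If that reading is right, the failing cases are the ones where UTF-8 bytes reach the PDF untouched, and the working case is the one where the library receives a genuine unicode string it can convert to a single byte itself.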

alexanderankin commented 7 years ago

Same results with \x style escapes? I mean, ultimately the data sources aren't string literals in your python source code...

jwaldorf commented 7 years ago

\x style escapes ended up working, so here's my next question. My original post was a simple example. The larger picture is that my program ingests an XML file and uses the data to create a pdf file. The XML is encoded as iso-8859-1 (latin1). Any idea on how to convert the special characters to their \x style equivalents?

jwaldorf commented 7 years ago

Here's a new simple example that demonstrates what's happening.

XML FILE (tmp.xml):

    <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
    <ROOT>
    <revision>¢©</revision>
    </ROOT>

PYTHON script:

    # -*- coding: latin-1 -*-
    from fpdf import FPDF
    import xml.etree.ElementTree as ET

    root = ET.parse('tmp.xml')
    revision = root.find('revision').text
    for c in revision:
        print ord(c)

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font('arial')
    pdf.cell(10, 10, revision, 0, 1)
    pdf.output('out.pdf', 'F')

This outputs 194 162 194 169. So for some reason, when python reads in the data, it's putting 194 (0xC2) in front of my latin characters.
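The 194 162 194 169 sequence can be reproduced without any XML at all. The sketch below (Python 3, not the Python 2.6 from the script above) encodes "¢©" as UTF-8 and then decodes those bytes as latin-1, which is what a parser would do if a UTF-8 file declared itself as iso-8859-1. The variable names are made up for illustration.

```python
# Python 3 sketch: UTF-8 bytes decoded under a latin-1 declaration.
text = "\u00a2\u00a9"                        # the characters '¢©'
utf8_bytes = text.encode("utf-8")            # 0xC2 0xA2 0xC2 0xA9
misdecoded = utf8_bytes.decode("latin-1")    # one character per byte
code_points = [ord(c) for c in misdecoded]

print(code_points)  # [194, 162, 194, 169]
```

In UTF-8, every code point from U+0080 to U+00BF is written as the lead byte 0xC2 followed by the code point's own value, which is why 194 shows up in front of each character.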

jwaldorf commented 7 years ago

Alright, I figured out the problem. My input XML appears not to be encoded properly. It looks like even though the encoding line at the top of the file says iso-8859-1, whatever editor the people creating that file for me are using is still saving it as utf-8. If I set my VIM editor to write the xml as iso-8859-1, then everything works.
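When the producer of a mislabelled file cannot be fixed, the damage can sometimes be undone after parsing. The following is a minimal Python 3 sketch, not something from this thread or from the fpdf library: `repair_mislabelled_latin1` is a hypothetical helper that turns the text back into bytes using latin-1 and retries the decode as UTF-8, leaving the input unchanged if that fails.

```python
def repair_mislabelled_latin1(text):
    """Hypothetical helper: undo a UTF-8-read-as-latin-1 mix-up.

    Returns the input unchanged when it does not look like
    mis-decoded UTF-8.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in latin-1, or not valid UTF-8: leave as-is.
        return text


broken = "\u00c2\u00a2\u00c2\u00a9"  # what the parser returned for '¢©'
print(repair_mislabelled_latin1(broken) == "\u00a2\u00a9")
```

This is only a heuristic: a genuine latin-1 string that happens to form valid UTF-8 byte sequences would be altered as well, so saving the file in the encoding it declares, as described above, remains the cleaner fix.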

reingart / pyfpdf

Latin-1 encoding errors #89