jwaldorf closed this issue 7 years ago
Same results with \x-style escapes? I mean, ultimately, the data sources aren't string literals in your Python source code...
\x-style escapes ended up working, so here's my next question. My original post was a simple example. The larger picture is that my program ingests an XML file and uses the data to create a PDF file. The XML is encoded as iso-8859-1 (latin1). Any idea how to convert the special characters to their \x-style equivalents?
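For reference, a minimal sketch of what "converting to \x-style equivalents" could look like. The helper name to_hex_escapes is hypothetical and not part of fpdf. It only shows which byte values a piece of text maps to under a given encoding; text read at runtime normally needs correct decoding rather than escaping.

```python
def to_hex_escapes(text, encoding='latin-1'):
    """Return text as a string of \\xNN escapes, one per encoded byte."""
    # bytearray yields integers on both Python 2 and Python 3
    return ''.join('\\x%02x' % b for b in bytearray(text.encode(encoding)))

# u'\xa2\xa9' is the cent sign followed by the copyright sign
print(to_hex_escapes(u'\xa2\xa9'))  # \xa2\xa9
```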
Here's a new simple example that demonstrates what's happening.
XML FILE (tmp.xml):
<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<ROOT>
<revision>¢©</revision>
</ROOT>
PYTHON script:
# -*- coding: latin-1 -*-
from fpdf import FPDF
import xml.etree.ElementTree as ET
root = ET.parse('tmp.xml')
revision = root.find('revision').text
for c in revision:
    print ord(c)
pdf = FPDF()
pdf.add_page()
pdf.set_font('arial')
pdf.cell(10,10,revision,0,1)
pdf.output('out.pdf', 'F')
This outputs 194 162 194 169. So for some reason, when Python reads in the data, it's putting 194 (0xC2) in front of each of my latin characters.
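The four values above match the UTF-8 encoding of the two characters, while Latin-1 would use a single byte for each. A short sketch that compares the two encodings (the variable names are illustrative only):

```python
text = u'\xa2\xa9'  # cent sign, copyright sign

# bytearray yields integers on both Python 2 and Python 3
latin1_values = list(bytearray(text.encode('latin-1')))
utf8_values = list(bytearray(text.encode('utf-8')))

print(latin1_values)  # [162, 169]
print(utf8_values)    # [194, 162, 194, 169]
```

A leading 194 (0xC2) before each character is therefore a hint that UTF-8 bytes are being read as if they were Latin-1.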
Alright, I figured out the problem. My input XML appears not to be encoded properly. Even though the encoding declaration at the top of the file says iso-8859-1, whatever editor the people creating that file for me are using is still saving it as utf-8. If I set my Vim editor to write the XML as iso-8859-1, then everything works.
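If the files cannot be fixed at the source, one possible workaround is to repair the text after parsing. This is only a heuristic sketch, and fix_mislabeled_utf8 is a hypothetical helper name: it assumes the text was really UTF-8 that got decoded as Latin-1, and it leaves the text unchanged when that assumption does not hold. Genuine Latin-1 text that happens to form valid UTF-8 sequences would be converted incorrectly.

```python
def fix_mislabeled_utf8(text):
    """Undo a Latin-1 decode of bytes that were actually UTF-8."""
    try:
        # Recover the original bytes, then decode them as UTF-8
        return text.encode('latin-1').decode('utf-8')
    except UnicodeError:
        # Not representable in Latin-1, or not valid UTF-8: leave as-is
        return text

garbled = u'\xc2\xa2\xc2\xa9'        # what the parser returned
print(fix_mislabeled_utf8(garbled) == u'\xa2\xa9')  # True
```

Correctly decoded input such as u'\xa2\xa9' passes through untouched, because its Latin-1 bytes are not valid UTF-8.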
I'm having an odd issue related to encoding. Note this is not related to issue #86, as I'm not getting any error messages. When I use the cell function, it doesn't print latin-1 characters properly on the PDF. I'm using this for my job and they're using Python 2.6. The only way I can get the characters to print properly is if I set the Python script to utf-8 and send the text as a unicode string to the function. Here are a few examples I've tried:
Using latin1 encoding with a regular string
# -*- coding: latin-1 -*-
#USING LATIN-1
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('arial')
text = 'Ô' #REGULAR NON-UNICODE STRING
pdf.cell(10,10,text,0,1)
pdf.output('out.pdf', 'F')
This displays: Ô
Using latin1 encoding with a unicode string
# -*- coding: latin-1 -*-
#USING LATIN-1
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('arial')
text = u'Ô' #UNICODE STRING
pdf.cell(10,10,text,0,1)
pdf.output('out.pdf', 'F')
This also displays: Ô
Using utf-8 encoding with a regular string
# -*- coding: utf-8 -*-
#USING UTF-8
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('arial')
text = 'Ô' #REGULAR NON-UNICODE STRING
pdf.cell(10,10,text,0,1)
pdf.output('out.pdf', 'F')
This also displays: Ô
Using utf-8 encoding with a unicode string
# -*- coding: utf-8 -*-
#USING UTF-8
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font('arial')
text = u'Ô' #UNICODE STRING
pdf.cell(10,10,text,0,1)
pdf.output('out.pdf', 'F')
This one works and finally displays: Ô
I have not installed any additional fonts to support utf-8 as part of the fpdf library, but setting my Python script to that encoding is the only way to get the characters to print correctly. Can anyone help me figure out why?
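As background for the four cases, here is a small sketch of how the same character maps to different bytes under each encoding. It does not model fpdf's internals. It only illustrates why a byte string literal depends on how the source file was actually saved, while a correctly decoded unicode string does not.

```python
char = u'\xd4'  # LATIN CAPITAL LETTER O WITH CIRCUMFLEX

latin1_bytes = char.encode('latin-1')  # one byte: 0xD4
utf8_bytes = char.encode('utf-8')      # two bytes: 0xC3 0x94

print(list(bytearray(latin1_bytes)))  # [212]
print(list(bytearray(utf8_bytes)))    # [195, 148]

# Reading the UTF-8 bytes as if they were Latin-1 yields two characters
misread = utf8_bytes.decode('latin-1')
print(len(misread))  # 2
```

If an editor saves the script as UTF-8 while the coding line claims latin-1, the literal ends up holding the two-byte form, which would render as two wrong glyphs in a Latin-1 font.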