RTL text support - Githubissues

GoogleCodeExporter commented 9 years ago

Hi,
Thank you very much for your script. I use it with English text and it works very well, but I have a problem with my native language (Persian).
I use pyfpdf with Python 3.2 and Python 3.3, and this error happens with the following example:

My code:
import sys
from fpdf import FPDF
pdf = FPDF()
pdf.compress = False
pdf.add_page()
pdf.add_font('DejaVu', '', './fonts/DejaVuSansCondensed.ttf', uni=True)
pdf.set_font('DejaVu', '', 14)
text = """
English: Hello World
Greek: Γειά σου κόσμος
Polish: Witaj świecie
Portuguese: Olá mundo
Russian: Здравствуй, Мир
Vietnamese: Xin chào thế giới
Arabic: مرحبا العالم
Hebrew: שלום עולם
"""
pdf.write(8, text)
pdf.ln(20)

===========================

Error:

ali@pc-debian:~/Aptana_Studio_3/Workspace/cc3$ python3.2 pycairo.py 
Traceback (most recent call last):
  File "pycairo.py", line 17, in <module>
    pdf.add_font('DejaVu', '', './fonts/DejaVuSansCondensed.ttf', uni=True)
  File "/usr/local/lib/python3.2/dist-packages/fpdf/fpdf.py", line 437, in add_font
    ttf.getMetrics(ttffilename)
  File "/usr/local/lib/python3.2/dist-packages/fpdf/ttfonts.py", line 87, in getMetrics
    self.version = version = self.read_ulong()
  File "/usr/local/lib/python3.2/dist-packages/fpdf/ttfonts.py", line 158, in read_ulong
    return (ord(s[0])*16777216) + (ord(s[1])<<16) + (ord(s[2])<<8) + ord(s[3]) #     16777216  = 1<<24
TypeError: ord() expected string of length 1, but int found
ali@pc-debian:~/Aptana_Studio_3/Workspace/cc3$ python3.3 pycairo.py 
Traceback (most recent call last):
  File "pycairo.py", line 17, in <module>
    pdf.add_font('DejaVu', '', './fonts/DejaVuSansCondensed.ttf', uni=True)
  File "/usr/local/lib/python3.3/dist-packages/fpdf/fpdf.py", line 437, in add_font
    ttf.getMetrics(ttffilename)
  File "/usr/local/lib/python3.3/dist-packages/fpdf/ttfonts.py", line 87, in getMetrics
    self.version = version = self.read_ulong()
  File "/usr/local/lib/python3.3/dist-packages/fpdf/ttfonts.py", line 158, in read_ulong
    return (ord(s[0])*16777216) + (ord(s[1])<<16) + (ord(s[2])<<8) + ord(s[3]) #     16777216  = 1<<24
TypeError: ord() expected string of length 1, but int found

Original issue reported on code.google.com by alireza...@gmail.com on 10 Jul 2013 at 10:34

GoogleCodeExporter commented 9 years ago

alirezaimi, did you try the py3k branch?

In the default branch, fonts are not well supported under Python 3.
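
For context, the traceback above comes from how Python 3 handles binary data: indexing a `bytes` object yields an `int`, so `ord(s[0])` raises `TypeError`. Below is a minimal sketch of a 4-byte big-endian decode that behaves the same on Python 2 and 3. The standalone `read_ulong(data)` signature is an assumption for illustration only; the real method in `ttfonts.py` reads from the open font file.

```python
import struct


def read_ulong(data):
    """Decode 4 bytes as a big-endian unsigned 32-bit integer.

    Hypothetical standalone helper: takes the raw bytes directly
    instead of reading them from a font file.
    """
    return struct.unpack(">L", data)[0]


def read_ulong_manual(data):
    """Same result with explicit shifts; bytearray yields ints on 2 and 3."""
    b = bytearray(data)
    return (b[0] << 24) + (b[1] << 16) + (b[2] << 8) + b[3]


# TrueType files start with the version tag 0x00010000.
print(read_ulong(b"\x00\x01\x00\x00"))
```

`struct.unpack` avoids `ord()` entirely, which is why the same code runs on both interpreters.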

Original comment by romiq...@gmail.com on 10 Jul 2013 at 1:56

GoogleCodeExporter commented 9 years ago

I cloned the source from https://code.google.com/p/pyfpdf/.
Which branch works with py3k?

Also, when I use the header method with a .png file, I get a "Not a PNG file" error and the app exits. When I use a .jpg file instead, the image does not show at all, just a white area, and my file is physically corrupted! What happened?

Thanks.

Original comment by alireza...@gmail.com on 10 Jul 2013 at 2:06

GoogleCodeExporter commented 9 years ago

alirezaimi

To switch branches, do this:

% hg clone https://code.google.com/p/pyfpdf/ 
% cd pyfpdf
% hg update -C py3k
% 2to3 -f all -w -o fpdf_py3k -n fpdf

Note that this discards all local changes, so be careful if your test files are in the same folder.

Also, use this wiki page as a starting point:
https://code.google.com/p/pyfpdf/wiki/Python3

Original comment by romiq...@gmail.com on 11 Jul 2013 at 5:20

GoogleCodeExporter commented 9 years ago

Thanks for the support. The pyfpdf error is solved and pyfpdf works, but now the problem is with Persian and UTF-8 support.
This is my code:

from fpdf import FPDF

pdf = FPDF()
pdf.compress = False
pdf.add_page()
pdf.add_font('DejaVu', '', './fonts/DejaVuSans.ttf', uni=True)
pdf.set_font('DejaVu', '', 14)
text = "این یک متن پارسی است . This is a Persian text !!"
pdf.write(8, text)
pdf.ln(8)
pdf.output("unicode.pdf", 'F')

The attached file is my output PDF, which is completely garbled.

Thanks.

Original comment by alireza...@gmail.com on 11 Jul 2013 at 10:11

Attachments:

[Screenshot from 2013-07-11 14:38:36.png](https://storage.googleapis.com/google-code-attachments/pyfpdf/issue-60/comment-4/Screenshot from 2013-07-11 14:38:36.png)

GoogleCodeExporter commented 9 years ago

alirezaimi

Your test sample gives me the same picture.

But are these glyphs supposed to be linked somehow? I can't read any of this.

Are the glyphs mirrored and displayed from another set? If so, it seems to be a text direction (LTR vs. RTL) issue.

Original comment by romiq...@gmail.com on 15 Jul 2013 at 8:07

GoogleCodeExporter commented 9 years ago

No! The text in the picture is my text, but it is completely jumbled and I cannot understand anything either! This is the text:
"این یک متن پارسی است"
It is in Persian (Farsi), which is written right-to-left (http://en.wikipedia.org/wiki/Persian_language).
The attached file is the pyfpdf output.
I think the problem is with UTF-8 support in pyfpdf.

Thanks.

Original comment by alireza...@gmail.com on 15 Jul 2013 at 8:17

Attachments:

[Screenshot from 2013-07-16 00:43:01.png](https://storage.googleapis.com/google-code-attachments/pyfpdf/issue-60/comment-6/Screenshot from 2013-07-16 00:43:01.png)

GoogleCodeExporter commented 9 years ago

Confirmed: RTL text is rendered incorrectly.

This needs to be implemented.

Original comment by romiq...@gmail.com on 17 Jul 2013 at 6:54

Attachments:

persrtl.png

GoogleCodeExporter commented 9 years ago

Yes!!!
Also, the letters are not joined. For example, فارسی is shown with separated letters, like this:
ف‌ا‌ر‌س‌ی and the letter order is reversed, like this:
ی‌س‌ر‌ا‌ف
Thanks for the support. Please update this post when the problem is solved.
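
To make the joining problem concrete, here is a toy sketch of contextual shaping for just the five letters of the word فارسی. It is not part of pyfpdf; the `FORMS` table and the `shape` function are hypothetical names, and a real shaper needs the full Unicode joining data. Each letter is replaced by its isolated, final, initial or medial presentation form depending on whether it connects to its neighbours.

```python
# Toy table (assumption: covers only these five letters).
# letter -> (isolated, final, initial, medial); None means the form does not exist.
FORMS = {
    "\u0641": ("\uFED1", "\uFED2", "\uFED3", "\uFED4"),  # FEH
    "\u0627": ("\uFE8D", "\uFE8E", None, None),          # ALEF (joins only to the previous letter)
    "\u0631": ("\uFEAD", "\uFEAE", None, None),          # REH (joins only to the previous letter)
    "\u0633": ("\uFEB1", "\uFEB2", "\uFEB3", "\uFEB4"),  # SEEN
    "\u06CC": ("\uFBFC", "\uFBFD", "\uFBFE", "\uFBFF"),  # FARSI YEH
}


def joins_next(ch):
    """True if the letter can connect to the letter that follows it."""
    return ch in FORMS and FORMS[ch][2] is not None


def shape(word):
    """Replace each known letter with its contextual presentation form."""
    out = []
    for i, ch in enumerate(word):
        if ch not in FORMS:
            out.append(ch)  # leave unknown characters untouched
            continue
        isolated, final, initial, medial = FORMS[ch]
        to_prev = i > 0 and joins_next(word[i - 1])
        to_next = joins_next(ch) and i + 1 < len(word) and word[i + 1] in FORMS
        if to_prev and to_next:
            out.append(medial)
        elif to_prev:
            out.append(final)
        elif to_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


shaped = shape("\u0641\u0627\u0631\u0633\u06CC")  # the word "Farsi" in logical order
```

The result is still in logical order; it would need to be reordered for display as a separate step.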

Thanks.

Original comment by alireza...@gmail.com on 17 Jul 2013 at 10:02

GoogleCodeExporter commented 9 years ago

OK,

The py3k problem is already tracked in Issue 13, so this issue will now cover RTL text support.

Original comment by romiq...@gmail.com on 22 Jul 2013 at 8:01

Changed title: RTL text support
Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

This is not really a Unicode handling issue (that is working OK, AFAIK).
The problem is how the text is displayed: it is stored in the PDF in LTR order, and that is exactly how it is shown.

TCPDF solves this by implementing the Unicode Bidirectional Algorithm (http://unicode.org/reports/tr9/) in the function TCPDF_FONTS::utf8Bidi.

A tentative solution would be to use PyBiDi: https://pypi.python.org/pypi/python-bidi (BiDi layout implementation).
My vote goes to this library, as it is pure Python and LGPLv3+ (there are other libraries for Python, like pyfribidi, but they require compilation).
Also, with minimal changes, it worked under Python 3!

Attached is a sample PDF (can you confirm it is OK?).
You can see the test code at:

https://code.google.com/p/pyfpdf/source/browse/tests/issue60.py

This could be implemented in normalize_text, UTF8ToUTF16BE or similar.

TCPDF also adds special methods like setRTL / getRTL and similar, which help with alignment, margins, etc. (mirroring the x axis and the calculations).
We should implement them too, but it will be quite some work.
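
As a small illustration of the kind of decision a setRTL-style switch depends on, here is a sketch that guesses the base direction of a paragraph from its first strong character, in the spirit of rules P2 and P3 of the Bidirectional Algorithm. It uses only the standard library; `base_direction` is a hypothetical helper, not part of pyfpdf, TCPDF or python-bidi, and it ignores directional isolates and explicit embedding controls.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Return 'rtl' or 'ltr' based on the first strong bidi character.

    'L' is strong left-to-right; 'R' (e.g. Hebrew) and 'AL' (Arabic,
    Persian) are strong right-to-left. Digits, spaces and punctuation
    are weak or neutral and are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found


sample = "\u0627\u06CC\u0646 \u06CC\u06A9 \u0645\u062A\u0646. This is a Persian text !!"
print(base_direction(sample))
```

This only picks the paragraph direction; reordering mixed LTR and RTL runs for display is the job of a full BiDi implementation.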

Thoughts?

Original comment by reingart@gmail.com on 5 Feb 2014 at 2:56

Changed state: Accepted

Attachments:

issue60.pdf

GoogleCodeExporter commented 9 years ago

Hi, just a little update.
I converted this test to a batch test (to run all tests in bulk before a commit).
PyBiDi has not been updated since 2010 and requires some fixes for py3k:

bidi/algorithm.py:
...
if sys.version_info >= (3, 0):
    unicode = str
...
X6_IGNORED = list(X2_X5_MAPPINGS.keys()) + ['BN', 'PDF', 'B']
X9_REMOVED = list(X2_X5_MAPPINGS.keys()) + ['BN', 'PDF']
...
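
For readers wondering why the `list(...)` wrappers are needed: in Python 3, `dict.keys()` returns a view object, which cannot be concatenated with a list. The sketch below shows this with a placeholder dictionary; the keys and `None` values are assumptions for illustration, not the real contents of `X2_X5_MAPPINGS`.

```python
import sys

# Python 3 has no separate `unicode` type, so alias it to `str`.
if sys.version_info >= (3, 0):
    unicode = str

# Placeholder mapping (assumption: the real values in PyBiDi differ).
X2_X5_MAPPINGS = {"RLE": None, "LRE": None, "RLO": None, "LRO": None}


def concat_without_list():
    """Return True if keys() + list fails, as it does on Python 3."""
    try:
        X2_X5_MAPPINGS.keys() + ["BN", "PDF"]
    except TypeError:
        return True
    return False


# Wrapping the view in list() works on both Python 2 and Python 3.
X9_REMOVED = list(X2_X5_MAPPINGS.keys()) + ["BN", "PDF"]
print(X9_REMOVED)
```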

Original comment by romiq...@gmail.com on 5 Feb 2014 at 6:42

pckiller2008 / pyfpdf

RTL text support #60