reingart / pyfpdf_googlecode

Automatically exported from code.google.com/p/pyfpdf
GNU Lesser General Public License v3.0

Python 3 support #13

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
i'd be happy to help make this python3 happy.  i have a distinct need for 
generating PDFs and my knowledge about pdf structure etc is minimal at best but 
in trade, i write all my stuff for py3

i have a checkout copy of pyfpdf that currently runs for basic stuff.  i've 
made my own alterations purely to get it running so i can figure things out but 
i'd really like to clean up my changes and make it so it'll be happy in both 2 
and 3

-david

Original issue reported on code.google.com by firefigh...@gmail.com on 17 May 2011 at 6:35

GoogleCodeExporter commented 9 years ago
Hi David, sorry for the delay. I completely missed this issue (I'm turning on 
notifications so I don't miss one again).

If you want I can give you commit access so you can submit your code and then 
we can review/merge it.

Original comment by reingart@gmail.com on 3 Oct 2011 at 5:52

GoogleCodeExporter commented 9 years ago
Basic and experimental py3k support was added at rev c1a331646f42 and up.
Currently only text (builtin fonts) works.
Embedded TTF fonts and image support are not ported yet.

https://code.google.com/p/pyfpdf/source/detail?r=f866a1306719bc5a5c9a0d1ea38f5f0f4278d5a8

Original comment by reingart@gmail.com on 6 Aug 2012 at 8:12

GoogleCodeExporter commented 9 years ago
Hello, here is simple support for compression in py3k:

--- a/fpdf/fpdf.py  Fri Aug 17 01:12:31 2012 -0300
+++ b/fpdf/fpdf.py  Wed Dec 19 11:45:40 2012 +0400
@@ -1072,7 +1072,10 @@
             self._out('endobj')
             #Page content
             if self.compress:
-                p = zlib.compress(self.pages[n])
+                if PY3K:
+                    p = zlib.compress(self.pages[n].encode("latin-1")).decode("latin-1")
+                else:
+                    p = zlib.compress(self.pages[n])
             else:
                 p = self.pages[n]
             self._newobj()

Original comment by romiq...@gmail.com on 19 Dec 2012 at 7:52

GoogleCodeExporter commented 9 years ago
Experimental unicode fonts support with Python3 (via 2to3)

Original comment by romiq...@gmail.com on 19 Dec 2012 at 1:01

GoogleCodeExporter commented 9 years ago
Thanks romiq.kh!
Could you explain some of your changes?
Why are you converting everything to unicode?

I don't know if we should manage all internal strings as unicode, or use bytes 
instead.
What do you think?

Is this working both in py2.x and py3k?
Did you test this with non-latin1 characters? (there are some tests available)

I'm not sure if self.pages[n].encode("latin-1") would break, or if it just 
works by chance / coincidence.

I've made you a committer; if you want, you can make a py3k branch to test it.

Original comment by reingart@gmail.com on 19 Dec 2012 at 6:49

GoogleCodeExporter commented 9 years ago
Thanks for the commit rights.

The main idea of the patch set is: "if we can't avoid the heavy unicode 
handling around self._out in the FPDF class, convert to bytes later".

Otherwise we would need to:
1. Build everything passed to self._out as bytestrings, using b"" literals (impossible for 2.x)
2. Live without an int-to-str function for bytes (so decode, format, encode)

Not sure if such heavy use of PY3K checks is fine.

> Is this working both in py2.x and py3k?
It should, at least with py2.7. I can't test 2.4 and complain about // operator.

> Did you test this with non-latin1 characters? (there are some tests available)

unifonts.py works with some fonts, see attachment
btw, please fix HelloWorld.txt: "Russian: Здравствуй, мир"

> I'm not sure if self.pages[n].encode("latin-1") would break, or it just works by chance / coincidence

No coincidence. Just a dirty hack:
The pages array contains unicode strings, but they only hold characters in the range [0..255].
On the other hand, zlib.compress requires a bytestring.
1. self.pages[n].encode("latin-1") - unicode -> bytes
2. zlib.compress() - bytes -> compressed bytes
3. .decode("latin-1") - bytes -> unicode
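A minimal sketch of the three steps above, assuming Python 3 and made-up page content (the variable names are illustrative and not taken from fpdf.py). It also shows why the trick depends on every character being in [0..255]:

```python
import zlib

# Made-up page content; every character is below 256
page = "BT /F1 12 Tf (Hello) Tj ET\n" * 20

# 1. unicode -> bytes, 2. compress, 3. bytes -> unicode (one character per byte)
packed = zlib.compress(page.encode("latin-1")).decode("latin-1")

# Later, the final write encodes the whole buffer back to bytes
raw = packed.encode("latin-1")
restored = zlib.decompress(raw).decode("latin-1")

# A character above 255 cannot take this path
try:
    "Здравствуй".encode("latin-1")
    fits = True
except UnicodeEncodeError:
    fits = False
```

Latin-1 maps code points 0..255 to the same byte values, so the round trip is lossless for compressed data; anything outside that range raises UnicodeEncodeError instead of being silently mangled.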

The last conversion, unicode -> bytes, is made by 
            if PY3K:
                # TODO: proper unicode support
                f.write(self.buffer.encode("latin1"))
and this is not my patch :)

Original comment by romiq...@gmail.com on 19 Dec 2012 at 7:34

GoogleCodeExporter commented 9 years ago
> Not sure if such heavy use of PY3K checks is fine.

I'm not sure either; please wrap it in a function so we can refactor it in the 
future if we need to.
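As a rough sketch of what such a function could look like: the name compress_page is hypothetical and does not exist in fpdf.py, and PY3K is defined locally here so the snippet runs on its own.

```python
import sys
import zlib

PY3K = sys.version_info >= (3, 0)

def compress_page(data):
    """Compress one page and return it in the same string type the buffer uses.

    Hypothetical helper: keeps the version check in one place so callers
    never have to test PY3K themselves.
    """
    if PY3K:
        # Latin-1 round trip: text -> bytes -> compressed bytes -> text
        return zlib.compress(data.encode("latin-1")).decode("latin-1")
    return zlib.compress(data)

p = compress_page("q 0 0 m 10 10 l S Q\n")
```

With this in place the call site would shrink to something like `p = compress_page(self.pages[n])`, and a later switch to real bytes would only touch the helper.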

>> Is this working both in py2.x and py3k?
> It should, at least with py2.7. I can't test 2.4 and complain about // operator.

2.4 is too old, 2.5 compatibility would be great, 2.7 is mandatory

>> Did you test this with non-latin1 characters? (there are some tests available)
> unifonts.py works with some fonts, see attachment

Can you download the unicode font pack and test all the fonts?
(to check that this patch doesn't break anything)

> btw, please fix HelloWorld.txt: "Russian: Здравствуй, мир"

go ahead ;-)
sorry, I only speak Spanish and a little English, so that is surely an 
inaccurate google translation...

>> I'm not sure if self.pages[n].encode("latin-1") would break, or it just works by chance / coincidence 
> No coincidence. Just a dirty hack:
> The pages array contains unicode strings, but they only hold characters in the range [0..255].

Yes, it is a dirty hack, but it is the best we have so far ;-)

> Last conversion unicode -> bytes made by 
>            if PY3K:
>                # TODO: proper unicode support
>                f.write(self.buffer.encode("latin1"))
> and this is not my patch :)

Yes, as long as it doesn't break existing code, I think we can live with that :-)

Feel free to address these comments, commit, and close this issue for now, so 
pyfpdf can finally be py3k compatible!

Original comment by reingart@gmail.com on 21 Dec 2012 at 2:31

GoogleCodeExporter commented 9 years ago
Hi all,

On the py3k branch, the following tests pass (+) or fail (-):
                            3.2  2.7  
py3k.py (with compression)   +    -
unifonts.py                  +    +
ex_unicode                   +    +
html.py                      -    +
nb_pages.py                  +    +

To be continued...

Original comment by romiq...@gmail.com on 22 Dec 2012 at 5:51

GoogleCodeExporter commented 9 years ago
Hi, all.

While porting to Python 3 I found a rounding issue:

Font lohit_hi.ttf,
Character 247, 832 * 0.9765625 -> 812.5 (ttfonts.py, line 868)
Python 2.7 round(812.5) -> 813.0
Python 3.2 round(812.5) -> 812.0

This prevents the generated PDFs from being byte-for-byte identical.
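For reference, the difference comes from the built-in: Python 3's round() sends ties to the nearest even integer, while Python 2 rounds ties away from zero. One possible workaround, shown only as a sketch (round_half_away is a hypothetical helper, not code from ttfonts.py), is to round explicitly:

```python
import math

def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero."""
    if x >= 0:
        return int(math.floor(x + 0.5))
    return int(math.ceil(x - 0.5))

width = 832 * 0.9765625           # exactly 812.5
builtin = round(width)            # 813.0 on Python 2.7, 812 on Python 3
stable = round_half_away(width)   # 813 on both
```

The floor(x + 0.5) form is fine for glyph metrics like these, but it is not a general replacement for round() on values very close to a tie.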

Does anyone have any suggestions?

Original comment by romiq...@gmail.com on 25 Dec 2012 at 12:10

GoogleCodeExporter commented 9 years ago
Solved: byte-for-byte accuracy achieved with the whole set (93 TTF fonts).
The test will be pushed soon.
Now sha1(pdf_1.7) == sha1(pdf py2.7) == sha1(pdf py3.2)

Next is HTML generation. There is an import name conflict because HTMLParser 
was renamed to html.parser, which conflicts with the html.py file (and the 
html.py test).
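The usual way to import the parser on both versions is a try/except fallback, sketched below with a throwaway TagCollector class that is purely illustrative. Note that this only handles the rename; a local file named html.py would still shadow the standard html package on Python 3, so that file would presumably need a different name or location.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

class TagCollector(HTMLParser):
    """Illustrative subclass: records the start tags it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed("<p>Hello <b>world</b></p>")
```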

Original comment by romiq...@gmail.com on 25 Dec 2012 at 8:44

GoogleCodeExporter commented 9 years ago
What about adding the six package as a dependency, to support both 2.x and 3.x 
code?
http://packages.python.org/six/

This would solve some issues with ugly code (especially the PNG loading code).

Original comment by romiq...@gmail.com on 8 Jan 2013 at 12:20

GoogleCodeExporter commented 9 years ago
I had to do a quick & dirty py3k conversion, and as of today it is almost 
working in the default branch.

Sorry, I had almost completely forgotten your py3k branch, and as I needed to 
get images working, I did some hacks to get complete py3k support.

We should release 1.7.2, and then you should try to merge your py3k branch, 
especially the rounding issues and the final byte handling in _out et al., and 
then release 1.7.3.

Great work romiq.kh!

Original comment by reingart@gmail.com on 5 Feb 2014 at 5:02

GoogleCodeExporter commented 9 years ago
Hello, it looks like Python 3 mainly works (using current default branch), but 
I have been looking at fixing a couple of minor issues. I am planning to go 
through Roman’s py3k branch and pull out any useful changes that are still 
relevant. (Most of them I think are already applied or no longer needed.)

Do you think it would be okay to drop Python 2.5 support? Python 2.6 allows the 
b". . ." byte string syntax, and I would like to avoid the temporary Latin-1 
encoding hack. So instead of code like this:

self._out(sprintf('/MediaBox [0 0 %.2f %.2f]',w_pt,h_pt))
self._out('>>')
self._out('endobj')

it looks like this:

self._out(sprintf('/MediaBox [0 0 %.2f %.2f]',w_pt,h_pt))
self._out(b'>>')
self._out(b'endobj')

with sprintf() calling encode('latin1') behind the scenes.
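A minimal sketch of that idea, assuming a module-level sprintf helper and a plain bytearray buffer. The out function here is a hypothetical stand-in for FPDF._out, which does more than this in the real class:

```python
def sprintf(fmt, *args):
    """Format as text, then encode to Latin-1 bytes for the PDF stream."""
    return (fmt % args).encode("latin1")

buffer = bytearray()

def out(data):
    """Hypothetical stand-in for FPDF._out: append one line of bytes."""
    buffer.extend(data + b"\n")

w_pt, h_pt = 595.28, 841.89   # A4 size in points, for illustration
out(sprintf('/MediaBox [0 0 %.2f %.2f]', w_pt, h_pt))
out(b'>>')
out(b'endobj')
```

Formatting happens on text and only the result is encoded, so the buffer holds bytes from the start and no final Latin-1 pass is needed.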

Original comment by vadm...@gmail.com on 4 Jan 2015 at 9:37

GoogleCodeExporter commented 9 years ago
Hi, vadmium. That is true of the py3k branch: it is no longer relevant in any 
useful way.
The last good thing in it is using struct.pack instead of bit shifting. 
Everything else is just hacks for 3k.
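To illustrate the struct.pack point with a generic example (these two helpers are hypothetical and are not the code from the branch), here is a big-endian 16-bit value written both ways:

```python
import struct

def u16_shift(value):
    """Big-endian unsigned 16-bit value built by hand with shifts and masks."""
    return bytes(bytearray([(value >> 8) & 0xFF, value & 0xFF]))

def u16_pack(value):
    """The same encoding via struct.pack ('>H' = big-endian unsigned short)."""
    return struct.pack(">H", value)

by_hand = u16_shift(0x1234)
packed = u16_pack(0x1234)
```

Both produce the same two bytes, but the struct version returns the native byte string type on 2.x and 3.x without any chr()/ord() juggling.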
As for dropping 2.5 support, my vote is "still not sure, can we wait another year?".

Original comment by romiq...@gmail.com on 4 Jan 2015 at 8:23

GoogleCodeExporter commented 9 years ago
Okay, I’ll try to ensure my changes are compatible with 2.5. I might end up 
leaving the Latin-1 hack there for the time being.

I think you are right about the py3k branch. The only things I can see are:
* The struct packing that you mentioned
* An error message formatting fix; I will make a pull request for these two 
when I have a chance
* Part of the rounding for TTF font metrics 
<https://code.google.com/p/pyfpdf/source/detail?r=97f2002af77a>. I think the 
part in fpdf.py is not applied in the default branch. But maybe it is better to 
just keep the code simple, than to make the rounding the same in this corner 
case.

Original comment by vadm...@gmail.com on 5 Jan 2015 at 3:50

GoogleCodeExporter commented 9 years ago
Thanks for mentioning the TTF rounding patch; this should be covered by a test. 
I'll propose a patch for this later. This issue appears only with some fonts.

Original comment by romiq...@gmail.com on 5 Jan 2015 at 7:41

GoogleCodeExporter commented 9 years ago
I agree with romiq.kh, I still need Python 2.5 support for some of my customers.

Also, I think the byte prefix doesn't add much that is useful, and in fact it 
could be error prone, as byte arrays still don't support formatting via 
placeholders (and maybe other string operations).

The latin1 hack is a nasty one, but it helps to keep the code clean, 
consistent and compatible, and it should not imply any serious performance 
penalty (IIRC romiq.kh had pointed that out in previous comments).

Original comment by reingart@gmail.com on 5 Jan 2015 at 5:24