w00t-labs / libtorrent

Automatically exported from code.google.com/p/libtorrent

python3 support #449

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run session.save_state() on Python3.2 bindings

What is the expected output? What do you see instead?
The program fails, raising UnicodeDecodeError when session.save_state() is invoked.

What version of the product are you using? On what operating system?
The newest, 0.16.9, but the problem occurred on 0.16.8 too. I built libtorrent on 
Ubuntu 12.10 with the Python 3.2 binding.

Please provide any additional information below.

I ran code: 
# -*- coding: utf-8 -*-
import libtorrent as lt

ses = lt.session()
ses.listen_on(6821, 6831)
ses.save_state()

and the last line causes a UnicodeDecodeError. I've tested it with the binding for Python 
2.7, where everything works and I get data like this:
{'dht state': {'node-id': '5.\xf3\xde\x1a\xd3_\n\r\x1cHX\xe3@(\xc1\xa7IS\x83'}, 
'settings': {'half_open_limit': 2147483647}, 'i2p': {}, 'encryption': {}, 'AS 
map': {}, 'proxy': {}, 'feeds': [], 'dht': {}}

Python 3.2 can't cope with the binary data in 'node-id'.
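
The decoding failure can be shown without libtorrent. This is only a sketch, assuming the binding tries to turn the raw 20-byte node-id into a Python 3 str via UTF-8; the helper name try_decode is made up for illustration:

```python
# The node-id bytes copied from the Python 2.7 output above.
node_id = b'5.\xf3\xde\x1a\xd3_\n\r\x1cHX\xe3@(\xc1\xa7IS\x83'

def try_decode(raw):
    """Hypothetical helper: return the decoded str, or None if raw is not valid UTF-8."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return None

print(len(node_id))         # 20
print(try_decode(node_id))  # None, the bytes are not valid UTF-8
print(try_decode(b'dht'))   # dht
```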

Original issue reported on code.google.com by rafaljag...@gmail.com on 20 Mar 2013 at 4:00

GoogleCodeExporter commented 8 years ago
Official word on Python 3 libtorrent support from the creator would be much 
appreciated.

I had wasted a day trying to get it to work on Ubuntu when I noticed the Python 3 
Windows binaries in Downloads.
It turns out those Windows binaries fail in the exact way you're describing on 
Windows itself, so it's not just a matter of a missing ./configure flag on 
Linux.

Original comment by onlinema...@gmail.com on 31 Mar 2013 at 10:36

GoogleCodeExporter commented 8 years ago
I believe the fix would be for libtorrent to represent its bencoded structures 
as byte-arrays rather than strings.

I'll look into this, thanks for the report.

Original comment by arvid.no...@gmail.com on 1 Apr 2013 at 5:07

GoogleCodeExporter commented 8 years ago
it appears boost.python has poor support, if any, for byte arrays. Not sure 
what the best approach is. maybe manually convert the strings.

Original comment by arvid.no...@gmail.com on 2 Apr 2013 at 4:32

GoogleCodeExporter commented 8 years ago
what about Py3 -> C++? is it the same issue? 

lt.bdecode(bytes) throws UnicodeDecodeError with Boost 1.53, which should have 
this fix: https://svn.boost.org/trac/boost/ticket/4609
lt.bdecode(str) returns None

i wish i knew how to make it work manually

Original comment by onlinema...@gmail.com on 2 Apr 2013 at 9:18

GoogleCodeExporter commented 8 years ago
I think the solution to this is something along these lines. Would you mind 
testing this patch?

It essentially introduces a wrapper around string to indicate that it should be 
treated as a byte array, and also introduces a custom converter for that type.

Original comment by arvid.no...@gmail.com on 29 Apr 2013 at 4:51

GoogleCodeExporter commented 8 years ago
Make couldn't find 'byte_array.hpp', got it working with:

## utility.cpp and entry.cpp
- #include <byte_array.hpp>
+ #include "byte_array.hpp"

## Makefile.am
+   src/byte_array.hpp        \

ses.save_state() now returns:
{'dht state': {'node-id': 
bytearray(b'\xd2\xf8\xder*\x9d\xd6\x1a\xe3\x17\xe2\x83\xea\xc4\xd4\xfe7\x8bX+')}
, 'settings': {'half_open_limit': 2147483647}, 'proxy': {}, 'feeds': [], 'dht': 
{}, 'i2p': {}, 'encryption': {}}

Not sure that dealing with the 'bytearray' Python type is ideal, but it's much 
better than nothing, i.e.:
torrent_bytearray = bytearray(torrent_bytes) # additional step
torrent_d = lt.bdecode(torrent_bytearray) # works

ti = lt.torrent_info(torrent_d) # doesn't work, "incorrect number of piece 
hashes in torrent file"

Original comment by onlinema...@gmail.com on 30 Apr 2013 at 8:19

GoogleCodeExporter commented 8 years ago
is there a more appropriate type than byte_array?

Original comment by arvid.no...@gmail.com on 4 May 2013 at 4:16

GoogleCodeExporter commented 8 years ago
PyString for text, PyBytes for binary, of course.

I will gladly and thoroughly test any patch, but it's hard to make specific 
intelligent suggestions not knowing either c++ or boost.

Does the aforementioned Boost patch affect us?

This c++ library looks to be exactly the reference we need, please take a look 
at Bytes.python3.ipp 
https://code.google.com/p/ackward/source/browse/#git%2Fsrc%2Fackward%2Fcore

Original comment by onlinema...@gmail.com on 4 May 2013 at 1:30

GoogleCodeExporter commented 8 years ago
ok, thanks!

as far as I can see, the boost patch does not intersect with this issue. It's 
really just a small hack to work around one specific aspect of the lack of support 
for distinguishing between binary data and strings in boost.python. I'll update 
my patch soon.

Original comment by arvid.no...@gmail.com on 4 May 2013 at 5:16

GoogleCodeExporter commented 8 years ago
how about this patch?

If anyone would want to make sure this isn't breaking anything for python2 
builds, that would be great as well.

Original comment by arvid.no...@gmail.com on 5 May 2013 at 10:48

GoogleCodeExporter commented 8 years ago

Original comment by arvid.no...@gmail.com on 5 May 2013 at 10:49

GoogleCodeExporter commented 8 years ago
Issue 451 has been merged into this issue.

Original comment by arvid.no...@gmail.com on 5 May 2013 at 10:49

GoogleCodeExporter commented 8 years ago
Issue 375 has been merged into this issue.

Original comment by arvid.no...@gmail.com on 5 May 2013 at 10:53

GoogleCodeExporter commented 8 years ago
bdecode(bytes) # doesn't work, returns None

torrent_info(path_str) # works!
that's enough to get it running :))

ti.add_tracker(str, 0) # works!
for announce_entry in ti.trackers():
    print (announce_entry.url) # works!, returns str as it should

however, write_resume_data() returns bytes for every string value
i.e. 'file-format': b'libtorrent resume file'

the disturbing thing i'm realizing is that with py2, unicode type was never in 
play with libtorrent, i.e. doing torrent_info.add_tracker(u'tracker_url') 
wouldn't work,
which means it's not possible to do the following:
py2 > py3
unicode > str
str > bytes

not sure how libtorrent will know where to accept/return bytes vs str since str 
was the only type before

make needed a tweak:
entry.cpp
- #include <bytes.hpp>
+ #include "bytes.hpp"

Original comment by onlinema...@gmail.com on 6 May 2013 at 1:34

GoogleCodeExporter commented 8 years ago
> bdecode(bytes) # doesn't work, returns None

I just noticed I made a mistake in my patch. I meant to change both bencode() 
and bdecode() to only operate on bytes, but I must have lost that change before 
I was done. I'm updating my patch.

> ti.add_tracker(str, 0) # works!
> for announce_entry in ti.trackers():
>     print (announce_entry.url) # works!, returns str as it should

I only changed a few places I could think of where the strings should be bytes. 
In this case, I'm still just letting boost.python convert std::string. I wonder 
what happens when those strings have fancy unicode characters encoded as utf-8.

> however, write_resume_data() returns bytes for all of str
> i.e. 'file-format': b'libtorrent resume file'

This is primarily a property of bencoding, which doesn't distinguish between 
strings and bytes. For this reason, I believe the only reasonable thing to do 
is to treat all bencoded strings as bytes, and the user may interpret it as a 
string by decoding utf-8 -> unicode.
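
To illustrate the point about bencoding, here is a toy pure-Python decoder (not libtorrent's bdecode; the name toy_bdecode and the sample buffer are made up). It returns every bencoded string as bytes and leaves it to the caller to decode the fields known to be text:

```python
def toy_bdecode(data, i=0):
    """Decode one bencoded value from bytes `data`, starting at index i.

    Returns (value, next_index). Every bencoded string comes back as bytes,
    because the format itself carries no text/binary distinction.
    """
    c = data[i:i + 1]
    if c == b'i':                      # integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i + 1:end]), end + 1
    if c == b'l':                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b'e':
            item, i = toy_bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b'd':                      # dict: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b'e':
            key, i = toy_bdecode(data, i)
            value, i = toy_bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b':', i)        # string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length

# Made-up sample: one text field and one binary field.
raw = b'd11:file-format22:libtorrent resume file7:node-id4:\xf3\xde\x1a\xd3e'
decoded, _ = toy_bdecode(raw)

print(decoded[b'file-format'])                  # b'libtorrent resume file'
print(decoded[b'file-format'].decode('utf-8'))  # libtorrent resume file
print(decoded[b'node-id'])                      # b'\xf3\xde\x1a\xd3'
```

Only the caller knows that 'file-format' is text while 'node-id' is binary, which is why the decision to decode is left on the Python side.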

> not sure how libtorrent will know where to accept/return bytes vs str since 
> str was the only type before

I'm imagining that some places will take bytes and some strings. I also imagine 
that in order to get full python 3 support, I should make all strings convert 
between utf-8 on the c++ side and unicode strings on the python side. I was 
under the impression that in py3 there is no string object anymore, just 
unicode and bytes, no?

See updated patch.

Original comment by arvid.no...@gmail.com on 6 May 2013 at 9:07

GoogleCodeExporter commented 8 years ago
bdecode works. you're right about the bytes it returns.
bencode works too. sha1 matches up.

can't get fast_resume to be accepted:
params = {'resume_data': bencode(write_resume_data())}
"fast resume rejected: expected colon in bencoded string"

set_settings/get_settings accept and return the following
'user_agent': 'µLibtorrent'
torrent_info.name() returns:
'[魔穗字幕组]学園催眠隷奴 anime 
いやっ、絶対まだ妊娠なん]'
that's already as utf8 as it gets

they renamed 'unicode' to 'str', 'unicode' object is gone.
'str' got renamed to 'bytes'
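
a quick plain-Python 3 illustration of the renaming, using the user_agent value from above (no libtorrent involved):

```python
text = 'µLibtorrent'         # Python 3 str (this was `unicode` in Python 2)
raw = text.encode('utf-8')   # Python 3 bytes (this was `str` in Python 2)

print(type(text).__name__)          # str
print(type(raw).__name__)           # bytes
print(raw)                          # b'\xc2\xb5Libtorrent'
print(raw.decode('utf-8') == text)  # True
```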

Original comment by onlinema...@gmail.com on 6 May 2013 at 1:12

GoogleCodeExporter commented 8 years ago
> can't get fast_resume to be accepted:
> params = {'resume_data': bencode(write_resume_data())}
> "fast resume rejected: expected colon in bencoded string"

Are you sure you're passing in the buffer as bytes?

Original comment by arvid.no...@gmail.com on 7 May 2013 at 4:04

GoogleCodeExporter commented 8 years ago
i think so.
resume_enc looks about right:

b'd11:active_timei2e10:added_timei1367842609e10:allocation6:sparse15:announce_to_dhti1e15:announce_to_lsdi1e20:announce_to_trackersi1e12:auto_managedi1e12:banned_peers0:13:banned_peers60......

exact steps:

resume = write_resume_data()
resume_enc = lt.bencode(resume)
with open('./resume_enc', 'wb') as h:
    h.write(resume_enc)
with open('./resume_enc', 'rb') as h:
    resume_enc = h.read()

params = {'resume_data': resume_enc}
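
For what it's worth, the type question can be checked on the Python side before libtorrent sees the buffer. A rough sketch: looks_like_bencoded_dict is a made-up helper, and the short buffer is only a stand-in for the real lt.bencode(resume) output:

```python
import os
import tempfile

def looks_like_bencoded_dict(buf):
    """Hypothetical sanity check: buf is bytes and is framed as d...e."""
    return isinstance(buf, bytes) and buf[:1] == b'd' and buf[-1:] == b'e'

# Stand-in for lt.bencode(resume); the real buffer is much longer.
resume_enc = b'd11:active_timei2e10:added_timei1367842609ee'

# Same write/read steps as above, but in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'resume_enc')
with open(path, 'wb') as h:
    h.write(resume_enc)
with open(path, 'rb') as h:
    round_tripped = h.read()

print(round_tripped == resume_enc)                              # True
print(looks_like_bencoded_dict(round_tripped))                  # True
print(looks_like_bencoded_dict(round_tripped.decode('ascii')))  # False, str is not bytes
```

This only rules out a str/bytes mix-up on the Python side; it says nothing about what the binding does with the buffer afterwards.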

Make sure lazy_bdecode received the bytes treatment.
I can recompile lt with a debugging line if you tell me the line and where to 
put it.

Original comment by onlinema...@gmail.com on 7 May 2013 at 11:59

GoogleCodeExporter commented 8 years ago
Not directly related to the initial issue, but since the ticket got labeled as 
"python 3 support" thought I should add it here.

As of Python 3.3, the official compiler on Windows is Visual Studio 2010 (see 
http://blog.python.org/2012/05/recent-windows-changes-in-python-33.html).

Original comment by Necroman...@gmail.com on 7 Jul 2013 at 12:09

GoogleCodeExporter commented 8 years ago
my understanding is that this works now. is that right?

Original comment by arvid.no...@gmail.com on 21 Sep 2014 at 12:42