jash-kothari-forks / libtorrent

Automatically exported from code.google.com/p/libtorrent

0.16.9: replaces special characters (like é) in file names with _ #448

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I tested the same code with 0.16.8, and it reports the filename correctly.
Relevant qbittorrent bug: https://github.com/qbittorrent/qBittorrent/issues/511
Attaching the torrent file mentioned in that thread.

Original issue reported on code.google.com by hammered...@gmail.com on 18 Mar 2013 at 1:18


GoogleCodeExporter commented 9 years ago
That's because that character is not correctly UTF-8 encoded.

In the fourth filename there is a byte "e9", which is é in Latin-1, but the 
UTF-8 representation of é is "c3 a9".
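A quick Python check (not libtorrent code) makes the byte difference visible and shows that a lone 0xE9 byte is rejected by a strict UTF-8 decoder:

```python
# The character é (U+00E9) in the two encodings discussed above.
latin1_bytes = "é".encode("latin-1")
utf8_bytes = "é".encode("utf-8")

print(latin1_bytes.hex(" "))  # e9
print(utf8_bytes.hex(" "))    # c3 a9

# A lone 0xE9 byte is not valid UTF-8, so a strict decode raises an error.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```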

When broken UTF-8 is encountered, libtorrent replaces the invalid code point 
with an underscore.
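The replacement behaviour described above can be mimicked in Python with a custom decode error handler. This is a minimal sketch, not libtorrent's C++ implementation: the handler name "underscore" and the function `sanitize_filename` are hypothetical, and it assumes one underscore per invalid byte span as reported by Python's UTF-8 decoder, which may differ from libtorrent in edge cases.

```python
import codecs

def _underscore_handler(err: UnicodeDecodeError):
    # Replace the offending byte span with "_" and resume decoding after it.
    return ("_", err.end)

# Register the handler under a hypothetical name so decode() can use it.
codecs.register_error("underscore", _underscore_handler)

def sanitize_filename(raw: bytes) -> str:
    """Decode raw file-name bytes as UTF-8, replacing invalid sequences with '_'."""
    return raw.decode("utf-8", errors="underscore")

print(sanitize_filename(b"caf\xc3\xa9.txt"))  # café.txt  (valid UTF-8)
print(sanitize_filename(b"caf\xe9.txt"))      # caf_.txt  (Latin-1 byte rejected)
```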

Original comment by arvid.no...@gmail.com on 18 Mar 2013 at 11:07

GoogleCodeExporter commented 9 years ago
Since it is indeed in the wrong character encoding, there isn't much left to 
do. Thank you for your time.

Original comment by hammered...@gmail.com on 18 Mar 2013 at 11:10

GoogleCodeExporter commented 9 years ago
"there isn't much left to do." 
But it works fine in uTorrent.

Original comment by stranged...@gmail.com on 6 Jun 2013 at 9:10

GoogleCodeExporter commented 9 years ago
So uTorrent is lenient with incorrect encodings. That doesn't make the .torrent 
any more correct.

If you think it's important to fall back on assuming Latin-1 (or some other 
code page), I'm open to accepting patches. I suppose some reasonable argument 
should be presented for why Latin-1 should be the assumed code page, rather 
than extended ASCII, MS-DOS or any of the other ones.

I suppose some kind of conversion table from Latin-1 to UTF-8 would not be too 
big (at least half of the characters are shared, afaik).
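For Latin-1 specifically, no table is actually needed: every Latin-1 byte value equals its Unicode code point (U+0000 to U+00FF), so the conversion is a few bit operations. A minimal sketch, with a hypothetical function name; other code pages such as CP1251 or MS-DOS code pages do need a mapping table.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin-1 bytes to UTF-8 without a lookup table.

    Bytes below 0x80 are ASCII and copied unchanged; bytes 0x80-0xFF
    become a two-byte UTF-8 sequence encoding the same code point.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII: identical in both encodings
        else:
            out.append(0xC0 | (b >> 6))    # lead byte 110xxxxx: top 2 bits
            out.append(0x80 | (b & 0x3F))  # continuation 10xxxxxx: low 6 bits
    return bytes(out)

print(latin1_to_utf8(b"caf\xe9").hex(" "))  # 63 61 66 c3 a9
```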

Original comment by arvid.no...@gmail.com on 7 Jun 2013 at 1:12

GoogleCodeExporter commented 9 years ago
I've taken a look at your torrent. The encoding is CP1251. Both uTorrent and 
BEncode Editor show the names correctly.

I guess this should be filed as a separate issue/feature request for encoding 
auto-detection (though I don't know whether the standard permits anything other 
than UTF-8)?

P.S. I can upload the torrent, if needed.

Original comment by Daymansm...@gmail.com on 7 Jun 2013 at 4:33

GoogleCodeExporter commented 9 years ago
https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure
> The content of a metainfo file (the file ending in ".torrent") is a bencoded 
> dictionary, containing the keys listed below. All character string values are 
> UTF-8 encoded.

So either way, libtorrent conforms to the standard, while uT may try to deduce 
the encoding (perhaps from the system locale?).

Original comment by Daymansm...@gmail.com on 9 Jun 2013 at 6:22