Open euamotubaina opened 11 months ago
EF BF BD is the UTF-8 encoding of U+FFFD, the replacement character. It means the filename contains bytes that are not valid UTF-8, which we tried to parse as UTF-8 anyway.
What encoding does your filesystem use for filenames?
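As a rough illustration of where EF BF BD can come from (a sketch using only the Python standard library, not torrentool's actual code path): decoding a byte that is not valid UTF-8 with a lenient error handler substitutes U+FFFD, and writing that back out as UTF-8 yields those three bytes. The single byte 0xBD is an assumed stand-in for the offending character.

```python
# Assumed example input: a lone 0xBD byte, which is not valid UTF-8 on its own.
raw = b"\xbd"

# Strict decoding rejects the byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# Re-encoding the substituted text produces EF BF BD instead of BD.
reencoded = replaced.encode("utf-8")

print(strict_ok)        # False
print(reencoded.hex())  # efbfbd
```

The original byte cannot be recovered from the result, which is why the rewritten file differs from the original.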
I'm on Windows 11, which uses Unicode to encode file paths, if I understood correctly.
I think this specific torrent used latin-1 encoding for the file paths, so I guess this is very much a corner case.
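For what it's worth, a latin-1 path could be handled along these lines. This is only a sketch: decode_path and the sample filename are hypothetical and not part of torrentool.

```python
def decode_path(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode a path component as UTF-8, falling back to another codec.

    Hypothetical helper for illustration; not a torrentool function.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail.
        return raw.decode(fallback)


# Assumed sample: a filename containing the latin-1 byte for "½" (0xBD).
name = decode_path(b"1\xbd.txt")

# Re-encoding with the original codec restores the original bytes;
# re-encoding as UTF-8 does not, so the info hash would still change.
same_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")
```

The catch is that the codec has to be remembered when writing the file back, otherwise the bytes (and therefore the hash) still change.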
Hm, latin-1... This comment seems to be relevant:
https://github.com/idlesign/torrentool/issues/2#issuecomment-166059474
This private tracker torrent file has a file path that includes a Unicode character which is being parsed incorrectly: \x008D, chr(189), Vulgar Fraction One Half.
I noticed it because, after loading the file with the Torrent class, the calculated info_hash was different from that of the original torrent.
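To show why a single mangled byte changes the hash: the info hash is the SHA-1 of the bencoded info dictionary, so any change to the path bytes changes the digest. The sketch below uses a tiny hand-written bencoder and made-up info dictionaries (names, lengths and piece data are all assumptions), not torrentool itself.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, bytes, lists and dicts with bytes keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 hex digest of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


def make_info(path_component: bytes) -> dict:
    # Made-up info dictionary; only the path bytes differ between the two cases.
    return {
        b"name": b"demo",
        b"piece length": 16384,
        b"pieces": b"\x00" * 20,
        b"files": [{b"length": 1, b"path": [path_component]}],
    }


original = make_info(b"1\xbd.txt")         # latin-1 byte kept as-is
mangled = make_info(b"1\xef\xbf\xbd.txt")  # byte replaced by U+FFFD in UTF-8

print(info_hash(original) == info_hash(mangled))  # False
```

Everything except the one path component is identical, yet the digests differ.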
Screenshots of the original torrent file and of a new one created with Torrent.to_file from the same data, as seen in the hex editor:
Original: (screenshot)
Created with the Torrent class: (screenshot)
When using the Bencode class to read and write the torrent, the character is parsed correctly and the hashes match.
Here's a version of the original torrent without the tracker URL:
431f76f60e05250df162c90a73ab8377dc4ca9c8.zip
Screenshot of the terminal output when reading the file with the Torrent class (the file name is the correct SHA-1 hash): (screenshot)
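One possible way to keep the hashes matching while still exposing paths as text is to decode with Python's surrogateescape error handler, which round-trips undecodable bytes losslessly. This is only a sketch of that idea, and it makes no claim about how the Torrent or Bencode classes are implemented; the sample filename is assumed.

```python
# Assumed sample: a path component with a latin-1 "½" byte (0xBD).
raw = b"1\xbd.txt"

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the exact original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

# For comparison, the "replace" handler loses the original byte.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

print(restored == raw)  # True
print(lossy == raw)     # False
```

The trade-off is that the intermediate string contains a lone surrogate, so it cannot be printed or encoded with the default strict handler.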