pobrn / mktorrent

A simple command line utility to create BitTorrent metainfo files

how do we add the encoding = UTF-8 option? #59

Open Suitear opened 3 years ago

Suitear commented 3 years ago


After looking through the output of the mktorrent --help command, I can't find it. I'm wondering if anyone can help me out. Many thanks.

FranciscoPombal commented 3 years ago

That's a non-standard field AFAIK. It's useless anyway: the correct thing to do nowadays (and for a long time now) is to simply assume UTF-8.

Usage of any other encodings for text is just wrong and should be considered a bug/misfeature in software that uses/expects them by default.

Similarly, if someone creates a torrent whose filenames/title/comment are not UTF-8 encoded, that's a problem on their side that they should fix.

FranciscoPombal commented 3 years ago

To further strengthen my argument, here is a relevant quote from the spec, https://www.bittorrent.org/beps/bep_0052.html (emphasis mine):

BEP authors are encouraged to use ASCII-compatible strings for dictionary keys and UTF-8 for human-readable data. (...) All strings in a .torrent file defined by this BEP that contain human-readable text are UTF-8 encoded. (...)

file tree: A tree of dictionaries where dictionary keys represent UTF-8 encoded path elements. Entries with zero-length keys describe the properties of the composed path at that point. 'UTF-8 encoded' in this context only means that if the native encoding is known at creation time it must be converted to UTF-8. Keys may contain invalid UTF-8 sequences or characters and names that are reserved on specific filesystems. Implementations must be prepared to sanitize them. On most platforms path components exactly matching '.' and '..' must be sanitized since they could lead to directory traversal attacks and conflicting path descriptions. On platforms that require valid UTF-8 path components this sanitizing step must happen after normalizing overlong UTF-8 encodings.
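For illustration, the sanitizing step described in that quote could look roughly like the Python sketch below. This is not what mktorrent or any particular client does. The function name sanitize_component and the choice of "_" as the replacement are assumptions made up for this example, and a real client would also need to handle names reserved on its own filesystem.

```python
def sanitize_component(raw: bytes) -> str:
    """Turn one raw path element from a torrent into a safe path component.

    Hypothetical helper for illustration only; intended for path elements,
    not for the zero-length keys that carry file properties.
    """
    # Python's strict UTF-8 decoder rejects overlong encodings, so with
    # errors="replace" invalid sequences become U+FFFD instead of raising.
    text = raw.decode("utf-8", errors="replace")

    # Neutralize path separators and NUL so a component cannot span levels.
    for ch in ("/", "\\", "\x00"):
        text = text.replace(ch, "_")

    # '.' and '..' (and an empty name) must never reach the filesystem.
    if text in ("", ".", ".."):
        return "_"
    return text


if __name__ == "__main__":
    for sample in (b"..", b"a/b", "caf\u00e9".encode("utf-8"), b"caf\xe9"):
        print(sample, "->", sanitize_component(sample))
```

Decoding first and checking for '.' and '..' afterwards follows the ordering the spec asks for: the comparison happens on the already-normalized text, so an overlong encoding of '..' cannot slip past it.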

And also from the legacy v1 spec, https://www.bittorrent.org/beps/bep_0003.html (again, emphasis mine):

All strings in a .torrent file that contains text must be UTF-8 encoded. (...) The name key maps to a UTF-8 encoded string which is the suggested name to save the file (or directory) as. It is purely advisory. (...) path - A list of UTF-8 encoded strings corresponding to subdirectory names, the last of which is the actual file name (a zero length list is an error case).
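If you want to check whether an existing v1 torrent actually follows this, something like the Python sketch below might help. It is a minimal bencode decoder written for this example (no error recovery, no v2 file tree support), and the names bdecode and non_utf8_strings are made up here. Checking the top-level comment key is also an assumption, since BEP 3 itself does not define that key.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                       # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                       # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                     # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")


def non_utf8_strings(torrent: bytes) -> list:
    """Return (label, raw_bytes) pairs for text fields that are not valid UTF-8."""
    meta, _ = bdecode(torrent)
    info = meta.get(b"info", {})
    candidates = [("name", info.get(b"name", b""))]
    for entry in info.get(b"files", []):          # absent in single-file torrents
        for part in entry.get(b"path", []):
            candidates.append(("path", part))
    if b"comment" in meta:                        # assumption: optional, not in BEP 3
        candidates.append(("comment", meta[b"comment"]))

    bad = []
    for label, raw in candidates:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((label, raw))
    return bad


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        for label, raw in non_utf8_strings(fh.read()):
            print(f"{label}: {raw!r} is not valid UTF-8")
```

Run it with the path to a .torrent file as the only argument. An empty output means the name, path elements, and comment all decode cleanly as UTF-8.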