pobrn / mktorrent

A simple command line utility to create BitTorrent metainfo files

Half-Request: Better Compatibility with OS X #14

Open ghost opened 8 years ago

ghost commented 8 years ago

When a torrent file is created on OS X, file names are encoded as UTF-8, but in Normalization Form Canonical Decomposition (NFD) rather than Normalization Form Canonical Composition (NFC). Most clients on other operating systems, which will of course also be used with torrent files made on OS X, expect NFC. Meanwhile, OS X torrent clients are, for the most part, able to handle both NFC- and NFD-normalized filenames: they ensure the final filenames are NFD-normalized, which is what the HFS+ filesystem expects.
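To make the difference concrete, here is a small Python sketch (not mktorrent code; the file name `café.txt` is just an illustrative example). It shows that the same visible name has different code points and different UTF-8 bytes under NFC and NFD, which is why a byte-wise comparison treats them as two different names.

```python
import unicodedata

# Illustrative file name, written with an escape so the source form is unambiguous.
name = "caf\u00e9.txt"  # U+00E9 is the precomposed "e with acute"

nfc = unicodedata.normalize("NFC", name)  # composed: one code point for the accented e
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" followed by U+0301

print(nfc == nfd)           # the two strings do not compare equal
print(nfc.encode("utf-8"))  # bytes as an NFC-expecting client would see them
print(nfd.encode("utf-8"))  # bytes as stored when the name is NFD-normalized
```

Normalizing the NFD string back to NFC recovers the composed form, so the conversion itself is lossless for names like this one.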

The ideal, therefore, is for all torrent files to be generated as UTF-8/NFC, which is the implicit standard for the BitTorrent protocol: "All strings in a .torrent file that contains text must be UTF-8 encoded."

At the moment, mktorrent (like all the other torrent clients I've tested) does not distinguish between UTF-8/NFC and UTF-8/NFD, and therefore does not convert UTF-8/NFD text strings to UTF-8/NFC.

My question is, would there be general interest in an option that writes torrent files generated on OS X using the NFC-normalized Unicode equivalents of the torrent-creating computer's NFD-normalized filenames? If so, would forking the project be the best way to contribute? (New to GitHub, sorry!) Or is there already a super-secret option for something like that?

denkristoffer commented 7 years ago

The above user deleted their account so I assume the offer to add this is off the table, but it would definitely be a welcome addition as this has been giving me problems lately!

pobrn commented 3 years ago

@denkristoffer if this issue is still relevant, could you please elaborate on the nature of the problems it's causing?

gennaios commented 3 years ago

Because the torrent is created with the file name encoded in NFD, the name in the torrent no longer matches the name on disk when I try to seed on another operating system, so the torrent client thinks the file does not exist. I often create torrents on macOS and seed on Linux. Currently, I have to first make sure there are no accents, diacritics, or non-Latin characters in the file name before creating the torrent.
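The mismatch can be sketched in Python as follows. This is only an illustration of the failure mode and of a normalization-insensitive lookup; `find_entry` is a hypothetical helper, not something mktorrent or any particular client provides, and the names are made up.

```python
import unicodedata

def find_entry(expected_name, directory_entries):
    """Return the entry matching expected_name up to Unicode normalization, else None."""
    target = unicodedata.normalize("NFC", expected_name)
    for entry in directory_entries:
        if unicodedata.normalize("NFC", entry) == target:
            return entry
    return None

# Name as recorded in a torrent created on macOS (NFD), in this example.
torrent_name = unicodedata.normalize("NFD", "caf\u00e9.txt")

# Names as they might exist on a Linux disk (NFC), in this example.
on_disk = ["readme.txt", "caf\u00e9.txt"]

print(torrent_name in on_disk)            # exact comparison misses the file
print(find_entry(torrent_name, on_disk))  # normalized comparison finds it
```

A client that only does the exact comparison reports the file as missing, which matches the behaviour described above.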

pobrn commented 3 years ago

@gennaios thanks for the explanation.

gennaios commented 2 years ago

Any updates on when this might be addressed? I mentioned it to someone, who said that when file names contain accents, the torrent file created on macOS is unusable on any system, even on macOS itself.

taylorthurlow commented 2 years ago

I just wanted to confirm that I'm also encountering this issue, and that it is definitely a property of mktorrent on macOS.

APFS (unlike HFS+) seems happy to let you write Unicode filenames with NFC-normalized characters, and they stay that way, but mktorrent seems to read them and generate its bencoded data structure with the path strings re-normalized back to NFD. This is how we get into the scenario that @gennaios mentioned, where it's possible to generate a torrent on macOS, load it into a torrent client on that same system, and have it fail verification. Working around that would require torrent clients to auto-normalize back to NFC Unicode, which I can at least say Deluge on Linux is not doing.
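For anyone who wants to see what the requested behaviour would look like, here is a rough Python sketch of normalizing path components to NFC before they are bencoded. It is not how mktorrent is implemented (mktorrent is written in C); `bencode` and `nfc_path` are hypothetical helpers written for this example, and the path is made up.

```python
import unicodedata

def bencode(value):
    """Minimal bencoder for ints, str/bytes, lists, and dicts (illustration only)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def nfc_path(components):
    """Normalize every path component to NFC before it is written out."""
    return [unicodedata.normalize("NFC", c) for c in components]

# Path components as they might be read on macOS (NFD), in this example.
nfd_path = ["Musique", unicodedata.normalize("NFD", "caf\u00e9.txt")]

raw = bencode({"length": 3, "path": nfd_path})
fixed = bencode({"length": 3, "path": nfc_path(nfd_path)})

print(raw)    # file name component is 10 bytes long (decomposed)
print(fixed)  # file name component is 9 bytes long (composed)
```

The only difference between the two outputs is the file name component, so a fix along these lines would presumably only need to touch the code that builds the path strings.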