rakshasa / rtorrent

rTorrent BitTorrent client
https://github.com/rakshasa/rtorrent/wiki
GNU General Public License v2.0

multibyte utf8 characters not handled properly in interface #1162

Open cyphar opened 2 years ago

cyphar commented 2 years ago

In several ways, multi-byte characters are not handled correctly by rtorrent's interface (you can download torrents that have utf8-encoded pathnames without issue, the issue is purely interface related but it results in some usability problems):

  1. All alignment of things in relation to text (columns, the cursor when inputting things into the prompt) appears to use strlen to calculate how many columns wide text is -- which means that multi-byte characters are treated as if they were wider than one column. To be fair, solving this perfectly (namely, handling combining characters and so on) is basically untenable outside of a font rendering library -- but at the very least rtorrent should use the number of Unicode codepoints as the length so that in most cases you get a more reasonable layout.
  2. It appears that you cannot enter utf8 characters into the text prompt, which means that if you have a .torrent file with a non-ASCII name, you won't be able to type that name and will have to rename the .torrent file in order to load it. I'm not sure what the root cause is here (though since it seems rtorrent implements its own line editing, it's possible that it's filtering out characters that have the top-most bit set -- which means that you're filtering out all multi-byte utf8 characters).
  3. If you tab-complete some utf8 text (something like foobarこれは例だけですけど・・・) and then try to delete (hold backspace) to delete all of the characters, it seems like it's not deleting the whole character (after removing all the non-ascii characters visually, the cursor is not aligned with the ascii characters and you have to press backspace a few more times before you start deleting the foobar text).
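
To make the three points above concrete, here is a minimal sketch of utf8-aware helpers, assuming the input is valid UTF-8. All function names are hypothetical and none of this is taken from the rtorrent source; in particular, `accept_input_byte` only illustrates what a filter that lets bytes with the top bit set through could look like.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_utf8_continuation(unsigned char byte) {
  return (byte & 0xC0) == 0x80;
}

// Point 1: length in code points instead of bytes (what strlen reports).
// Every byte that is not a continuation byte starts a new code point.
inline std::size_t utf8_length(const std::string& text) {
  std::size_t count = 0;
  for (unsigned char byte : text) {
    if (!is_utf8_continuation(byte))
      ++count;
  }
  return count;
}

// Point 2: hypothetical input filter. Accepts printable ASCII as well as
// bytes with the top bit set, so multi-byte sequences are not dropped.
inline bool accept_input_byte(int key) {
  return (key >= 0x20 && key < 0x7F) || (key >= 0x80 && key <= 0xFF);
}

// Point 3: backspace removes one whole code point, not one byte.
// Pops trailing continuation bytes and then the lead byte.
inline void utf8_pop_back(std::string& text) {
  while (!text.empty()) {
    unsigned char last = static_cast<unsigned char>(text.back());
    text.pop_back();
    if (!is_utf8_continuation(last))
      break;
  }
}
```

With these, a string made of `foobar` followed by two three-byte characters has a byte size of 12 but a `utf8_length` of 8, and two calls to `utf8_pop_back` leave exactly `foobar`. Counting code points is still only an approximation of display width: East Asian wide characters usually occupy two terminal columns, and combining characters occupy none.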

PRESFIL commented 2 years ago

Thank you for describing everything in more detail. I tried to cope with each of these points, but gave up.

I seem to have managed the 1st and 2nd points, though only with the help of terrible workarounds, but not the 3rd.

Ideally, it would be necessary to use a different string type in libtorrent and rtorrent, but this would break compatibility. And I just didn't manage to replace all instances and compile successfully ;)

Related:

cyphar commented 2 years ago

Ah sorry, I missed all the other bug reports. Yeah so it seems like rtorrent-ps has fixed the issue (and it is entirely due to the use of strlen rather than utf8-aware length checking functions), and we need to port it to vanilla rtorrent. I might take a look at doing this when I have some time next weekend. Though rtorrent-ps appears to be using the wchar_t stuff in C++ rather than handling things purely using utf8-specific code...

If you have a branch with somewhat working code, please let me know and I can try to start from there.

PRESFIL commented 2 years ago

> though since it seems rtorrent implements its own line editing, it's possible that it's filtering out characters that have the top-most bit set -- which means that you're filtering out all multi-byte utf8 characters

You are absolutely right. At least in rtorrent, at the time when I was trying to fix the issue, there was such a check for utf-8 bytes, and user input containing them was simply filtered out. See https://github.com/rakshasa/rtorrent/issues/83#issuecomment-658936994

That's probably why I can't enter these characters in rtorrent-ps right now (I just checked).

PRESFIL commented 2 years ago

Ha ha, it hadn't even occurred to me to apply the patch I was talking about to rtorrent-ps. It applied without conflicts, and it drops straight into the rtorrent-ps PKGBUILD without problems!

The cursor moves by several positions depending on the number of bytes in the character, but deleting a character also moves it back by several positions at once (as far as I remember, rtorrent crashed if you moved the cursor too far).
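
For illustration, the byte-versus-character cursor mismatch described here could be avoided by keeping the cursor as a byte offset that only ever rests on code point boundaries, and deriving the on-screen column from it. This is a rough sketch under the assumption of valid UTF-8; the function names are hypothetical and it is not the patch discussed in this thread.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_continuation_byte(char byte) {
  return (static_cast<unsigned char>(byte) & 0xC0) == 0x80;
}

// Byte offset of the code point boundary to the right of `pos`.
// Clamped to the end of the string, so the cursor cannot run past it.
inline std::size_t next_boundary(const std::string& text, std::size_t pos) {
  if (pos >= text.size())
    return text.size();
  ++pos;
  while (pos < text.size() && is_continuation_byte(text[pos]))
    ++pos;
  return pos;
}

// Byte offset of the code point boundary to the left of `pos`.
// Clamped to zero, so the cursor cannot run before the start.
inline std::size_t prev_boundary(const std::string& text, std::size_t pos) {
  if (pos > text.size())
    pos = text.size();
  if (pos == 0)
    return 0;
  --pos;
  while (pos > 0 && is_continuation_byte(text[pos]))
    --pos;
  return pos;
}

// Display column for a byte offset, counting one column per code point.
// This ignores wide and combining characters (a known simplification).
inline std::size_t cursor_column(const std::string& text, std::size_t pos) {
  if (pos > text.size())
    pos = text.size();
  std::size_t column = 0;
  for (std::size_t i = 0; i < pos; ++i) {
    if (!is_continuation_byte(text[i]))
      ++column;
  }
  return column;
}
```

Left and right arrow keys would then call `prev_boundary` and `next_boundary`, backspace would erase the bytes between `prev_boundary(text, pos)` and `pos`, and the terminal cursor would be placed at `cursor_column(text, pos)`, so one key press always corresponds to one visible step.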

I will try to bring it to working state.

P.S. It was too early to rejoice: it behaves the same as before, and I'm at a dead end again.

cyphar commented 2 years ago

Okay, I'll take a look next weekend and see how far I can get. Thanks for the info.