nogajun closed this issue 2 years ago
Cool! I guess we need to use a proper Unicode width algorithm instead of just counting characters. Is this translation available somewhere? It would be useful for testing.
What system are you using, with which locale / encoding?
If I remember correctly, awk is what ends up computing the width of lines, and modern versions should give the character count rather than the byte count. (The French version uses UTF-8, where accented characters are 2 bytes long.) But then, I'm not sure UTF-8 is the standard encoding for Japanese, and I don't know how GNU awk deals with other multibyte encodings.
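To make the byte count versus character count distinction concrete, here is a small Python sketch (not GameShell code; the `measure` helper and the sample strings are only illustrative). It compares the number of code points with the number of bytes in UTF-8 and, as one example of a legacy Japanese encoding, EUC-JP.

```python
# Compare code points, UTF-8 bytes, and EUC-JP bytes for a few strings.
# "caf\u00e9" is "café" written with a precomposed e-acute.
samples = ["cafe", "caf\u00e9", "日本語"]

def measure(text):
    """Return (code points, UTF-8 bytes, EUC-JP bytes or None)."""
    try:
        euc_jp = len(text.encode("euc_jp"))
    except UnicodeEncodeError:
        euc_jp = None  # not representable in EUC-JP
    return len(text), len(text.encode("utf-8")), euc_jp

for s in samples:
    print(s, measure(s))
```

Neither count is the number of terminal columns the text occupies, so a tool that reports characters correctly can still misjudge how wide a line looks.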
Some things to try are thus: use gawk rather than mawk (in Debian / Ubuntu).
And I would love to add the Japanese translation to the main repository if it is available somewhere!
I don't think the problem has anything to do with the counting of bytes. Rather, it has to do with the fact that some characters take as much space as several ASCII characters when printed (see https://docs.rs/unicode-width/latest/unicode_width for example).
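As a rough illustration of what "width" means here, the following Python sketch approximates the number of terminal columns a string takes, using the East Asian Width property from the standard `unicodedata` module. The `display_width` function is a hypothetical helper, not what GameShell or the linked crate does, and it simplifies several cases (emoji sequences and ambiguous-width characters, for instance).

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal columns needed to print text."""
    width = 0
    for ch in text:
        # Combining marks and format characters take no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # Wide (W) and full-width (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for s in ["abc", "日本語", "e\u0301"]:
    print(repr(s), len(s), display_width(s))
```

For "日本語", the character count is 3 but the approximate column count is 6, which is the kind of mismatch described above.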
Ah! I should have read more carefully... I have no idea how to deal with this, but someone apparently does: https://github.com/ericpruitt/wcwidth.awk! I'll try that, probably sometime next week.
@nogajun can you provide a test file?
@nogajun Can you try pulling the branch full_width_characters and see if this works? (637f6c54)
I don't like the idea of making all of GameShell dependent on an obscure external library (wcwidth.awk), but if it works, we'll find a way...
@rlepigre Can you test it on some other strange files that you can think of?
I had to comment out a few lines in wcwidth.awk to make it work, and I haven't done a lot of testing...
You can use the scripts/box.sh script directly, outside of GameShell:
$ cd GameShell/scripts
$ ./box.sh < test.txt
I'll have a look on the weekend!
Actually, I just tried it on Google-translated versions of some of our files in several languages, and that seems to work as expected as far as character width is concerned!
However, out of curiosity, I also tried languages with a right-to-left reading direction, but these are displayed left-to-right in my terminal. I have no idea how these are supposed to be handled, but all I can say is that my terminal does not seem to handle the right-to-left mark correctly. (I tried with the example in https://en.wikipedia.org/wiki/Right-to-left_mark.)
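For what it's worth, text direction is a separate Unicode property from width. The Python sketch below (the `has_rtl` helper is hypothetical, not part of GameShell) only detects whether a string contains right-to-left characters; actually reordering them for display is left to the terminal or to a bidi implementation.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible format character

def has_rtl(text):
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in text)

# The mark itself is a format character (Cf) with bidi class R.
print(unicodedata.category(RLM), unicodedata.bidirectional(RLM))
# "\u05e9\u05dc\u05d5\u05dd" is the Hebrew word "shalom".
print(has_rtl("hello"), has_rtl("\u05e9\u05dc\u05d5\u05dd"))
```

A check like this could at least flag files where the output depends on how the terminal handles bidirectional text.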
304d9154 should fix this
GameShell has been translated into Japanese. However, when Japanese text is displayed, the layout is corrupted.
One Japanese character is as wide as two ASCII characters. However, the width is calculated as if each character were one column wide, so the counts do not match. I expect this is why the display breaks.
Chinese and Korean will probably have the same problem.
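The effect can be reproduced with a minimal Python sketch. This is not the actual scripts/box.sh logic, just an assumed simplified framing routine: `box` pads lines using whatever width function it is given, and `cell_width` is a hypothetical helper that counts wide and full-width characters as two columns.

```python
import unicodedata

def cell_width(text):
    """Simplified column count: wide/full-width characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)

def box(lines, width_of=len):
    """Frame lines in an ASCII box, padding according to width_of."""
    inner = max(width_of(line) for line in lines)
    border = "+" + "-" * (inner + 2) + "+"
    body = ["| " + line + " " * (inner - width_of(line)) + " |"
            for line in lines]
    return [border] + body + [border]

lines = ["Hello", "こんにちは"]
naive = box(lines)              # pads by character count
aware = box(lines, cell_width)  # pads by column count

print("\n".join(naive))
print("\n".join(aware))
```

In a terminal that renders Japanese characters two columns wide, the first box has a right edge that sticks out on the Japanese line, while the second one lines up.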