phyver / GameShell

a game to learn (or teach) how to use standard commands in a Unix shell
GNU General Public License v3.0

Japanese message display is corrupted #96

Closed: nogajun closed this issue 2 years ago

nogajun commented 2 years ago

GameShell has been translated into Japanese. However, when Japanese messages are displayed, the display is corrupted.

A Japanese character is as wide as two ASCII characters, but the width is computed as if every character were one column wide, so the widths do not match. I suspect this is why the display breaks.

Chinese and Korean are probably affected in the same way.


rlepigre commented 2 years ago

Cool! I guess we need to use a proper Unicode width algorithm instead of just counting characters. Is this translation available somewhere? It would be useful for testing.

phyver commented 2 years ago

What system are you using, and with which locale / encoding?

If I remember correctly, awk is the one that ends up computing the width of lines, and modern versions should give the character count, not the byte count. (The French version uses UTF-8, where accented characters are 2 bytes long.) But then, I'm not sure UTF-8 is the standard encoding for Japanese, and I don't know how GNU awk deals with other multibyte encodings.

Some things to try are thus:

And I would love to add the Japanese translation to the main repository if it is available somewhere!
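To make the byte-versus-character distinction concrete, here is a small Python sketch (for illustration only; GameShell's own width computation is done in awk, and the helper name `byte_lengths` is hypothetical). The string "日本語" ("Japanese") is three characters, but its byte length depends on the encoding. A precomposed "é" is one character and takes 2 bytes in UTF-8. None of these numbers is the on-screen width, which is a separate property.

```python
# Hypothetical helper: byte length depends on the encoding,
# while the number of characters (code points) does not.
def byte_lengths(text, encodings):
    """Map each encoding name to the number of bytes text takes in it."""
    return {enc: len(text.encode(enc)) for enc in encodings}

accented = "\u00e9"  # "é" as a single precomposed code point
japanese = "日本語"

print(len(accented), byte_lengths(accented, ["utf-8", "latin-1"]))
# 1 {'utf-8': 2, 'latin-1': 1}
print(len(japanese), byte_lengths(japanese, ["utf-8", "euc_jp", "shift_jis"]))
# 3 {'utf-8': 9, 'euc_jp': 6, 'shift_jis': 6}
```

A tool that counts characters correctly in every encoding would still report 3 for "日本語", even though a terminal draws it across 6 columns.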

rlepigre commented 2 years ago

I don't think the problem has anything to do with the counting of bytes. It has to do with the fact that some characters take as much space as several ASCII characters when printed (see https://docs.rs/unicode-width/latest/unicode_width for example).

phyver commented 2 years ago

Ah! I should have read more carefully... I have no idea how to deal with this, but someone apparently does: https://github.com/ericpruitt/wcwidth.awk! I'll try that, probably sometime next week.

@nogajun can you provide a test file?

phyver commented 2 years ago

@nogajun Can you try pulling the branch full_width_characters and see if this works? (637f6c54)

I don't like the idea of making all of GameShell dependent on an external obscure library (wcwidth.awk), but if it works, we'll find a way...

@rlepigre Can you test it on some other strange files that you can think of? I had to comment out a few lines in wcwidth.awk to make it work, and I haven't done a lot of testing... You can directly use the scripts/box.sh script outside of GameShell:

    $ cd GameShell/scripts
    $ ./box.sh < test.txt
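The internals of scripts/box.sh are not shown here. As a hypothetical illustration, this Python sketch shows the padding step a box-drawing script needs, assuming it pads every line to the width of the widest one. The only change needed for wide characters is to measure each line with a display-width function (the same simplified rules as a wcwidth-style routine) instead of a character count. The names `display_width` and `box` are made up for this sketch.

```python
import unicodedata

def display_width(s):
    """Simplified column width: combining/format chars 0, wide/fullwidth 2, others 1."""
    width = 0
    for ch in s:
        cat = unicodedata.category(ch)
        if cat in ("Mn", "Me") or (cat == "Cf" and ch != "\u00ad"):
            continue  # zero-width code point
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def box(lines):
    """Frame lines in an ASCII box, padding by display width rather than len()."""
    inner = max((display_width(line) for line in lines), default=0)
    border = "+" + "-" * (inner + 2) + "+"
    body = ["| " + line + " " * (inner - display_width(line)) + " |" for line in lines]
    return "\n".join([border, *body, border])

print(box(["Hello", "こんにちは"]))
```

With `len()` in place of `display_width()`, "こんにちは" would be treated as 5 columns instead of 10, and its right border would stick out past the others. With display width, every row of the box occupies the same number of terminal columns.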
rlepigre commented 2 years ago

I'll have a look on the weekend!

rlepigre commented 2 years ago

Actually, I just tried Google-translated versions of some of our files in several languages, and they seem to work as expected as far as character width is concerned!

However, out of curiosity, I also tried languages with a right-to-left reading direction, but these are displayed left-to-right in my terminal. I have no idea how these are supposed to be handled, but all I can say is that my terminal does not seem to handle the right-to-left mark correctly. (I tried with the example in https://en.wikipedia.org/wiki/Right-to-left_mark.)
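Some background that may explain this: writing direction is a per-character Unicode property. Text is stored in logical (reading) order, and reordering right-to-left runs for display is left to the terminal, which must implement the Unicode Bidirectional Algorithm (UAX #9). Many terminals do not. The right-to-left mark itself is an invisible format character (general category Cf), so a width function should count it as zero columns. The column count of a line does not depend on the order in which it is drawn. A small Python sketch inspecting these properties (`bidi_info` is a hypothetical helper):

```python
import unicodedata

def bidi_info(ch):
    """Return (code point, name, bidirectional class, general category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch),
            unicodedata.bidirectional(ch), unicodedata.category(ch))

for ch in ("a", "\u05d0", "\u0627", "\u200f"):
    print(*bidi_info(ch))
# U+0061 LATIN SMALL LETTER A L Ll
# U+05D0 HEBREW LETTER ALEF R Lo
# U+0627 ARABIC LETTER ALEF AL Lo
# U+200F RIGHT-TO-LEFT MARK R Cf
```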

rlepigre commented 2 years ago

Maybe relevant: https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters.

phyver commented 2 years ago

304d9154 should fix this