chubin closed this issue 7 years ago.
Thanks for reporting! This could be related to #62. Will investigate further.
I have found a new group of evil characters. Unfortunately, this group seems to have nothing in common with the previous one:
҃ \u0483
҄ \u0484
҅ \u0485
҆ \u0486
҇ \u0487
The issue indeed has the same cause as #62. All of the characters you've listed contain some control bytes when UTF-8 encoded, e.g.
>>> "Н".encode("utf-8")
b'\xd0\x9d' # \x9d is OSC
>>> "қ".encode("utf-8")
b'\xd2\x9b' # \x9b is CSI
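A quick way to check any given character is to look for these two C1 introducer bytes in its UTF-8 encoding. This is a minimal sketch that only checks 0x9B (CSI) and 0x9D (OSC); the helper name `c1_introducers` is hypothetical, for illustration only.

```python
# Single-byte C1 control introducers that a byte-level parser would react to.
C1_INTRODUCERS = {0x9B: "CSI", 0x9D: "OSC"}


def c1_introducers(char: str) -> list:
    """Return the names of C1 introducers hidden in char's UTF-8 encoding."""
    return [C1_INTRODUCERS[b] for b in char.encode("utf-8") if b in C1_INTRODUCERS]


# Cyrillic En, Cyrillic Ka with descender, Greek Nu, combining Cyrillic titlo.
for ch in ["Н", "қ", "Ν", "\u0483"]:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8"), c1_introducers(ch))
```

The combining character U+0483 yields an empty list, which matches the observation that the second group does not contain 9b or 9d.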
Of course they do; I listed some of them with their codes, and as you can see they contain 9d and 9b. However, the last block I listed is a different group of characters, and those contain neither 9d nor 9b. That seems to be a separate problem.
The new "unprintable" group seems to be related to the way we do Unicode normalization as all of them (I think) are combining characters.
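The characters U+0483 to U+0487 can be confirmed as combining marks (Unicode general category `Mn`) with the standard `unicodedata` module. A screen that stores one character per cell has to attach such a mark to the preceding cell rather than give it a cell of its own. The sketch below shows one way to do that with NFC normalization; `draw` and the list-of-cells model are hypothetical simplifications, not pyte's actual data structures.

```python
import unicodedata


def draw(text: str) -> list:
    """Toy screen: one list entry per cell; combining marks join the previous cell."""
    cells = []
    for char in text:
        if unicodedata.combining(char) and cells:
            # Attach the mark to the preceding cell and compose where possible.
            cells[-1] = unicodedata.normalize("NFC", cells[-1] + char)
        else:
            cells.append(char)
    return cells


# General category and canonical combining class of the reported characters.
for ch in "\u0483\u0484\u0485\u0486\u0487":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))

print(draw("e\u0301"))  # one cell: NFC composes this into precomposed "é"
print(draw("Н\u0483"))  # one cell: no precomposed form, so two code points remain
```

Note that `unicodedata.combining()` returns the canonical combining class, which is 0 for some marks, so a more thorough check might also look at the general category.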
What do you think, is there any chance the bug will be fixed in the next few weeks? Or would I be better off downgrading pyte to 0.5.2? Can I help somehow?
The bug is a consequence of delegating input decoding to Screen (see febdad70ba4b0eec509e1cf10d9ed2d9fb284e85). I am currently thinking about how best to approach this, but I can't guarantee the fix will arrive soon.
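To illustrate why the order of decoding matters: if control sequences are recognized on raw bytes, the continuation byte 0x9D inside "Н" is indistinguishable from a real OSC introducer. Decoding first, with an incremental decoder so that multi-byte characters split across reads survive, removes the ambiguity. This is a simplified sketch under those assumptions; `naive_byte_parse` and `decode_chunks` are hypothetical helpers and not pyte's actual fix.

```python
import codecs

BEL, ST, CSI, OSC = 0x07, 0x9C, 0x9B, 0x9D


def naive_byte_parse(data: bytes) -> bytes:
    """Strip C1 control sequences while scanning raw bytes (the buggy approach)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == OSC:
            # OSC: swallow everything up to and including BEL or ST.
            i += 1
            while i < len(data) and data[i] not in (BEL, ST):
                i += 1
            i += 1
        elif b == CSI:
            # CSI: skip parameter bytes up to and including a final byte 0x40-0x7E.
            i += 1
            while i < len(data) and not 0x40 <= data[i] <= 0x7E:
                i += 1
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


def decode_chunks(chunks) -> str:
    """Decode first; C1 controls then appear only as code points U+0080-U+009F."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b"", final=True)


data = "Нина".encode("utf-8")
print(naive_byte_parse(data).decode("utf-8", errors="replace"))  # the text is lost
# The same bytes split mid-character, as separate reads might deliver them:
print(decode_chunks([data[:1], data[1:5], data[5:]]))
```

After decoding, "Н" is the single code point U+041D, so a parser that looks for the code points U+009B and U+009D can no longer mistake it for a control sequence.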
If you have any ideas, feel free to share them here.
I can try to find more broken characters if that helps.
Don't worry, the ones you have already come up with are enough.
Any news about this issue? The problem is that many Japanese/Chinese characters are also corrupted. There are some simple workarounds for Cyrillic/Greek, but things get worse with East Asian languages. So the issue is a real blocker for pyte 0.6 usage in a multilingual environment.
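To get a rough idea of how widespread this is for CJK text, one can count the code points in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) whose UTF-8 encoding contains 0x9B or 0x9D. The second count, over the whole C1 range 0x80 to 0x9F, is a hypothetical upper bound that only applies if a byte-level parser reacted to every C1 byte; `count_affected` is an illustrative helper.

```python
# Basic CJK Unified Ideographs block (extensions not included).
CJK_START, CJK_END = 0x4E00, 0x9FFF


def count_affected(bad_bytes: set) -> int:
    """Count CJK ideographs whose UTF-8 encoding contains any byte in bad_bytes."""
    return sum(
        1
        for cp in range(CJK_START, CJK_END + 1)
        if any(b in bad_bytes for b in chr(cp).encode("utf-8"))
    )


total = CJK_END - CJK_START + 1
csi_osc = count_affected({0x9B, 0x9D})
any_c1 = count_affected(set(range(0x80, 0xA0)))
print(f"{csi_osc}/{total} ideographs contain 0x9B or 0x9D")
print(f"{any_c1}/{total} ideographs contain some byte in 0x80-0x9F")
```

Every ideograph in this block encodes to three bytes, the last two being continuation bytes in 0x80 to 0xBF, so half of the continuation-byte range overlaps with C1.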
I am still thinking about how to implement this without making the code too much of a nightmare. I have a prototype in a local branch, but it is not finished yet. Most likely I won't have much time to work on it further until next weekend, so if you have any ideas, feel free to post them here or submit a PR.
> So the issue is a real blocker for pyte 0.6 usage in a multilingual environment
Yes, I understand it is critical, but 0.6.0 has not been released yet, so I'd suggest using the latest stable version if you're after correctness.
I confirm the problem is fixed now! @superbobry, you are a genius! Thank you very much!
Haha, thanks! Glad it works for you :)
pyte 0.6 has a strange regression with some Unicode characters, particularly with the Russian "Н" character:
That works:
That does not work:
As you can see, the output is empty in the second example (where the printed text contains "Н").
Everything works fine with the 0.5.x version of the module.
Another problematic character: the Greek letter Ν.
Some other broken characters:
1b, 1d, 5b, 5d, 9b, 9d seem to be the root of the problem
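One plausible reading of part of this list: ECMA-48 defines each 8-bit C1 control in 0x80 to 0x9F as equivalent to ESC (0x1B) followed by the byte minus 0x40, so CSI 0x9B corresponds to ESC [ (0x1B 0x5B) and OSC 0x9D to ESC ] (0x1B 0x5D). The sketch below only demonstrates that mapping and does not account for 0x1D; `c1_to_7bit` is a hypothetical helper.

```python
ESC = 0x1B


def c1_to_7bit(byte: int) -> bytes:
    """Map an 8-bit C1 control (0x80-0x9F) to its 7-bit ESC Fe form."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError(f"{byte:#04x} is not a C1 control")
    return bytes([ESC, byte - 0x40])


for name, byte in [("CSI", 0x9B), ("OSC", 0x9D)]:
    print(name, hex(byte), "->", c1_to_7bit(byte))
```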