Open chubin opened 4 years ago
This is very likely a bug. Feel free to submit a PR ;)
I have written a small workaround for this problem. It works fine for me, but I don't think it is a good solution for this bug.
That is how I do it:
import grapheme  # third-party dependency


def _fix_graphemes(text):
    """
    Extract long grapheme sequences that can't be handled
    by pyte correctly because of the bug pyte#131.

    Graphemes are omitted and replaced with placeholders,
    and returned as a list.

    Return:
        text_without_graphemes, graphemes
    """
    output = ""
    graphemes = []

    for gra in grapheme.graphemes(text):
        if len(gra) > 1:
            # Multi-code-point cluster: keep it aside, emit a placeholder.
            character = "!"
            graphemes.append(gra)
        else:
            character = gra
        output += character

    return output, graphemes
I extract the graphemes before rendering, like this:
text, graphemes = _fix_graphemes(text)
and then after rendering I put them back.
It works like it should, but I am not sure that this method is (1) general enough and (2) good for pyte, because it introduces a new dependency: grapheme.
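The "put them back" step is not shown above, so here is a minimal sketch of how it could look. The name restore_graphemes and its signature are hypothetical, not part of the workaround as posted, and it assumes the rendered text contains no literal "!" other than the placeholders:

```python
def restore_graphemes(rendered, graphemes, placeholder="!"):
    """Replace each placeholder, in order, with the grapheme it stood for.

    Hypothetical helper: assumes every placeholder character in
    `rendered` was produced by _fix_graphemes and not by the text itself.
    """
    pending = iter(graphemes)
    parts = []
    for char in rendered:
        if char == placeholder:
            # If the list runs out, keep the placeholder unchanged.
            parts.append(next(pending, placeholder))
        else:
            parts.append(char)
    return "".join(parts)


# Example: two placeholders restored from two extracted clusters.
rendered = "sky: ! sun: !"
print(restore_graphemes(rendered, ["\u2601\ufe0f", "\u2600\ufe0f"]))
```

A placeholder that cannot occur in normal text (for example a private-use code point) would probably be safer than "!", but that is a design choice beyond what the workaround does.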
Consider this Python 3 code:
emoji_string contains one grapheme cluster, which is displayed as a single emoji in a terminal/editor/etc., but it consists of two code points (UTF-8 bytes e29881 and efb88f). Pyte seems to drop the second part of the cluster (everything except the first part?), so in the output of the program efb88f is dropped: immediately after e29881, spaces (20) follow.
Is it a bug in pyte or is it expected behaviour? Maybe I've missed some configuration mode?
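To make the two parts of the cluster explicit, here is a small stdlib-only sketch that decodes the two byte sequences quoted above and lists the code points. It does not use pyte and is not the original reproduction script; it only illustrates what emoji_string is made of:

```python
import unicodedata

# Bytes quoted in the report: e29881 followed by efb88f.
raw = bytes.fromhex("e29881" "efb88f")
emoji_string = raw.decode("utf-8")

# One visible emoji, but two code points as far as len() is concerned.
print(len(emoji_string))

for ch in emoji_string:
    # Code point, its UTF-8 bytes, and its Unicode name.
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), unicodedata.name(ch))
```

The second code point is a variation selector, which has no width of its own. That would be consistent with it being treated differently from an ordinary character, though whether that is what happens inside pyte is only a guess.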