selectel / pyte

Simple VTXXX-compatible linux terminal emulator
http://pyte.readthedocs.org/
GNU Lesser General Public License v3.0

Emojis/Grapheme clusters seem to be broken in pyte #131

Open chubin opened 4 years ago

chubin commented 4 years ago

Consider this Python 3 code:

# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals

import pyte

if __name__ == "__main__":
    emoji_string = "☁️"
    print(emoji_string.encode("utf-8").hex())
    print("---")

    screen = pyte.Screen(80, 24)
    stream = pyte.Stream(screen)
    stream.feed(emoji_string)
    for character in screen.display[0][:3]:
        print(character.encode("utf-8").hex())

emoji_string contains a single grapheme cluster, which a terminal or editor displays as one character:

Screenshot_2020-04-03_14-39-04

The emoji is displayed as a single character, but it consists of two code points: U+2601 (CLOUD) and U+FE0F (VARIATION SELECTOR-16). Pyte seems to drop the second one (or perhaps everything in the cluster after the first code point?), so the output of the program looks like this:

e29881efb88f
---
e29881
20
20

We see that efb88f was dropped: spaces (20) follow immediately after e29881.
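To make the byte dump above easier to read, here is a minimal sketch that lists the code points of the string using only the standard library. The helper name describe_code_points is hypothetical and not part of pyte.

```python
import unicodedata


def describe_code_points(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (
            "U+{:04X}".format(ord(ch)),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        )
        for ch in text
    ]


# The cloud emoji from the report, written with escapes so both parts are visible.
for row in describe_code_points("\u2601\ufe0f"):
    print(row)
```

The second code point is a non-spacing mark (category Mn), which matches the observation that only the first three bytes reach the screen.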

Is this a bug in pyte, or is it expected behaviour? Maybe I've missed some configuration mode?

superbobry commented 4 years ago

This is very likely a bug. Feel free to submit a PR ;)

chubin commented 4 years ago

I have written a small workaround for this problem. It works fine for me, but I don't think it is a good solution for this bug.

This is how I do it:

import grapheme  # third-party package: pip install grapheme


def _fix_graphemes(text):
    """
    Extract long grapheme sequences that can't be handled
    by pyte correctly because of the bug pyte#131.
    Each such grapheme is omitted from the text and replaced
    with a placeholder; the omitted graphemes are returned as a list.

    Return:
        text_without_graphemes, graphemes
    """

    output = ""
    graphemes = []

    for gra in grapheme.graphemes(text):
        if len(gra) > 1:
            # Multi-code-point cluster: keep it aside and emit a placeholder.
            character = "!"
            graphemes.append(gra)
        else:
            character = gra
        output += character

    return output, graphemes

I extract the graphemes before rendering, like this:

text, graphemes = _fix_graphemes(text)

and then after rendering I put them back.

It works as it should, but I am not sure that this method is (1) general enough and (2) good for pyte, because it introduces a new dependency: grapheme.
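On the dependency concern, a rough standard-library approximation is possible for simple cases like the one in this report. The sketch below (the name split_simple_clusters is hypothetical) only attaches non-spacing and enclosing marks to the preceding character. It is not a full grapheme segmentation: ZWJ sequences, regional-indicator flags and similar clusters are not handled.

```python
import unicodedata


def split_simple_clusters(text):
    """Group each character with the Mn/Me marks that directly follow it.

    This is an approximation, not Unicode grapheme cluster segmentation.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            # Mark (e.g. U+FE0F or a combining accent): extend the previous cluster.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


parts = split_simple_clusters("a\u2601\ufe0fb")
```

Clusters with len(...) > 1 could then be treated the same way as in the workaround above, without importing grapheme.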