peterbrittain / asciimatics

A cross platform package to do curses-like operations, plus higher level APIs and widgets to create text UIs and ASCII art animations
Apache License 2.0

Emojis with "hidden characters"? #368

Closed ymyke closed 11 months ago

ymyke commented 1 year ago

Hi @peterbrittain – is there a way to control which kinds of emojis asciimatics shows when "hidden characters" are involved?

See here for some background info: https://twitter.com/emojipedia/status/953255029804273664

Example:

print("What asciimatics shows: \U0001F6E1")
print("What Windows shows: \U0001F6E1\uFE0F")

Output:

example_output

peterbrittain commented 1 year ago

Interesting... IIUC, the '\uFE0F' is a variation selector. In theory, your terminal should understand it when it is output along with the selected emoji. However, because it is a separate Unicode code point, the Python string holds two entries for the glyph, and Asciimatics will try to decode them as two separate characters. The end result would likely be the text representation followed by a blank. Is that what you're seeing?

If so, this is a bug. I think this would require a tweak to the logic in Screen.print_at() to detect this case and update the previous cell in the buffer. It would also require some more logic in Screen.refresh() to ensure the correct variation is selected.
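To make the "two separate entries" point concrete, here is a minimal sketch using only the standard library. It is not taken from asciimatics; it just inspects the string from the original example, and the variable names are illustrative.

```python
import unicodedata

# The shield emoji followed by the variation selector from the example above.
text = "\U0001F6E1\uFE0F"

# A Python str is a sequence of code points, so this single visible emoji
# is actually two entries.
code_points = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
    for ch in text
]

for entry in code_points:
    print(entry)

print("length:", len(text))
```

The second entry is a non-spacing mark (category `Mn`), which is why code that assumes every code point occupies at least one cell ends up misplacing what follows.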

ymyke commented 1 year ago

Yes, that is what I see:

Example:

screen = Screen.open(unicode_aware=False)
show_text(screen, (0, 0), "Xx🛡️xX")
screen.refresh()

Output:

image

Also, it seems to be "swallowing" characters when more emojis are involved?

screen = Screen.open(unicode_aware=False)
show_text(screen, (0, 0), "Xx🛡️🍀xX")
screen.refresh()

image

However:

screen = Screen.open(unicode_aware=False)
show_text(screen, (0, 0), "Xx🛡️🍀🍀xX")
screen.refresh()

image

As an aside, here's what vscode shows:

image

Interestingly, the yellow indication is only there in the print call. It says "The character U+fe0f is invisible." That is also how I discovered this in the first place.

peterbrittain commented 1 year ago

Yeah - that all makes sense. The swallowing is part of the same issue, because the code didn't expect a zero-width character. It should largely be a matter of detecting that in print_at and saving the full string to print the one character.

peterbrittain commented 1 year ago

So there is a very quick fix if you just want to display the text in the right place.

--- a/asciimatics/screen.py
+++ b/asciimatics/screen.py
@@ -638,6 +638,11 @@ class _AbstractCanvas(with_metaclass(ABCMeta, object)):
                     if x + i + j + width > self.width:
                         return

+                    # Handle modifier glyphs - just delete them for now.
+                    if width == 0:
+                        j -= 1
+                        continue
+
                     # Now handle the update.
                     if c != " " or not transparent:
                         # Fix up orphaned double-width glyphs that we've just bisected.

I can merge that patch to master now if this is good enough for your needs. Providing full support to control the text displayed would require extending the double buffer to understand multi-glyph characters, and so may take a while.
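The idea behind the quick fix (skip any glyph whose width is 0 so it neither occupies a cell nor shifts later text) can be sketched outside asciimatics as follows. This is an illustration only: `approx_width` and `layout` are hypothetical helpers, and `approx_width` is a rough standard-library stand-in for a real width function, not what `screen.py` uses.

```python
import unicodedata


def approx_width(ch: str) -> int:
    """Rough cell width: 0 for combining/format characters, 2 for wide, else 1.

    Assumption: a simplified approximation, not the library's own width logic.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def layout(text: str, x: int = 0):
    """Return (column, character) pairs, dropping zero-width code points."""
    cells = []
    col = x
    for ch in text:
        width = approx_width(ch)
        if width == 0:
            # Modifier glyph: drop it and do not advance the column.
            continue
        cells.append((col, ch))
        col += width
    return cells


if __name__ == "__main__":
    for col, ch in layout("Xx\U0001F6E1\uFE0FxX"):
        print(col, repr(ch))
```

With the zero-width code point dropped, the characters after the emoji keep consecutive columns instead of being pushed along or overwritten.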

ymyke commented 1 year ago

Sorry for the delay, @peterbrittain, life got in the way.

I was looking into this again just now and I can't reproduce the "swallowing" problem any longer. Maybe something in PowerShell changed in the meantime?

I still have the problem that some emojis like 🛡️ don't show properly (i.e., too small) but that is a larger fix, if I understand correctly?

peterbrittain commented 1 year ago

NP - it happens.

Yeah - I don't understand the swallowing. Could be a function of the exact glyphs used, but the positioning is exactly what I'd expect, with a quick fix as per the previous patch. Fixing the size requires a full understanding of the grapheme clusters.

Maybe uniseg does this... Can you try out https://uniseg-python.readthedocs.io/en/latest/graphemecluster.html#uniseg.graphemecluster.grapheme_clusters to see if that breaks out your text correctly and keeps the emojis the right size?

ymyke commented 1 year ago

I tried this:

import time
from asciimatics.screen import Screen
from asciimatics.effects import Print
from asciimatics.renderers import StaticRenderer
from uniseg.graphemecluster import grapheme_clusters

def show_text(screen: Screen, pos, text: str) -> None:
    Print(
        screen=screen, renderer=StaticRenderer(images=[text]), x=pos[0], y=pos[1]
    ).update(0)

what = "[nok: ✌️❤️🛡️ ok: 🍀💓🔥]"
clusters = list(grapheme_clusters(what))
clustered = ",".join(clusters)

screen = Screen.open(unicode_aware=True)
show_text(screen, (0, 0), what)
show_text(screen, (0, 1), clustered)
for i, cluster in enumerate(clusters):
    show_text(screen, (0 + i, 2), cluster)
screen.refresh()
time.sleep(10)
screen.close()

print(what)
print(clustered)
for cluster in clusters:
    print(cluster, end="")

And get this:

Asciimatics:

image

Powershell:

image

Observations:

Does this help in any way?

peterbrittain commented 1 year ago

Thanks. I thought that meant we might be able to just tweak the logic in Screen to handle the string by clusters, but it turns out that this also affects all text input widgets. If you manage to input this sort of text, the logic to move the cursor and delete text doesn't work correctly. That's going to take a while to unpick...

Do you need to use this text in text/textbox Widgets, or is it just labels or other printed content?

ymyke commented 1 year ago

I print them out directly, with code like the one given above in function show_text. (Not using widgets at all.)

Thanks for your help!

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

peterbrittain commented 11 months ago

OK, bot. I can take a hint. Given the required use case, I'll just strip the modifier to keep things simple.
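For anyone who needs the same behaviour before it lands in a release, stripping the selector on the caller's side is straightforward. The sketch below assumes "the modifier" means Unicode variation selectors (U+FE00 to U+FE0F, plus the supplementary range U+E0100 to U+E01EF); `strip_variation_selectors` is a hypothetical helper, not an asciimatics API.

```python
def is_variation_selector(ch: str) -> bool:
    """True for code points in the two Unicode variation selector blocks."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Return text with all variation selectors removed."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


if __name__ == "__main__":
    cleaned = strip_variation_selectors("Xx\U0001F6E1\uFE0FxX")
    print(len(cleaned))  # one code point shorter than the input
```

The cleaned string can then be passed to something like the `show_text` function shown earlier in the thread. The trade-off is the one already discussed: the emoji falls back to whatever default presentation the terminal picks, so it may still render in the smaller text style.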