Open atoktoto opened 4 years ago
Assuming that you're talking about Windows, printing Emoji to the terminal already works via the WriteConsole native call in WindowsConsoleOutputStream. But only the new Windows Terminal can actually display Emoji by default. CMD and PowerShell will just display a box.
That being said, full support for grapheme clusters (i.e. user-perceived characters) would be appreciated not just for Emoji, but for all the non-Latin languages where a series of code points coalesce into a single-width user-perceived character.
com.ibm.icu.text.BreakIterator can be used to iterate grapheme clusters (i.e. single-width user-perceived characters), though this will require icu4j as an additional dependency.
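import com.ibm.icu.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Splits a string into grapheme clusters (user-perceived characters).
public static List<String> getGraphemeClusters(String self) {
    List<String> characters = new ArrayList<String>(self.length());
    BreakIterator i = BreakIterator.getCharacterInstance();
    i.setText(self);
    // next() returns the end of the current cluster; current() then gives the next begin.
    for (int begin = 0, end = 0; (end = i.next()) != BreakIterator.DONE; begin = i.current()) {
        characters.add(self.substring(begin, end));
    }
    return characters;
}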
The JDK built-in java.text.BreakIterator may or may not work well depending on the specific use case. It'll work for Asian languages (e.g. บุฟเฟต์) but won't work for Emoji sequences (e.g. 👩👩👦👦).
Seems certainly doable but would require a significant change (or addition) to the Lanterna interfaces. Currently, char and TextCharacter seem to be the center of the whole operation. Replacing them with a String representing a single grapheme cluster seems wasteful (in terms of memory) for the general case and can make it less legible: void putCharacter(String s) looks wrong :D
Also, Windows Terminal only displays emoji correctly if running a WSL session.
I don't see why emoji wouldn't work with the current system, given that we can do CJK characters just fine. I'll investigate, maybe it's the terminal encoding that needs to be updated.
Here's what I get with lanterna 3.0.3 for a file name such as THAIบุฟเฟต์EMOJI👩👩👦👦.txt:
That being said, neither iTerm nor Terminal render this particular Emoji correctly either:
$ ls
THAIบุฟเฟต์EMOJI👩?👩?👦?👦.txt
EDIT: บุฟเฟต์ does render correctly, but the layout does not account for the compound characters บุ and ต์ taking up only 1 character each (even though each is 2 code points), and so the layout is off by 2 here:
$ ls
TEST.mp4
บุฟเฟต์.mp4
CJK works because those are 1 code point per character, i.e. 好 is 1 code point, but บุ is 2 code points which are composed into a single logical character by the text renderer.
Interesting, so the CJK detector incorrectly flags บุ as two text characters wide?
Ok, I see the problem now. Java char type isn't able to store emoji: https://developers.redhat.com/blog/2019/08/16/manipulating-emojis-in-java-or-what-is-%F0%9F%90%BB-1/ Slightly unexpected. Will see what we can do about this.
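To make the char limitation concrete, here is a minimal sketch using only JDK calls. The class name SurrogateDemo and its helper methods are made up for illustration; the bomb emoji is the one that comes up later in this thread. Code points above U+FFFF are stored as two UTF-16 code units (a surrogate pair), so a single char can only ever hold half of one.

```java
// Hypothetical demo class, not part of Lanterna.
final class SurrogateDemo {
    /** Number of UTF-16 code units (Java chars) in s. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of Unicode code points in s. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bomb = "\uD83D\uDCA3"; // U+1F4A3 BOMB, written as a surrogate pair
        System.out.println(utf16Units(bomb));                           // 2
        System.out.println(codePoints(bomb));                           // 1
        System.out.println(Character.isHighSurrogate(bomb.charAt(0)));  // true
        System.out.println(Integer.toHexString(bomb.codePointAt(0)));   // 1f4a3
    }
}
```

Any API that hands the terminal one char at a time ends up sending the two halves separately, which would explain the pair of question marks or rectangles reported here.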
Yes, บุ is 2 code points. It even requires hitting DELETE twice to delete the entire character. Hitting DELETE once only changes บุ to บ. Kinda like NFD except there is no NFC for บุ.
The problem I'm finding is that even "บุ".length() returns 2...
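For comparison, a small sketch of the three different "lengths" of that string. The class and method names are hypothetical; the only JDK calls used are String.length(), String.codePointCount() and java.text.BreakIterator. The Thai string is written with escapes (U+0E1A followed by the combining vowel U+0E38) so the source file encoding does not matter.

```java
import java.text.BreakIterator;

// Hypothetical helper, not part of Lanterna.
final class ClusterCount {
    /** Counts character boundaries as seen by the JDK's character BreakIterator. */
    static int graphemeClusters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String thai = "\u0E1A\u0E38"; // base consonant + combining vowel sign
        System.out.println("chars:       " + thai.length());
        System.out.println("code points: " + thai.codePointCount(0, thai.length()));
        System.out.println("clusters:    " + graphemeClusters(thai));
    }
}
```

Both length() and codePointCount() report 2 here; only the BreakIterator groups the pair into one unit. The column width of that unit is a separate question that this sketch does not answer.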
I'm trying to change the internal representation of TerminalCharacter to String, but it's tricky to know if the character should be considered single- or double-width, given Java provides little guidance. I'd like to avoid hard-coding unicode page references if possible...
Have been browsing articles and it really seems like while we can get the number of code points, there's no way to know if these code points are combined into a single character, or if that character is double or single width!
Yep, pretty much. lanterna effectively can't predict how a terminal window is going to render the text, because it depends on the version of unicode used by the text renderer. Though we can generally assume that long-established unicode sequences like บุ will work just fine, while recent additions like 👩👩👦👦 are likely to not work.
You can use the java.text.BreakIterator to split a String into "display characters" like so:
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Splits a string into "display characters" using the JDK's character BreakIterator.
public static List<String> getGraphemeClusters(String self) {
    List<String> characters = new ArrayList<String>(self.length());
    BreakIterator i = BreakIterator.getCharacterInstance();
    i.setText(self);
    for (int begin = 0, end = 0; (end = i.next()) != BreakIterator.DONE; begin = i.current()) {
        characters.add(self.substring(begin, end));
    }
    return characters;
}
java.text.BreakIterator and com.ibm.icu.text.BreakIterator can be used interchangeably. java.text.BreakIterator has the advantage of being a JDK built-in class. com.ibm.icu.text.BreakIterator has the advantage of working better for recent unicode additions (i.e. complex compound emoji sequences; notably probably something your terminal window won't display correctly anyway).
It might make sense to make the BreakIterator configurable:
NullBreakIterator (same behaviour as now, 1 code point = 1 character)
java.text.BreakIterator (e.g. for users that target Windows CMD)
com.ibm.icu.text.BreakIterator (e.g. for users that target the new Windows Terminal)
Ok, so here's what we'll do. In 3.0 we'll restrict TextCharacter to BMP only, with an override if you really know what you're doing. In 3.1, also restrict but change to use String internally and let you supply your own "String" character for complicated emoji. Will try this out.
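The configurable-BreakIterator suggestion could be sketched roughly as a strategy interface. Everything named here (GraphemeSegmenter, CodePointSegmenter, JdkSegmenter) is hypothetical and is not Lanterna API. An icu4j-backed variant would look the same as JdkSegmenter with a different import, and is left out so that the sketch needs no extra dependency.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strategy interface; not part of Lanterna. */
interface GraphemeSegmenter {
    List<String> split(String text);
}

/** The "null" strategy: every code point is treated as its own display character. */
final class CodePointSegmenter implements GraphemeSegmenter {
    @Override
    public List<String> split(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }
}

/** Groups code points using the JDK's built-in character BreakIterator. */
final class JdkSegmenter implements GraphemeSegmenter {
    @Override
    public List<String> split(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int begin = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; begin = end, end = it.next()) {
            out.add(text.substring(begin, end));
        }
        return out;
    }
}
```

A caller could then choose the implementation per target terminal, for example new JdkSegmenter().split(text) where combining marks should be grouped, and new CodePointSegmenter().split(text) to keep one code point per display character.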
Ok, I misunderstood the BMP plane again. I've just blocked 3.0 from creating TextCharacters from surrogate chars at least. Next, in 3.1, I'll use the BreakIterator above to try to group characters.
Okay, I've re-worked TextCharacter to support this: PR for review: https://github.com/mabe02/lanterna/pull/508
Ok, code is merged. If you clone and build release/3.1 (I'll do another release in a week or so) you should be able to print emoji as double-width and your magic บุ character only occupying one column. Please try it out and report back before I close this.
@mabe02, I can't print the BOMB character "\uD83D\uDCA3" or 💣 using Lanterna 3.2.0-master on Windows 7 x64 (SwingTerminalWindow). I just get two rectangles(
Emojis are a bit "complicated" - they are incompatible with lanterna's current approach of 1 (java) "char" per screen-position (cell).
The complicated solution would be to store Strings for each cell, but that would lead to a more or less complete rewrite of the whole library.
Eventually it might happen, and it will not only enable emojis, but also enable some glyphs in some scripts, that get combined out of several characters. But it may be too early to start holding your breath for it.
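As a rough illustration of what "a String per cell" might mean, here is a toy model. It is not Lanterna's design; CellRow and Cell are invented names. The column width is passed in by the caller, because, as discussed earlier in this thread, there is no reliable way to derive it from the string alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a screen row; not Lanterna code.
final class CellRow {
    /** One cell: a whole grapheme cluster plus the number of columns it occupies. */
    static final class Cell {
        final String grapheme;
        final int width;

        Cell(String grapheme, int width) {
            this.grapheme = grapheme;
            this.width = width;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    /** Appends a cluster; the width (1 or 2 columns) is an assumption supplied by the caller. */
    void put(String grapheme, int width) {
        cells.add(new Cell(grapheme, width));
    }

    /** Total number of terminal columns this row would occupy. */
    int columns() {
        int total = 0;
        for (Cell c : cells) {
            total += c.width;
        }
        return total;
    }

    /** The row's content as one string, in insertion order. */
    String text() {
        StringBuilder sb = new StringBuilder();
        for (Cell c : cells) {
            sb.append(c.grapheme);
        }
        return sb.toString();
    }
}
```

With this shape, a row holding the bomb emoji (width 2), บุ (width 1) and "A" (width 1) would report 4 columns even though the underlying text is 5 chars long, which is the mismatch the char-per-cell approach cannot express.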
On Tue, Oct 15, 2024 at 6:35 AM Austin @.***> wrote:
@mabe02 I too can't get emojis working. I'm testing on 3.1.2 and 3.2.0-alpha1 from maven. I'm only getting rectangles in the swing window and question marks in the normal terminal window in Ubuntu. Is there something I need to do to enable them or to display them? Thanks for your assistance!
Works for me on Ubuntu. Maybe it's a font problem? I just did this: terminal.putString("\uD83D\uDCA3"); (modified SwingTerminalTest.java)
The code
textGraphics.putString(0, 3, "🍕")
results in two question mark characters being displayed in the terminal. At the same time, System.out.println("🍕") works as intended (at least in terminal emulators supporting emojis, i.e. the new Windows Terminal).
Is it possible to support this use-case? I guess the emoji codepoints do not fit into the char type that is used in Terminal.putCharacter, so this would require major changes.