Closed keianrao closed 4 years ago
I already have several changes planned, but right now I am going to look at lang.Character and also read briefly about surrogate characters.
Architecture-wise, there is nothing of concern yet - the data we are going to read should map cleanly to the typical emoji picker GUI.
"The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)"
Alright. So, if we want to show the emoji in a Swing button, we need to provide it as a String, which is a CharSequence, which in turn is a sequence of UTF-16 code units.
As advertised, lang.Character does provide utility methods (Character#toChars, for one) for converting the code points we see in emoji-test to UTF-16 surrogate pairs. Once we've assembled all the UTF-16 code units for the emoji, we can get a String out of them using lang.String#valueOf.
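A minimal sketch of that conversion, assuming the code points have already been read as ints. The class and method names here (CodePointStrings, fromCodePoints) are made up for illustration; Character.toChars and String.valueOf(char[]) are the standard library calls.

```java
final class CodePointStrings {
    /** Builds a String from Unicode code points given as ints (hypothetical helper). */
    static String fromCodePoints(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            // Character.toChars returns one char for code points up to U+FFFF,
            // or a high/low surrogate pair for anything above that.
            char[] units = Character.toChars(cp);
            sb.append(String.valueOf(units));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) is above U+FFFF, so it takes two chars.
        String grinning = fromCodePoints(0x1F600);
        System.out.println(grinning.length());                              // 2
        System.out.println(grinning.codePointCount(0, grinning.length())); // 1
    }
}
```

The resulting String could then be handed to something like new JButton(grinning). Whether the glyph actually renders depends on the font in use, which this sketch does not address.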
lang.Character puts forward a rather funny hack: the helper methods ask for what are basically UCS-4 values, using the int primitive type, which is 32 bits.
I think we will go ahead with that ("translating to Java's native char & string"), because the alternative would be to roll our own solution for storing those code points. I have no ideas off the top of my head for how to do so, besides bit arrays, so it's probably not a good route.
I've read some of Unicode Technical Standard #51, which explains the terms used in the data. Also, looking at the data files I've collected, I think the only one we will need is emoji-test, as it lists the semantic groups as well as the code points for all of their members.
The problem is - Java's char is a 16-bit UTF-16 code unit and only goes up to 0xFFFF, but emoji-test lists most of the emoji with code points above that.
This article by Red Hat mentions that lang.Character should have utility methods to help, and that if that fails I should look for "surrogate characters".
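For reference, here is roughly what a surrogate pair is, written out by hand. This is only a sketch to show the arithmetic; SurrogateDemo and toSurrogates are made-up names, and in practice Character.toChars (or Character.highSurrogate / Character.lowSurrogate, available since Java 7) does the same job.

```java
final class SurrogateDemo {
    /** Splits a code point above U+FFFF into its UTF-16 surrogate pair (illustrative only). */
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        // Subtracting 0x10000 leaves a 20-bit value.
        int offset = codePoint - 0x10000;
        // The top 10 bits go into the high surrogate (0xD800..0xDBFF) ...
        char high = (char) (0xD800 + (offset >> 10));
        // ... and the bottom 10 bits into the low surrogate (0xDC00..0xDFFF).
        char low = (char) (0xDC00 + (offset & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00
    }
}
```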
(One can also cheat by grabbing the start of the comments on each line - but comments shouldn't be relied on, and I think that will fail for emoji-modifiers, or if we try to add emoji-modifiers ourselves)
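Reading the code point column instead of the comment could look something like the sketch below. It assumes data lines in emoji-test have the shape "hex code points ; status # comment", e.g. "1F44B 1F3FD ; fully-qualified # ...", and that everything else (blank lines, "# group:" lines) starts with a comment marker. EmojiTestLine and parse are made-up names; group and subgroup handling is left out.

```java
final class EmojiTestLine {
    /**
     * Returns the emoji on a data line as a String, or null if the line
     * carries no data (blank line or comment-only line). Hypothetical helper.
     */
    static String parse(String line) {
        // Drop the trailing comment; code points are hex, so the first '#'
        // always starts the comment.
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
        int semi = data.indexOf(';');
        if (semi < 0) return null;

        String field = data.substring(0, semi).trim();
        if (field.isEmpty()) return null;

        // One emoji may be several code points (base + modifier, ZWJ sequences).
        String[] hex = field.split("\\s+");
        int[] codePoints = new int[hex.length];
        for (int i = 0; i < hex.length; i++) {
            codePoints[i] = Integer.parseInt(hex[i], 16);
        }
        // This String constructor does the UTF-16 encoding for us.
        return new String(codePoints, 0, codePoints.length);
    }
}
```

Since this keeps every code point on the line, emoji with modifiers come out as one String without any special casing.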