Open whh1009 opened 4 years ago
Hello whh,
String codes = "\u5e7e\u8EAB\ue85d\ue85e\u21deb\u21df8\u347e\u347F";
Some of the \u sequences look as if they contain 5 hex digits, for example \u21df8. Did you really intend to include the code points U+21DF "DOWNWARDS ARROW WITH DOUBLE STROKE" and U+0038 "DIGIT EIGHT"?
Thank you very much for your reply. \u21df8 is meant to be a single Unicode code point, which corresponds to a Chinese character; please see https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=21df8&useutf8=true. The problems appear when I use 5 hex digits.
In Java, the character sequence \u21df8 is interpreted as U+21DF followed by U+0038. That's how it is: Java doesn't support \u with more than 4 hexadecimal digits. See JLS 17 sections 3.1 to 3.3.
If you encode your desired code points in UTF-16, this may already solve your problem.
Contrary to Java, Unicode allows 5 or 6 digits when referring to a code point such as U+21DF8. Keep this difference between Unicode and Java in mind.
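As a rough illustration of the UTF-16 suggestion above, here is a minimal sketch in plain Java (no sfntly calls). The class name SupplementaryEscapes and its helper methods are hypothetical names made up for this example. It shows that the 5-digit escape is read as two separate characters, and that the same code point can be written either as a surrogate pair or built from its integer value.

```java
public class SupplementaryEscapes {

    // Builds a String from code points, including those above U+FFFF.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Renders each UTF-16 code unit of s as a 4-digit Java escape,
    // which is handy for pasting supplementary characters into source code.
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escape takes only 4 hex digits, so this is U+21DF followed by '8'.
        String misparsed = "\u21df8";

        // U+21DF8 written as a UTF-16 surrogate pair.
        String viaSurrogates = "\uD847\uDDF8";

        // The same code point built from its integer value.
        String viaCodePoint = fromCodePoints(0x21DF8);

        System.out.println(Integer.toHexString(misparsed.codePointAt(0)));
        System.out.println(misparsed.charAt(1));
        System.out.println(viaSurrogates.equals(viaCodePoint));
        System.out.println(viaCodePoint.codePointCount(0, viaCodePoint.length()));
        System.out.println(toJavaEscapes(fromCodePoints(0x21DEB, 0x21DF8)));
    }
}
```

The last line prints the escapes that could replace the two 5-digit sequences in the original string. Whether this resolves the subsetting problem depends on how the code points are then passed to sfntly, which this sketch does not cover.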
When I use sfntly to extract a font subset, some Unicode code points are extracted correctly, but others are not. I am a little confused; could you please take a look?