Open jurgenvinju opened 1 month ago
This goes wrong in the runtime system of the interpreter which shares a lot of code with the runtime code for the compiler, so probably this breaks everywhere.
@jurgenvinju as it's represented as a surrogate pair, what tends to go wrong is that you only see the high surrogate part of the pair. 🍝 is encoded as 0xD83C & 0xDF5D. 55356 in hex is 0xD83C
.
So the bug is: somewhere the type generation takes a java string and gets the first char (charAt(0)
I assume), instead of correctly using codepointAt(0)
.
So the unicode codepoint
127837
is the right codepoint for🍝
, but type-reification in a character class type turns it into codepoint55356
whch does not even have a graphical representation in the current font:?
. Maybe it's not even a codepoint.