The ByteStringUtil.longestCommonPrefix(...) method returns a wrong result when one of its parameters is a String and the other is a CompactString or ReplazableString. In the internal strings (Replazable/Compact), charAt(i) returns byte[i], whereas in a String it returns the character at index i. The two only agree for ASCII text, because a non-ASCII character takes more than one byte in the internal representation. For example (shortened value of a Wikidata literal of Q101213907):
String s1 = "\u00C2\u00A0normal";
CompactString s2 = new CompactString("\u00A0normal");
Assert.assertEquals(0, ByteStringUtil.longestCommonPrefix(s1, s2));
// java.lang.AssertionError:
// Expected :0
// Actual :8
The string value is
"\u00C2\u00A0" = char[] {0xC2, 0xA0}
The internal value is
utf8("\u00A0") = byte[] {0xC2, 0xA0}
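The mismatch can be reproduced with the JDK alone. The sketch below is not the library code: `naivePrefix` is a hypothetical helper, and it assumes the internal charAt(i) widens the i-th UTF-8 byte to an unsigned char, which is what the reported result of 8 suggests.

```java
import java.nio.charset.StandardCharsets;

public class PrefixMismatchDemo {
    // Hypothetical stand-in for the mixed comparison: the String side is read
    // char by char, the internal side byte by byte (assumed unsigned).
    static int naivePrefix(String s, byte[] utf8) {
        int n = Math.min(s.length(), utf8.length);
        int i = 0;
        while (i < n && s.charAt(i) == (char) (utf8[i] & 0xFF)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String s1 = "\u00C2\u00A0normal";
        byte[] s2 = "\u00A0normal".getBytes(StandardCharsets.UTF_8);
        // The chars of s1 happen to equal the UTF-8 bytes of s2, so the two
        // look identical although their first characters differ.
        System.out.println(naivePrefix(s1, s2)); // prints 8
    }
}
```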
cf: UTF8
In the code, the method is used with two internal strings, but because it is public, it might be better to fix it in case someone is using it as a library method.
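One possible direction for such a fix, sketched with the JDK only: bring the String into the same UTF-8 byte representation as the internal string before comparing. `commonPrefixBytes` is a hypothetical helper, not the library method, and it assumes the prefix length should be counted in bytes; if a length in characters is wanted instead, the internal bytes would have to be decoded rather than the String encoded.

```java
import java.nio.charset.StandardCharsets;

public class PrefixFixSketch {
    // Hypothetical helper: both sides are compared as UTF-8 bytes, so a
    // char of the String is never matched against a single raw byte.
    static int commonPrefixBytes(String s, byte[] utf8) {
        byte[] a = s.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(a.length, utf8.length);
        int i = 0;
        while (i < n && a[i] == utf8[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String s1 = "\u00C2\u00A0normal";
        byte[] s2 = "\u00A0normal".getBytes(StandardCharsets.UTF_8);
        // s1 encodes to C3 82 C2 A0 ..., s2 is C2 A0 ..., so they differ
        // at the first byte.
        System.out.println(commonPrefixBytes(s1, s2)); // prints 0
    }
}
```

Note that a byte-level prefix can end in the middle of a multi-byte character, so callers that cut strings at the returned offset would need to account for that.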