Closed mrackwitz closed 8 years ago
The problem appears to be the collation order used within utf8_compare when string_compare_method == STRING_COMPARE_CORE. I'm not clear how the collation order was computed, as the tool it mentions for generating it (src/realm/tools/unicode.cpp) isn't runnable as-is, and when tweaked to run on my OS X machine it gives entirely different results from what utf8_compare uses.
git blame suggests that @rrrlasse wrote this code. Lasse, can you take a look at this please?
fyi cocoa issue is realm/realm-cocoa#3417
I looked at this too and found that running the fragment in unicode.cpp with the locale set to en_US (and others) produces collations which make our unit tests fail. We seem to be using a very non-standard ordering which puts lowercase letters above uppercase letters. @rrrlasse do you remember if this was modified by hand? Or maybe it is using some Windows collation?
Checking the character orderings that we currently use, we can see that dash does come before space. I can easily swap them but because I don't understand exactly how this order was generated, I don't want to touch it. In fact I am tempted to mark this as expected behaviour because the rest of our orderings are non-conformant anyways ("alpha" comes before "ALPHA" for example).
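To illustrate how a table-driven comparison can end up with both of these results, here is a minimal sketch. It is not the table used by utf8_compare: the names make_ranks, rank_of and rank_less are hypothetical, and the ranks are chosen only to reproduce the two observations above (dash before space, "alpha" before "ALPHA").

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical rank table for the ASCII range. NOT the table in utf8_compare.
std::array<unsigned, 128> make_ranks()
{
    std::array<unsigned, 128> r{};
    for (unsigned c = 0; c < 128; ++c)
        r[c] = 2 * c + 1;              // default: ASCII order, odd ranks leave gaps
    r['-'] = r[' '] - 1;               // assumption: dash ranks just before space
    for (unsigned c = 'a'; c <= 'z'; ++c)
        r[c] = r[c - 'a' + 'A'] - 1;   // assumption: lowercase just before its uppercase
    return r;
}

// Bytes outside ASCII keep their plain order in this sketch.
unsigned rank_of(const std::array<unsigned, 128>& r, unsigned char c)
{
    return c < 128 ? r[c] : 2u * c + 1;
}

// Compare two strings byte by byte using the rank table instead of byte values.
bool rank_less(const std::string& a, const std::string& b)
{
    static const std::array<unsigned, 128> ranks = make_ranks();
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return rank_of(ranks, static_cast<unsigned char>(x)) <
                   rank_of(ranks, static_cast<unsigned char>(y));
        });
}
```

With such a table, swapping the ranks of dash and space is a one-line change, but any other entry that was tuned by hand would have to be reviewed the same way.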
Let me check this and maybe generate a new lookup table.
A user reported for the Cocoa bindings that a dash ("-", 0x2d) in strings is ordered before a space (" ", 0x20) when sorting. I can reproduce this in Core and suppose that this is not the expected behavior.
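For comparison, the ordering the reporter presumably expected is plain code-point order, where 0x20 sorts before 0x2d. The sketch below uses only the standard library, not Core's sort; the helper name sort_by_code_point is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort strings with std::string's operator<, which compares byte values,
// so a space (0x20) is ordered before a dash (0x2d).
std::vector<std::string> sort_by_code_point(std::vector<std::string> values)
{
    std::sort(values.begin(), values.end());
    return values;
}
```

Sorting the two strings "a-b" and "a b" this way puts "a b" first, which is the opposite of what Core currently produces.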