realm / realm-core

Core database component for the Realm Mobile Database SDKs
https://realm.io
Apache License 2.0

Dash is sorted before space when sorting queries #1639

Closed by mrackwitz 8 years ago

mrackwitz commented 8 years ago

A user of the Cocoa bindings reported that a dash ("-", 0x2d) in strings is ordered before a space (" ", 0x20) when sorting. I can reproduce this in Core, and I assume it is not the expected behavior.

bdash commented 8 years ago

The problem appears to be the collation order used within utf8_compare when string_compare_method == STRING_COMPARE_CORE. It's not clear to me how the collation order was computed: the tool it mentions for generating it (src/realm/tools/unicode.cpp) isn't runnable as-is, and when tweaked to run on my OS X machine it gives entirely different results from what utf8_compare uses.

git blame suggests that @rrrlasse wrote this code. Lasse, can you take a look at this please?

jpsim commented 8 years ago

FYI, the Cocoa issue is realm/realm-cocoa#3417.

ironage commented 8 years ago

I looked at this too and found that running the fragment in unicode.cpp with the locale set to en_US (and others) produces collations that make our unit tests fail. We seem to be using a very non-standard ordering which puts lowercase letters above uppercase letters. @rrrlasse do you remember if this was modified by hand? Or maybe it is using some Windows collation?

Checking the character orderings that we currently use, we can see that dash does come before space. I could easily swap them, but because I don't understand exactly how this order was generated, I don't want to touch it. In fact, I am tempted to mark this as expected behaviour because the rest of our orderings are non-conformant anyway ("alpha" comes before "ALPHA", for example).
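To illustrate how a lookup table can produce this kind of ordering, here is a minimal sketch of a table-driven comparison. It is not the table or the code that utf8_compare uses: RankTable, make_rank_table, table_less and sorted_with are hypothetical names made up for this example, and it only handles single bytes rather than full UTF-8. Each byte gets a rank, and strings are compared by rank instead of by byte value, so whichever character is listed first in the table sorts first.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rank table: maps each byte value to its sort position.
using RankTable = std::array<int, 256>;

// Start from plain byte order, then move the characters listed in
// `order` (in that sequence) to the front of the collation.
RankTable make_rank_table(const std::string& order)
{
    RankTable rank{};
    for (int c = 0; c < 256; ++c)
        rank[c] = 1000 + c; // default: byte order, after all listed characters
    int next = 0;
    for (unsigned char c : order) {
        assert(rank[c] >= 1000 && "character listed twice");
        rank[c] = next++;
    }
    return rank;
}

// Compare two strings byte by byte using the ranks instead of the byte values.
bool table_less(const RankTable& rank, const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [&rank](char x, char y) {
            return rank[static_cast<unsigned char>(x)] < rank[static_cast<unsigned char>(y)];
        });
}

// Return a copy of `v` sorted according to the rank table.
std::vector<std::string> sorted_with(const RankTable& rank, std::vector<std::string> v)
{
    std::sort(v.begin(), v.end(), [&rank](const std::string& a, const std::string& b) {
        return table_less(rank, a, b);
    });
    return v;
}
```

With make_rank_table("- ") the string "a-b" sorts before "a b", which matches the reported behaviour; with make_rank_table(" -"), or with an empty table that falls back to byte order, "a b" comes first. In a scheme like this, swapping the two characters is a one-entry change in the table, but the result is only as trustworthy as the process that generated the rest of the table.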

rrrlasse commented 8 years ago

Let me check this and maybe generate a new lookup table.