sheredom / utf8.h

📚 single header utf8 string functions for C and C++
The Unlicense

Bug with utf8casecmp? #60

Closed kainjow closed 4 years ago

kainjow commented 5 years ago

It seems utf8casecmp is not working correctly. I was trying to use it as a custom comparator for std::set. Comparing it against strcasecmp, I found that it gives different results even for basic ASCII strings. I wouldn't expect identical values, but I would expect the sign (negative or positive) to match.

    printf("%d\n", strcasecmp(".gdoc", ".GSHeeT")); // -15
    printf("%d\n", utf8casecmp(".gdoc", ".GSHeeT")); // 1
    printf("%d\n", strcasecmp(".gsheet", ".gSLiDe")); // -4
    printf("%d\n", utf8casecmp(".gsheet", ".gSLiDe")); // 1

kainjow commented 5 years ago

So I got it working by changing the last part of utf8casecmp from:

    // If they don't match, then we return which of the original's are less
    if (src1_orig_cp < src2_orig_cp) {
      return -1;
    } else if (src1_orig_cp > src2_orig_cp) {
      return 1;
    }

To:

    // If they don't match, then we return the difference between the characters
    return src1_cp - src2_cp;

This matches strcasecmp's output.
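
For reference, here is a minimal ASCII-only sketch of why the change helps. The branch quoted above compares the unfolded code points (`src1_orig_cp`, `src2_orig_cp`), so `'d'` (100) sorts after `'S'` (83) even though the folded `'d'` sorts before `'s'` (115). The helper `ascii_casecmp` is hypothetical and is not utf8.h code: it folds single bytes with `tolower` instead of decoding UTF-8, and it only illustrates returning the difference of the folded values.

```c
#include <ctype.h>

/* Hypothetical ASCII-only stand-in for utf8casecmp (not the library's code).
 * Returns <0, 0 or >0 depending on how the case-folded strings compare. */
int ascii_casecmp(const char *a, const char *b) {
  for (;; ++a, ++b) {
    /* Fold both bytes to lowercase before comparing them. */
    int ca = tolower((unsigned char)*a);
    int cb = tolower((unsigned char)*b);

    /* Stop at the first folded mismatch or at the end of both strings. */
    if (ca != cb || ca == '\0') {
      /* Difference of the folded values, so the sign reflects the
       * case-insensitive ordering (0 when both strings ended together). */
      return ca - cb;
    }
  }
}
```

Under these assumptions (ASCII input, default "C" locale), `ascii_casecmp(".gdoc", ".GSHeeT")` evaluates to -15 and `ascii_casecmp(".gsheet", ".gSLiDe")` to -4, the same values as the strcasecmp output above.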

sheredom commented 4 years ago

I'm so sorry I missed this. All my notifications were disabled in GitHub at some point and I don't know why, so I missed every single issue on every lib I have :(

Are you happy to file a PR for the above, or do you want me to file it?

kainjow commented 4 years ago

You can go ahead.

jpcy commented 3 years ago

utf8ncasecmp has the same problem and can be fixed the same way.
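
A rough sketch of what the same fix could look like for a length-limited comparison, again ASCII-only. The name `ascii_ncasecmp` is hypothetical and this is not the utf8.h implementation; `n` counts bytes here, and bytes are folded with `tolower` rather than decoded as UTF-8.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical ASCII-only stand-in for utf8ncasecmp (not the library's code).
 * Compares at most n bytes, ignoring case. */
int ascii_ncasecmp(const char *a, const char *b, size_t n) {
  for (; n > 0; --n, ++a, ++b) {
    /* Fold both bytes to lowercase before comparing them. */
    int ca = tolower((unsigned char)*a);
    int cb = tolower((unsigned char)*b);

    /* Return the folded difference at the first mismatch or string end. */
    if (ca != cb || ca == '\0') {
      return ca - cb;
    }
  }
  /* The first n bytes matched after folding. */
  return 0;
}
```

With this sketch, `ascii_ncasecmp(".gdoc", ".GSHeeT", 2)` is 0 because only `.g` and `.G` are compared, while a limit of 3 reaches `d` versus `S` and yields a negative result.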