Closed yoriyuki closed 13 years ago
BatRope.lowercase and BatRope.uppercase have the same issue.
Thank you for noticing this. I've changed to using the casefolding
function instead of the lowercase
function from CaseMap
, but I'm still having problems.
# module C = BatCamomile.CaseMap.Make(BatCamomile.UTF8);;
...
# C.casefolding "FOO";;
- : C.text = "foo"
# C.casefolding "foo";;
- : C.text = ""
# C.compare_caseless "foo" "FOO";;
- : int = -1
I don't understand the issue with Rope.lowercase
/uppercase
- are you saying they should use case folding, or that they need to have a parameter for locale?
Thanks for your help improving batteries.
Well, this is a bug of Camomile. Thank you for pointing out. I will release the bug fix soon.
# module C = BatCamomile.CaseMap.Make(BatCamomile.UTF8);;
...
# C.casefolding "FOO";;
- : C.text = "foo"
# C.casefolding "foo";;
- : C.text = ""
# C.compare_caseless "foo" "FOO";;
- : int = -1
Concerning BatRope.uppercase and BatRope.lowercase, they are implemented using bulk_fold. Unfortunately, Greek Sigma changes its lowercase character depending on the context, so it would cause the rare and hard to find problem.
Since case mapping algorithm is defined as a functor over modules which implement UnicodeString.Type(http://camomile.sourceforge.net/dochtml/CamomileLibrary.UnicodeString.html), I would suggest that you make a subset of Rope which satisfies that interface, then pass it to CaseMap.Make functor to obtain case mapping functions. In this way, you can have, say, collation easily too.
I push these fixes back on Ulib
, as we're planning on depending on it for unicode text.
Reading batRope.ml, I found that BatRope.icompare is implemented such a way that it converts each character to its lowercase individually, then compare it.
However this does not work if
The way of case-less comparison recommended by Unicode Standard is use case-folding algorithm. Camomile implements it as compare_caseless function in CaseMap module.
http://camomile.sourceforge.net/dochtml/CamomileLibrary.CaseMap.Type.html#VALcompare_caseless