monte-language / typhon

A virtual machine for Monte.

Case folding for Str #119

Open MostAwesomeDude opened 8 years ago

MostAwesomeDude commented 8 years ago

Strings should be case-foldable via some sort of immutable method. This is an invaluable tool for modern text processing which is language-agnostic and specified by Unicode.

Suggested verbs include .toCaseFold/0 or .toFoldCase/0.

This should come with documentation demonstrating why this is superior to using .toUpperCase/0 or .toLowerCase/0 for case-insensitive comparisons.

dckc commented 8 years ago

Have I been under a rock? I don't know any meaning of case folding other than folding to upper or lowercase. Don't we already have those?

If this is an "invaluable tool" perhaps somebody will give a couple use cases?

MostAwesomeDude commented 8 years ago

Sorry, I should have explained a bit more. The Unicode specification includes a fold which transforms strings into a case-insensitive version for comparison purposes. This fold was recently blessed into Python (as of 3.3) as str.casefold(). I'd like us to have this fold as well.

Basically, this is the Right Way to do case-insensitive comparisons. Using upper-case or lower-case is not a good strategy in non-English languages, with common examples in Turkish, German, etc. failing to work properly.

Python 3:

>>> "straße".casefold()
'strasse'
>>> "İ".casefold()
'i̇'
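
To make the comparison point concrete, here is a minimal Python 3 sketch of three ways to test two strings for case-insensitive equality. The helper names same_lower, same_upper, and same_folded are made up for this illustration and are not part of Python or Monte. It only shows that each single-direction mapping misses at least one German pair that the fold handles.

```python
# Hypothetical helpers for illustration; only str.lower, str.upper and
# str.casefold are real Python methods.

def same_lower(a: str, b: str) -> bool:
    """Compare after lowercasing both sides."""
    return a.lower() == b.lower()

def same_upper(a: str, b: str) -> bool:
    """Compare after uppercasing both sides."""
    return a.upper() == b.upper()

def same_folded(a: str, b: str) -> bool:
    """Compare after Unicode case folding both sides."""
    return a.casefold() == b.casefold()

# Lowercasing leaves "ß" alone, so it never meets "ss".
print(same_lower("straße", "STRASSE"))   # False
print(same_folded("straße", "STRASSE"))  # True

# Uppercasing turns "ß" into "SS" but leaves capital sharp s (U+1E9E) as is.
print(same_upper("ß", "\u1e9e"))         # False
print(same_folded("ß", "\u1e9e"))        # True
```

The fold is locale-independent, so it does not apply the Turkish dotted/dotless i rules; it only guarantees a consistent comparison key.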

MostAwesomeDude commented 7 years ago

This turned out to be a complex rabbit hole. We can maybe just copy Python 3, but with extreme care.

Also, case mappings, like uppercase and lowercase, probably need to take locale parameters so that they do not depend on an ambient locale.
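
As a rough sketch of what an explicit locale parameter could look like, here is a Python 3 example. The function to_lower and its locale argument are hypothetical, not an existing Python or Monte API. It assumes only the Turkic dotted/dotless i rule matters and ignores the other context-sensitive rules in Unicode's special casing data.

```python
def to_lower(text: str, locale: str = "") -> str:
    """Lowercase text using an explicitly passed locale tag (hypothetical API).

    Only the Turkic rule for I and İ is handled here; everything else falls
    through to Python's locale-independent str.lower().
    """
    if locale in ("tr", "az"):
        # Turkic rule: dotted capital İ (U+0130) lowers to plain i,
        # and dotless capital I lowers to dotless ı (U+0131).
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()

print(to_lower("ISTANBUL"))        # istanbul
print(to_lower("ISTANBUL", "tr"))  # ıstanbul
```

Because the locale is an argument, the result depends only on the inputs and not on process-wide settings.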