wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License

[Feature] String Comparison Ignore Case #1134

Open mwasplund opened 1 year ago

mwasplund commented 1 year ago

I need to be able to compare strings ignoring case. This could be accomplished either by introducing a string.compare with direct support for ignoring case, or by introducing toUpper/toLower (upper case is the better choice for comparisons, since some characters do not match after being round-tripped through lower case). The conversion methods are more general and support a broader range of scenarios, but will be slower than a direct comparison. I know this is a can of worms, since it will most likely need to consider culture information. Wren may be able to use culture-invariant comparisons, but I am unsure whether that will meet the needs of all users of the language.

PureFox48 commented 1 year ago

Currently, we don't have methods in the core library to convert a string to lower or upper case.

However, there is a PR (#1019) to introduce String.lower based on the following Wren code:

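lower {
    var output = ""
    for (c in codePoints) {
        // Upper case ranges: A-Z (65-90), À-Ö (192-214) and Ø-Þ (216-222).
        // 215 (×) is skipped because it is not a letter.
        if ((c >= 65 && c <= 90) || (c >= 192 && c <= 214) || (c >= 216 && c <= 222)) {
            // The lower case form sits 32 code points higher.
            c = c + 32
        }
        output = output + String.fromCodePoint(c)
    }
    return output
}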

The PR doesn't include an upper function but the code for that would be:

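upper {
    var output = ""
    for (c in codePoints) {
        // Lower case ranges: a-z (97-122), à-ö (224-246) and ø-þ (248-254).
        // 247 (÷) is skipped because it is not a letter.
        if ((c >= 97 && c <= 122) || (c >= 224 && c <= 246) || (c >= 248 && c <= 254)) {
            // The upper case form sits 32 code points lower.
            c = c - 32
        }
        output = output + String.fromCodePoint(c)
    }
    return output
}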

These are based on the ISO-8859-1 character set (i.e. Unicode code points < 256), which has the merit of providing almost complete coverage of the major Western European languages. A minor problem is that there is no upper case equivalent of the German letter ß or of the very rare French letter ÿ within this character set. Although it would be possible to upper case the former as SS, unfortunately this would not round-trip.
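To make the two exceptions concrete, here is a small sketch that prints the code points involved. It only uses `codePoints`, `toList` and `String.fromCodePoint`; the variable names are my own. It shows that ß and ÿ sit at 223 and 255, just outside the ranges tested in `upper`, and that blindly applying the -32 offset to ÿ would land on ß rather than on an upper case letter.

```wren
// Illustrative only: inspects the two letters that have no upper case
// form below code point 256.
var sharpS = "ß".codePoints.toList[0]
var yDiaeresis = "ÿ".codePoints.toList[0]

System.print(sharpS)      // prints 223
System.print(yDiaeresis)  // prints 255

// Subtracting 32 from ÿ gives 223, which is ß and not an upper case ÿ.
System.print(String.fromCodePoint(yDiaeresis - 32))  // prints ß
```

As far as I know, the upper case forms that Unicode does define for these letters (Ÿ at U+0178 and ẞ at U+1E9E) both lie above 255, so a table limited to ISO-8859-1 cannot reach them.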

I think myself this is as far as we're likely to go in a simple language such as Wren. It would be much more difficult to extend casing to the full Unicode character set (though I do have methods which produce much greater coverage in my own modules), to provide normalization, or to have locale-specific versions, for the reasons discussed in the PR.
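Within those limits, the comparison asked for in this issue could be sketched roughly as follows. This is only a sketch under the same ISO-8859-1 assumption: the `Casing` class and its `upperCodePoint` and `equalsIgnoreCase` methods are hypothetical names of my own and are not part of the core library or of the PR. It compares code point by code point rather than building two upper-cased strings first.

```wren
// Hypothetical helper; nothing here exists in Wren's core library.
class Casing {
    // Maps one code point to upper case using the ISO-8859-1 ranges only.
    static upperCodePoint(c) {
        if ((c >= 97 && c <= 122) || (c >= 224 && c <= 246) || (c >= 248 && c <= 254)) {
            return c - 32
        }
        return c
    }

    // Returns true if a and b are equal once both are upper-cased.
    static equalsIgnoreCase(a, b) {
        var x = a.codePoints.toList
        var y = b.codePoints.toList
        if (x.count != y.count) return false
        for (i in 0...x.count) {
            if (upperCodePoint(x[i]) != upperCodePoint(y[i])) return false
        }
        return true
    }
}

System.print(Casing.equalsIgnoreCase("Wren", "WREN"))       // prints true
System.print(Casing.equalsIgnoreCase("café", "CAFÉ"))       // prints true
System.print(Casing.equalsIgnoreCase("straße", "STRASSE"))  // prints false
```

The last line shows the ß limitation: since ß is left unchanged and the strings differ in length, "straße" and "STRASSE" do not compare equal here.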