papandreou / node-cldr

node.js library for extracting data from CLDR (the Unicode Common Locale Data Repository)
BSD 3-Clause "New" or "Revised" License

linguistic sorting of country names #6

Closed nkhine closed 11 years ago

nkhine commented 11 years ago

I am using this library to localize country names, which are displayed as an alphabetically sorted list on the http://www.zmgc.net website (TZM Network tab).

If I change the language to Slovak, for example, Čína is listed at the end of the list.

My list comes from the https://github.com/TZM/tzm-blade/blob/master/data/chapters.json file and is sorted by the following code. Note that I am using the https://github.com/bminer/node-blade template engine, which is similar to Jade; the actual file is https://github.com/TZM/tzm-blade/blob/master/views/footer.blade:

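var sortByKey = function(field, reverse, primer){
    // Read the sort key from each item, optionally transformed by primer
    var key = function (x) {return primer ? primer(x[field]) : x[field]};
    return function (a,b) {
        var A = key(a), B = key(b);
        // Plain < and > compare by UTF-16 code units, so "Č" sorts after "Z".
        // Note: a truthy reverse selects +1 here, i.e. ascending order.
        return ((A < B) ? -1 : (A > B) ? +1 : 0) * [-1,1][+!!reverse];
    }
}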
var countries = [];
var chapters = chapterJSON;
for (var i in chapters) {
    var row = chapters[i].desc;
    // Take the part after the "-" in LOCALES as the country code
    var code = row["LOCALES"].split("-")[1];
    countries.push({link: row["WEBSITE"], contact: row["CONTACT"], country: allCountries[code]});
}
var guide = locals.settings.translation.guide;
countries.sort(sortByKey('country', true));

The countries list comes from https://github.com/TZM/tzm-blade/blob/master/app/config/apps.coffee#L171

Any advice on how best to improve the code to handle linguistic sorting would be much appreciated.

papandreou commented 11 years ago

CLDR seems to have all the collations you would ever dream of (https://github.com/papandreou/node-cldr/tree/master/3rdparty/cldr/common/collation) so I suppose it'd be possible to build an Array.prototype.sort-compatible comparator function for each locale from that.
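
To illustrate what an Array.prototype.sort-compatible comparator could look like, here is a minimal sketch that ranks characters by an explicit alphabet. The helper name makeAlphabetComparator and the alphabet string are hypothetical and are not read from the CLDR data; real collation rules also cover digraphs, accent levels and more, so this is only a rough stand-in.

```javascript
// Hypothetical helper: builds a comparator from an ordered list of letters.
// This is a heavily simplified stand-in for a real CLDR collation tailoring.
function makeAlphabetComparator(alphabet) {
    var rank = {};
    alphabet.forEach(function (letter, index) {
        rank[letter] = index;
    });

    // Letters missing from the alphabet sort after all known letters,
    // ordered among themselves by UTF-16 code unit.
    function weight(ch) {
        var lower = ch.toLowerCase();
        if (Object.prototype.hasOwnProperty.call(rank, lower)) {
            return rank[lower];
        }
        return alphabet.length + ch.charCodeAt(0);
    }

    return function (a, b) {
        var length = Math.min(a.length, b.length);
        for (var i = 0; i < length; i += 1) {
            var diff = weight(a.charAt(i)) - weight(b.charAt(i));
            if (diff !== 0) {
                return diff;
            }
        }
        // Equal up to the shorter length: the shorter string sorts first
        return a.length - b.length;
    };
}

// Illustrative ordering only (assumption): basic Latin with č, š and ž
// placed after c, s and z. It is not the complete Slovak alphabet.
var compareSk = makeAlphabetComparator('abcčdefghijklmnopqrsštuvwxyzž'.split(''));

var names = ['Zambia', 'Čína', 'Cyprus', 'Dánsko'];
names.sort(compareSk);
console.log(names);
```

With this ordering "Čína" lands between the C and D entries instead of after "Zambia". Accented vowels such as "á" are not in the illustrative alphabet, so they fall back to the end, which is one of the gaps real collation data would close.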

I never looked into actually doing it because it seemed like String.prototype.localeCompare would do the trick. Unfortunately that seems to be crippled in node.js: https://groups.google.com/forum/#!topic/nodejs/edVSlqwM3qM
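
Where the runtime does support locale-aware comparison, the sort could be sketched roughly as follows. This assumes Intl.Collator from the ECMAScript Internationalization API is available with collation data for the requested locale, which depends on how the runtime was built; the sample country list is made up for illustration.

```javascript
// Assumption: Intl.Collator exists and has data for the 'sk' locale.
// Without that data the runtime falls back to a default collation.
var collator = new Intl.Collator('sk');

// Made-up sample data in the same shape as the countries array above
var countries = [
    {country: 'Zambia'},
    {country: 'Čína'},
    {country: 'Albánsko'}
];

// Compare the localized names with the collator instead of < and >
countries.sort(function (a, b) {
    return collator.compare(a.country, b.country);
});

console.log(countries.map(function (c) { return c.country; }));
```

The same comparator shape also works with a.country.localeCompare(b.country, 'sk'), although reusing one collator avoids repeating the locale lookup on every comparison.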

I guess that makes it worth doing, but there's quite a bit of spec to read up on before even understanding the data: http://www.unicode.org/reports/tr35/tr35-collation.html

Patches very welcome :)

nkhine commented 11 years ago

Thanks for the reply, I will look at the links and see if I can do something.

I just tested String.prototype.localeCompare and this has been fixed:

☺  locale | grep LC_COLLATE
LC_COLLATE="en_GB.UTF-8"
☺  node
> var x='�', y='�'
undefined
> x.localeCompare(y)
0
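
A result of 0 for two strings does not by itself show that locale rules are applied. One rough way to probe a runtime, assuming it accepts the locales argument of localeCompare, is to compare a pair whose order differs between two locales: "ä" sorts before "z" in German but after it in Swedish. The function name honorsLocale is hypothetical.

```javascript
// Hypothetical check: returns true only if localeCompare gives different,
// locale-specific answers for German ('de') and Swedish ('sv').
function honorsLocale() {
    try {
        var german = 'ä'.localeCompare('z', 'de');  // expected negative
        var swedish = 'ä'.localeCompare('z', 'sv'); // expected positive
        return german < 0 && swedish > 0;
    } catch (e) {
        // Some runtimes may reject the locales argument outright
        return false;
    }
}

console.log('locale-aware localeCompare:', honorsLocale());
```

If the locales argument is ignored, or the Swedish data is missing, both calls return the same sign and the check reports false.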