thagale / google-refine

Automatically exported from code.google.com/p/google-refine

Text facet sort by name should use case & diacritic insensitive collation #482

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Currently, all lowercase characters sort after all uppercase characters, so 'T' and 
't' end up in wildly different spots, and non-ASCII characters such as accented 
letters collate at the very end, so 'Österreichische' is miles from the other 'O's.

We should fold both case and diacritics to their base forms.
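A minimal sketch of what folding case and diacritics could look like as a sort comparator. It assumes an engine with `String.prototype.normalize` (ES2015, so not something the 2011 code could rely on), and `foldKey` and `compareFacetNames` are hypothetical names, not Refine's actual code. Each string is decomposed to NFD, combining marks in the U+0300 to U+036F block are dropped, and the result is lowercased; ties fall back to the raw string so the order stays deterministic.

```typescript
// Hypothetical helper: builds a sort key by stripping diacritics and folding case.
function foldKey(s: string): string {
  return s
    .normalize("NFD")                 // split "Ö" into "O" + U+0308 (combining diaeresis)
    .replace(/[\u0300-\u036f]/g, "")  // drop combining diacritical marks (U+0300 to U+036F)
    .toLowerCase();                   // fold case so "T" and "t" produce the same key
}

// Hypothetical comparator: compare folded keys first, then raw strings as a tie-breaker.
function compareFacetNames(a: string, b: string): number {
  const ka = foldKey(a);
  const kb = foldKey(b);
  if (ka !== kb) return ka < kb ? -1 : 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

const names = ["tree", "Österreichische", "Zebra", "Tree", "Oak", "apple"];
// order: apple, Oak, Österreichische, Tree, tree, Zebra
console.log([...names].sort(compareFacetNames));
```

NFD only helps for characters that actually decompose: letters such as 'ø', 'æ' and 'ß' have no canonical decomposition, so they pass through this key unchanged and still need a mapping table.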

Original issue reported on code.google.com by tfmorris on 12 Nov 2011 at 7:51

GoogleCodeExporter commented 8 years ago
r2371 makes the sort order case-insensitive, but JavaScript doesn't appear to 
have a built-in diacritic-folding method, so that part will take a little more work.

After I committed the "fix", I discovered that this may actually be a 
browser-specific bug/difference, but there doesn't appear to have been much 
progress on fixing it, so we should probably assume the current behavior will 
persist for a while.
http://code.google.com/p/v8/issues/detail?id=459
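For comparison, engines that implement the ECMAScript Internationalization API (ECMA-402, which postdates this thread) expose locale-aware collation directly. A sketch, assuming `Intl.Collator` is available and that English collation rules are acceptable for facet names; `compareWithCollator` is a hypothetical name. With `sensitivity: "base"`, strings that differ only in case or accents compare as equal, so a raw-string tie-break keeps the order deterministic.

```typescript
// Assumes an engine with the ECMAScript Internationalization API (Intl).
// sensitivity "base" treats "o", "O" and "Ö" as equal: only base letters matter.
const collator = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical comparator: locale-aware primary comparison, raw code units as tie-breaker.
function compareWithCollator(a: string, b: string): number {
  const primary = collator.compare(a, b);
  if (primary !== 0) return primary;
  return a < b ? -1 : a > b ? 1 : 0;
}

const names = ["tree", "Österreichische", "Zebra", "Tree", "Oak", "apple"];
console.log([...names].sort(compareWithCollator));
```

Delegating to the collator avoids maintaining any folding rules by hand, at the cost of depending on the engine's Intl support and locale data.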

There's a code snippet here that can be used to scrub diacritics: 
http://lehelk.com/2011/05/06/script-to-remove-diacritics/
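Without `normalize()`, scrubbing diacritics comes down to a lookup table from accented characters to base letters. A minimal sketch of that table-driven approach; `DIACRITIC_MAP` and `scrubDiacritics` are hypothetical, and the tiny table is illustrative only (it is not the linked script's data). Unlike Unicode decomposition, a table can also map letters such as 'ø', 'æ' and 'ß', which have no canonical decomposition.

```typescript
// Hypothetical, deliberately tiny mapping table; a real one would cover far more characters.
const DIACRITIC_MAP: { [ch: string]: string } = {
  "Ö": "O", "ö": "o", "Ø": "O", "ø": "o",
  "É": "E", "é": "e", "È": "E", "è": "e",
  "Ü": "U", "ü": "u", "ß": "ss", "Æ": "AE", "æ": "ae",
};

// Replace each mapped character with its base form; unmapped characters pass through unchanged.
// Iterates UTF-16 code units, which is fine here because every table key is a single BMP character.
function scrubDiacritics(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    out += Object.prototype.hasOwnProperty.call(DIACRITIC_MAP, ch) ? DIACRITIC_MAP[ch] : ch;
  }
  return out;
}

console.log(scrubDiacritics("Österreichische Straße")); // Osterreichische Strasse
console.log(scrubDiacritics("Ærøskøbing"));             // AEroskobing
```

For sorting, the scrubbed string can be lowercased and used as the comparison key, which combines the case-insensitive ordering from r2371 with diacritic folding.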

Original comment by tfmorris on 12 Nov 2011 at 8:33