rmzelle closed this 8 years ago
I've implemented a partial fix for this (in a PR), and it will correctly put Česká under C. However, "Российский" begins with the Cyrillic letter Er, which is sorted after all Latin letters. I'd argue this is correct behaviour; the alternative is to transliterate Cyrillic into Latin for sorting purposes.
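To illustrate the behaviour described above, here is a minimal Python sketch of a diacritic-insensitive sort key. It is not the code from the PR: the function name `sort_key` and the sample titles are made up for illustration, and it assumes plain code-point comparison once the combining marks are stripped.

```python
import unicodedata

def sort_key(title):
    # Decompose accented letters into base letter + combining mark (NFD),
    # then drop the combining marks so "Č" compares like "C".
    # Side effect: Cyrillic "й" also decomposes and compares like "и".
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes the comparison case-insensitive.
    return stripped.casefold()

# Hypothetical sample titles, not taken from the style repository.
titles = ["Российский", "Zotero", "Česká", "Chicago"]
ordered = sorted(titles, key=sort_key)
print(ordered)
```

With this key, "Česká" lands among the C entries, while "Российский" still comes after every Latin title, because Cyrillic code points are higher than Latin ones.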
However, "Российский" begins with the Cyrillic letter Er, which is sorted after all Latin letters. I'd argue this is correct behaviour; the alternative is to transliterate Cyrillic into Latin for sorting purposes.
@avram, can I ask your opinion on what constitutes the best sorting behavior as our Slavic expert?
(@tnajdek, I think ignoring diacritics for sorting is the more important issue, so thanks for doing that already!)
This is a hairy area -- there are standard collations (http://www.unicode.org/reports/tr10/) that cover all of this and which we should be able to lean on directly from PHP (http://php.net/manual/en/collator.create.php). My concern with transliterating the Cyrillic is that it might hurt searchability for Russian users, and it might force them to have a transliterated style name in the otherwise Russian Zotero UI, which would be annoying.
My concern with transliterating the Cyrillic is that it might hurt searchability for Russian users and it might force them to have a transliterated style name in the otherwise Russian Zotero UI
As I understand it, the proposal is to just use the transliterated style title for sorting. The name that is displayed (and searched against) would stay Cyrillic.
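A rough Python sketch of that idea, where transliteration feeds only the sort key and the displayed title is left alone. The `TRANSLIT` table and `translit_sort_key` are hypothetical and deliberately tiny; a real mapping would need to cover the whole alphabet and pick a transliteration standard.

```python
# Hypothetical, incomplete Cyrillic-to-Latin table, just enough for the sample title.
TRANSLIT = {"р": "r", "о": "o", "с": "s", "и": "i", "й": "j", "к": "k"}

def translit_sort_key(title):
    # Map each lowercased character through the table; characters
    # without an entry (e.g. Latin letters) are kept unchanged.
    return "".join(TRANSLIT.get(ch, ch) for ch in title.casefold())

# Hypothetical sample titles.
titles = ["Zotero", "Российский", "Nature"]
ordered = sorted(titles, key=translit_sort_key)
print(ordered)
```

The list still holds the original Cyrillic string, so display and search are unaffected; only its position changes, here to between the N and Z entries.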
Or is it accepted practice to sort Cyrillic characters after Latin, in which case we can just leave things as is?
It is accepted to have Cyrillic sort after Latin letters, and I don't see that as problematic.
Okay, thanks!
Transliteration would only be done for sorting purposes; it wouldn't affect how the style title is displayed. That being said, I agree with @avram: it seems to be accepted practice to sort Cyrillic after Latin.
The current sort order seems a little confusing. I assume people expect "Česká" to sort under "C" and "Российский" under "R":