mozilla / pontoon

Mozilla's Localization Platform
https://pontoon.mozilla.org
BSD 3-Clause "New" or "Revised" License

Pontoon does not correctly differentiate between Turkish dotted and dotless "i" #3323

Open harmitgoswami opened 2 months ago

harmitgoswami commented 2 months ago

Currently, Pontoon doesn't differentiate between the Turkish 'ı' and 'i' (whose uppercase forms are 'I' and 'İ', respectively), despite these being different characters.

For example, these two queries produce the exact same results (in addition to incorrect highlighting):

https://pontoon.mozilla.org/tr/firefox/browser/browser/browser.ftl/?search=%C4%B1&string=246376
https://pontoon.mozilla.org/tr/firefox/browser/browser/browser.ftl/?search=i&string=246376
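
For illustration, one way such a conflation can arise is locale-independent case mapping: under Unicode's default rules, both 'ı' and 'i' uppercase to 'I'. This is not confirmed as the cause in Pontoon. The Python sketch below reproduces the effect and contrasts it with a comparison that applies the Turkish mapping first. `turkish_lower` and `turkish_contains` are hypothetical helpers written for this example; they are not part of Pontoon or Django.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish mapping for the two 'i' pairs."""
    # U+0130 is dotted capital 'İ' and maps to 'i';
    # plain 'I' maps to U+0131, the dotless 'ı'.
    # Every other character is left to str.lower().
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


def turkish_contains(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test that keeps 'ı' and 'i' distinct."""
    return turkish_lower(needle) in turkish_lower(haystack)


# Default (locale-independent) uppercasing maps both letters to 'I',
# so an uppercase-based comparison cannot tell them apart.
naive_match = "ı".upper() == "i".upper()

# With the Turkish mapping, a search for 'i' no longer matches 'ı'.
dotless_hit = turkish_contains("Işık", "ı")
dotted_hit = turkish_contains("Işık", "i")
```

Here `naive_match` is true, while only `dotless_hit` (and not `dotted_hit`) is true for the word "Işık", which is the distinction the two search URLs above should show.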

This bug has been brought up and addressed before: https://bugzilla.mozilla.org/show_bug.cgi?id=1346180

harmitgoswami commented 1 month ago

After some research, it seems that database collation is the correct and recommended approach: http://www.i18nguy.com/unicode/turkish-i18n.html

However, even after reverting to our previous approach, I can confirm that Pontoon still doesn't detect the difference between the 'i' and 'ı' characters.

Collation in Django does seem to be supported, but the way we invoke entities.filter and entities.order_by makes me think we'd need a pretty large refactor to properly use Django's Collate function.
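
To make concrete what a Turkish collation would change for ordering, here is a small pure-Python sketch. It is only an analogy for the behavior a database collation provides: it does not use Django's Collate and is not Pontoon code. `TURKISH_ALPHABET` and `turkish_sort_key` are hypothetical names introduced for this example.

```python
# The 29 letters of the Turkish alphabet in dictionary order.
# Note that dotless 'ı' sorts before dotted 'i'.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
_RANK = {ch: index for index, ch in enumerate(TURKISH_ALPHABET)}


def turkish_sort_key(text: str) -> list:
    """Build a sort key that follows Turkish alphabetical order."""
    # Lowercase with the Turkish mapping ('İ' -> 'i', 'I' -> 'ı') first.
    lowered = text.replace("\u0130", "i").replace("I", "\u0131").lower()
    # Characters outside the alphabet sort after all Turkish letters.
    return [_RANK.get(ch, len(TURKISH_ALPHABET) + ord(ch)) for ch in lowered]


words = ["izin", "ısı", "hata", "jeton"]

# Plain code point order pushes 'ı' (U+0131) past every ASCII letter.
codepoint_order = sorted(words)

# Turkish order places 'ı' between 'h' and 'i'.
turkish_order = sorted(words, key=turkish_sort_key)
```

With these inputs, `codepoint_order` puts "ısı" last, while `turkish_order` places it between "hata" and "izin".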