vslavik / poedit

Translations editor for Mac, Windows and Unix
https://poedit.net
MIT License

Changing the order of "Plural-Forms" #734

Closed JanisE closed 2 years ago

JanisE commented 2 years ago

What caused this change of the order of plural forms for the Latvian language?

https://github.com/vslavik/poedit/commit/37f1c6c0fc15f609a361477d5353bf0714ce17d1#diff-9a1f1a58c8355f1dc29991ae3afadc334123bb0f4c035e8e26642aa7a53c432eL85

From (n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2) to (n%10==0 || (n%100>=11 && n%100<=19) ? 0 : n%10==1 && n%100!=11 ? 1 : 2)

Adding || (n%100>=11 && n%100<=19) is nothing big – it's an approved alternative for the "zero" plural form.

What concerns me though is that the order / indices of the plural forms were changed. If we use numbers "1", "2", "0" as an example for the plural forms (and ignore the allowable shift of some numbers between forms "2" and "0"), then previously it was "1" => 0, "2" => 1, "0" => 2 and then it became "1" => 1, "2" => 2, "0" => 0.
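The remapping described above can be checked by evaluating both expressions for a few numbers. This is a minimal sketch: the two C expressions are hand-translated to Python, and the function names `old_index` and `new_index` are illustrative only, not anything from Poedit or gettext.

```python
def old_index(n: int) -> int:
    # n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2
    if n % 10 == 1 and n % 100 != 11:
        return 0
    return 1 if n != 0 else 2


def new_index(n: int) -> int:
    # n%10==0 || (n%100>=11 && n%100<=19) ? 0 : n%10==1 && n%100!=11 ? 1 : 2
    if n % 10 == 0 or 11 <= n % 100 <= 19:
        return 0
    return 1 if (n % 10 == 1 and n % 100 != 11) else 2


# Compare the index each formula assigns to the sample numbers 1, 2 and 0.
remap = {n: (old_index(n), new_index(n)) for n in (1, 2, 0)}
for n, (old, new) in remap.items():
    print(f'"{n}": index {old} -> index {new}')
```

Every sample number lands on a different index under the new expression, so a `msgstr[]` list written for one expression cannot be read with the other.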

This causes misunderstandings and errors, as various PO editors and readers/users do not expect the same language to have different plural formulas. In particular, I had problems with WordPress translations and with using Loco Translate (independent problem cases), resulting in PO files with one Plural-Forms formula in the header but another formula being used for (some of) the actual translations in that file.

Again, I'm not worried that some numbers, for example 12, were in the "other" ("2") form and then got moved to the "zero" ("0") form; that's a matter of grammar and of choosing one alternative over the other.

The problem is that, for example, the number "1" used to be in the plural form with index 0, and now it is in the form with index 1. It is as if, for English with its two indices (0 for the singular and 1 for the plural), you simply swapped the two.

I suppose you now get your plural info from https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml and then use the plural form order of that XML file:

<pluralRule count="zero">...</pluralRule>
<pluralRule count="one">...</pluralRule>
<pluralRule count="other">...</pluralRule>

Why do the PO file plural form indices (a PO-specific thing) depend on the CLDR data anyway? (And if they should be, do CLDR people know that they need to pay attention to the order and are aware of the potential consequences?) As far as Poedit is concerned, the index order didn't seem to depend on CLDR data before that commit on 2018-05-11.

vslavik commented 2 years ago

What caused this change of the order of plural forms for the Latvian language?

As the commit message says, it was regenerated. The forms are generated from CLDR data, which is something GNU gettext itself does nowadays.

This causes misunderstandings and errors, as various PO editors and readers/users do not expect the same language to have different plural formulas.

Any correct code handling PO files is unaffected - the plural forms expression is there so that the files can be interpreted and used without specific knowledge of the language. It's not a change that should be relevant. If some other editors ignore the Plural-Forms header, you're filing this in the wrong place. (FWIW, WordPress definitely does everything correctly.)

Why do the PO file plural form indices (a PO-specific thing) depend on the CLDR data anyway?

Because it's impractical to maintain the database ourselves.

(And if they should be, do CLDR people know that they need to pay attention to the order and are aware of the potential consequences?)

They don't.

I'm closing this because I don't see anything actionable on Poedit's end here, but feel free to correct me if I'm missing something.

JanisE commented 2 years ago

Thank you for the reply!

I think you're right, there is nothing actionable on Poedit's side. Also, I realised that the plural form labels in Poedit are apparently generated dynamically from the specific PO plural form formula to display the first four sample numbers of the form, so there is no problem there.
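Labels of that kind could be derived roughly as follows. This is only a sketch of the idea, not Poedit's actual implementation: the helper `sample_numbers`, its parameters, and the hand-translated formula functions are all assumptions made for illustration.

```python
from collections import defaultdict


def old_index(n: int) -> int:
    # n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2
    if n % 10 == 1 and n % 100 != 11:
        return 0
    return 1 if n != 0 else 2


def new_index(n: int) -> int:
    # n%10==0 || (n%100>=11 && n%100<=19) ? 0 : n%10==1 && n%100!=11 ? 1 : 2
    if n % 10 == 0 or 11 <= n % 100 <= 19:
        return 0
    return 1 if (n % 10 == 1 and n % 100 != 11) else 2


def sample_numbers(plural, nplurals: int, per_form: int = 4, limit: int = 1000):
    """Collect the first few n (below limit) that map to each plural index."""
    samples = defaultdict(list)
    for n in range(limit):
        idx = plural(n)
        if len(samples[idx]) < per_form:
            samples[idx].append(n)
    return [samples[i] for i in range(nplurals)]


old_labels = sample_numbers(old_index, 3)
new_labels = sample_numbers(new_index, 3)
```

Because the samples come from whichever expression the file declares, the labels stay correct for either index order.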

I would note that WordPress is (was) not doing it right though. They've performed some kind of mass-refactoring during the last few years regarding plural forms of various languages, during which the different formulas used for Latvian were not taken into consideration, resulting in partly wrong translations. Partly – because some of them have apparently been corrected by translators. I'm in the process of preparing translation suggestion-fixes for several WP translation projects, where I've gone through all the plural forms and sorted them out.

Also, WordPress has a PO export feature for the translations, and it seems to automatically attach the freshly established "standard" formula even to older translation versions, which may have had different formulas, resulting in wrong translation files. The older versions are probably hardly used by anyone anyway, though.

vslavik commented 2 years ago

I would note that WordPress is (was) not doing it right though.

Ah, I was referring to its code for handling PO files. I can totally see how a one-time data corruption in GlotPress could have happened like you describe...