Closed. MichaelHoste closed this issue 3 years ago.
- Rules importation
Do you know the license of the source XML file?
I'm in favor of this approach if its license is compatible with ours (LGPLv3 or later).
We'll implement a reader script with REXML instead of using export-plural-rules. We don't want to depend on PHP.
- nplurals > 4
Hmm, I don't know without trying. Let's try.
Do you know the license of the source XML file?
The license is here: https://github.com/unicode-org/cldr/blob/master/unicode-license.txt
I'm not knowledgeable enough to know if it's compatible with LGPLv3.
We'll implement a reader script with REXML instead of using export-plural-rules. We don't want to depend on PHP.
Creating GetText rules using the XML file is not obvious. I don't know if it's worth the effort to rewrite everything in Ruby. For my purposes, I added a Ruby exporter to the PHP package: https://github.com/michaelhoste/Languages/tree/ruby
export-plural-rules ruby > plural_rules.rb
creates this file: https://gist.github.com/MichaelHoste/1ed1647aca8bdce34cefa200b10e6fd9
It should be easy enough to move forward.
I'm quite worried about this pluralization issue: I noticed that some languages have more plural forms than before. French, for example, gained a "many" plural form for multiples of 1000000 (I speak French and it's quite stupid in my opinion).
For Arabic and other languages that gain plural forms, won't the existing GetText plural forms be shifted, so that the existing msgstr[1] (other) becomes msgstr[2] after the update? Existing projects will be impacted and they will need to rewrite all their plural forms.
It's not only a problem for Ruby, but for other languages as well, since it's more of a GetText limitation. Once https://github.com/php-gettext/Languages and https://github.com/eemeli/make-plural are updated to the latest Unicode version, PHP and JS will have the same issue.
I'm not sure how GetText should deal with this situation. Do you have any thoughts on this?
I've implemented a CLDR plurals reader in https://github.com/red-data-tools/red-datasets . We can use the CLDR plurals information with the following code:
require "datasets"
plurals = Datasets::CLDRPlurals.new
plurals.each do |locale|
pp locale.rules
end
I may not fully understand your concern, but it may not be a problem. Each .po/.mo file includes plural forms information such as nplurals=2; plural=(n != 1). So existing .po/.mo files can still use the old plural form rules. New .po/.mo files with the latest plural forms information can use the new plural form rules, but they don't affect existing .po/.mo files.
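To make the point above concrete, here is a minimal sketch of how a Plural-Forms value stored in a catalog could be parsed and evaluated on its own, independently of whatever rules a newer tool would generate. The helper names parse_plural_forms and plural_index are hypothetical and are not part of the gettext gem's API; the sketch also assumes the C-style expression happens to be valid Ruby, which holds for the rules quoted in this thread.

```ruby
# Sketch only: hypothetical helpers, not the gettext gem's implementation.

# Split a Plural-Forms value into [nplurals, expression].
def parse_plural_forms(header)
  match = header.match(/nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;]+)/)
  raise ArgumentError, "invalid Plural-Forms: #{header}" unless match

  [match[1].to_i, match[2].strip]
end

# Evaluate the expression for a given n and return the msgstr index.
# Only evaluate expressions from catalogs you trust: this uses eval.
def plural_index(expression, n)
  result = eval(expression, binding) # n is visible through the binding
  case result
  when true  then 1 # C treats a true comparison as 1
  when false then 0
  else Integer(result)
  end
end

nplurals, expression = parse_plural_forms("nplurals=2; plural=(n != 1);")
puts nplurals                    # prints 2
puts plural_index(expression, 1) # prints 0
puts plural_index(expression, 5) # prints 1
```

Because the rule travels with each catalog, an old two-form file keeps resolving to msgstr[0] and msgstr[1] no matter what a newer msginit would emit.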
Thank you for implementing the parsing of CLDR plurals!
However I'm not sure the current implementation will be precise enough to create GetText rules.
As you can see here, CLDR specifications are more complex than GetText specifications and some modifications should be applied to adapt one to the other.
Sometimes even the number of plural forms differs, as in Czech:
CLDR:
GetText:
With the rule (n == 1) ? 0 : ((n >= 2 && n <= 4) ? 1 : 2)
Please note that v in the rule represents the number of visible fraction digits in n, and should disappear from the GetText rule. As a result, there is one less plural case in GetText than in the CLDR representation.
I extracted this compact hash of rules from my previous snippet and I'll use it internally; maybe it could be useful here too: https://gist.github.com/MichaelHoste/83fb089975efbc019e30613c383964cc
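A small sketch of why the count differs. The CLDR categories below are written from my understanding of the Czech cardinal rules (one, few, many, other, with i the integer part and v the number of visible fraction digits), so treat them as an assumption rather than a copy of the XML. The GetText rule is the one quoted above.

```ruby
# Assumed CLDR-style Czech categories (i = integer part, v = visible fraction digits).
def cldr_czech_category(i, v)
  return :one  if i == 1 && v == 0
  return :few  if (2..4).cover?(i) && v == 0
  return :many if v != 0 # only reachable for numbers with fraction digits
  :other
end

# GetText rule from this thread; n is always an integer, so v is always 0.
def gettext_czech_index(n)
  (n == 1) ? 0 : ((n >= 2 && n <= 4) ? 1 : 2)
end

# Categories that integers can actually reach: "many" never appears.
p (0..200).map { |n| cldr_czech_category(n, 0) }.uniq # prints [:other, :one, :few]
p [1, 3, 5].map { |n| gettext_czech_index(n) }        # prints [0, 1, 2]
```

With v fixed at 0, only three categories remain reachable, which matches the three indices produced by the GetText rule.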
I may not fully understand your concern, but it may not be a problem. Each .po/.mo file includes plural forms information such as nplurals=2; plural=(n != 1). So existing .po/.mo files can still use the old plural form rules. New .po/.mo files with the latest plural forms information can use the new plural form rules, but they don't affect existing .po/.mo files.
You're absolutely right! I didn't think of that: rules are only created once when using MsgInit, and reused with MsgMerge.
- nplurals > 4
Hmm, I don't know without trying. Let's try.
For information, it seems to work just fine 🙂
I created a local project with Arabic (6 plurals) and this rule:
(n == 0) ? 0 : ((n == 1) ? 1 : ((n == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 && n % 100 <= 99) ? 4 : 5))))
Using n_("%{num} stuff", "%{num} stuffs", num)
I have the correct block in the generated PO:
msgid "%{num} stuff"
msgid_plural "%{num} stuffs"
msgstr[0] ""
msgstr[1] ""
msgstr[2] ""
msgstr[3] ""
msgstr[4] ""
msgstr[5] ""
And when I translate all the plurals, it works correctly and selects the appropriate plural form depending on num.
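For reference, the selection can be reproduced outside of gettext with a few lines of Ruby. This is only an illustration of the rule above: ARABIC_RULE and pick_msgstr are hypothetical names, and the msgstr values are placeholders rather than real translations.

```ruby
# The Arabic rule from this comment, written as a Ruby lambda.
ARABIC_RULE = lambda do |n|
  (n == 0) ? 0 : ((n == 1) ? 1 : ((n == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 && n % 100 <= 99) ? 4 : 5))))
end

# Placeholder strings standing in for msgstr[0]..msgstr[5].
MSGSTRS = %w[form0 form1 form2 form3 form4 form5].freeze

# Hypothetical helper: pick the msgstr that the rule selects for n.
def pick_msgstr(n)
  MSGSTRS.fetch(ARABIC_RULE.call(n))
end

[0, 1, 2, 3, 11, 100, 103].each do |n|
  puts "#{n} -> msgstr[#{ARABIC_RULE.call(n)}]"
end
```

All six indices are reachable with these sample values: 0, 1 and 2 map to themselves, 3 and 103 map to 3, 11 maps to 4, and 100 maps to 5.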
Implemented.
Thanks a lot for this! It's highly appreciated.
Unfortunately I found a small issue with this implementation.
I discovered an inconsistency for the Arabic language between my own implementation[1] and yours.
Here is the rule you generated:
(n == 1) ? 0 : (n == 0) ? 1 : (n == 2) ? 2 : ((n % 100) >= 3 && (n % 100) <= 10) ? 3 : ((n % 100) >= 11 && (n % 100) <= 99) ? 4 : 5
And mine:
(n == 0) ? 0 : ((n == 1) ? 1 : ((n == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 && n % 100 <= 99) ? 4 : 5))))
You will notice that n == 0 and n == 1 are inverted, and it seems that the second one is correct:
Do you have any idea where this inconsistency could come from?
[1] based on a fork of https://github.com/php-gettext/Languages that generated these rules
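A quick way to confirm where two rules disagree is to evaluate both over a range of integers. The sketch below does that for the two Arabic rules quoted above; rule_differences is a hypothetical helper and the range 0..1000 is an arbitrary choice.

```ruby
# Rule generated by the new implementation (copied from above).
GENERATED_AR = lambda do |n|
  (n == 1) ? 0 : (n == 0) ? 1 : (n == 2) ? 2 : ((n % 100) >= 3 && (n % 100) <= 10) ? 3 : ((n % 100) >= 11 && (n % 100) <= 99) ? 4 : 5
end

# Rule from the php-gettext/Languages fork (copied from above).
EXPECTED_AR = lambda do |n|
  (n == 0) ? 0 : ((n == 1) ? 1 : ((n == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 && n % 100 <= 99) ? 4 : 5))))
end

# Hypothetical helper: list every n in the range where the two rules differ.
def rule_differences(rule_a, rule_b, range = 0..1000)
  range.select { |n| rule_a.call(n) != rule_b.call(n) }
end

p rule_differences(GENERATED_AR, EXPECTED_AR) # prints [0, 1]
```

The two rules only disagree on n = 0 and n = 1, which is consistent with the first two conditions being swapped.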
I'm sorry but I also have a more global issue with your implementation.
To me, it seems wrong to download and cache the complete list of rules through https://github.com/red-data-tools/red-datasets for these reasons:
- If the plurals.xml file is corrupted or if it's a work-in-progress on the Master branch, or even if the source repository is moved, this gem will stop working.
- Once the plurals.xml file is cached locally, it will not be updated if the source file is updated. It could create some misunderstanding if two projects that are apparently identical behave differently. The plural file is not even stored in the project but somewhere deep in the system, and is not easy to find.
My suggestion is that plural rules should be versioned with this gem and be part of the project itself, and not evolve independently. You don't expect the same version of the gem to behave differently over time.
What do you think about it? Does it make sense to you? Maybe Datasets::CLDRPlurals.new could take an optional parameter to load a local file?
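To illustrate what reading a versioned local copy could look like, here is a rough REXML sketch. load_plural_rules is a hypothetical helper, not the Datasets::CLDRPlurals API, and the inline XML is a hand-written sample shaped like my understanding of the CLDR plurals.xml layout (pluralRules elements with a locales attribute, pluralRule children with a count attribute), so the structure is an assumption.

```ruby
require "rexml/document"

# Hypothetical helper: map each locale to { category => condition }.
# In a real project the XML string would come from a file shipped with the gem,
# for example File.read(path_to_bundled_plurals_xml).
def load_plural_rules(xml)
  document = REXML::Document.new(xml)
  rules = {}
  REXML::XPath.each(document, "//pluralRules") do |rules_element|
    categories = {}
    rules_element.elements.each("pluralRule") do |rule|
      # Keep only the condition; drop the "@integer ..." / "@decimal ..." samples.
      condition = rule.text.to_s.split("@").first.to_s.strip
      categories[rule.attributes["count"]] = condition
    end
    rules_element.attributes["locales"].split.each do |locale|
      rules[locale] = categories
    end
  end
  rules
end

# Hand-written sample, assumed to mirror the plurals.xml structure.
SAMPLE_XML = <<~XML
  <supplementalData>
    <plurals type="cardinal">
      <pluralRules locales="cs sk">
        <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
        <pluralRule count="few">i = 2..4 and v = 0 @integer 2~4</pluralRule>
        <pluralRule count="many">v != 0 @decimal 0.0~1.5</pluralRule>
        <pluralRule count="other"> @integer 0, 5~19</pluralRule>
      </pluralRules>
    </plurals>
  </supplementalData>
XML

pp load_plural_rules(SAMPLE_XML)["cs"]
```

Reading from a file committed alongside the code means the rules only change when the gem version changes, which is the behavior suggested above.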
I discovered inconsistencies for other languages too. Maybe it can help isolate the issue:
Language: is
Generated rule: 0
Correct rule: n % 10 != 1 || n % 100 == 11
Languages: gu, hi, zu, kn, fa, am, as, bn
Generated rule: (n == 0) || (n == 1) ? 1 : 0
Correct rule: n > 1
(opposite!)
The other inconsistencies I found are:
Language: br
Generated rule: ((n % 10) == 1) && (((n % 100) != 11)) && (((n % 100) != 71)) && (((n % 100) != 91)) ? 0 : ((n % 10) == 2) && (((n % 100) != 12)) && (((n % 100) != 72)) && (((n % 100) != 92)) ? 1 : ((n % 10) >= 3 && (n % 10) <= 4) || ((n % 10) == 9) && (((n % 100) < 10 || (n % 100) > 19)) && (((n % 100) < 70 || (n % 100) > 79)) && (((n % 100) < 90 || (n % 100) > 99)) ? 2 : (n != 0) && ((n % 1000000) == 0) ? 3 : 4
Correct rule: (n % 10 == 1 && n % 100 != 11 && n % 100 != 71 && n % 100 != 91) ? 0 : ((n % 10 == 2 && n % 100 != 12 && n % 100 != 72 && n % 100 != 92) ? 1 : ((((n % 10 == 3 || n % 10 == 4) || n % 10 == 9) && (n % 100 < 10 || n % 100 > 19) && (n % 100 < 70 || n % 100 > 79) && (n % 100 < 90 || n % 100 > 99)) ? 2 : ((n != 0 && n % 1000000 == 0) ? 3 : 4)))
Language: cy
Generated rule: (n == 1) ? 0 : (n == 0) ? 1 : (n == 2) ? 2 : (n == 3) ? 3 : (n == 6) ? 4 : 5
Correct rule: (n == 0) ? 0 : ((n == 1) ? 1 : ((n == 2) ? 2 : ((n == 3) ? 3 : ((n == 6) ? 4 : 5))))
Language: fil
Generated rule: (n == 1) || (n == 2) || (n == 3) || (((n % 10) != 4)) && (((n % 10) != 6)) && (((n % 10) != 9)) ? 1 : 0
Correct rule: n != 1 && n != 2 && n != 3 && (n % 10 == 4 || n % 10 == 6 || n % 10 == 9)
Language: lv
Generated rule: ((n % 10) == 1) && ((n % 100) != 11) ? 0 : ((n % 10) == 0) || ((n % 100) >= 11 && (n % 100) <= 19) ? 1 : 2
Correct rule: (n % 10 == 0 || n % 100 >= 11 && n % 100 <= 19) ? 0 : ((n % 10 == 1 && n % 100 != 11) ? 1 : 2)
Language: mk
Generated rule: ((n % 10) == 1) && ((n % 100) != 11) ? 1 : 0
Correct rule: n % 10 != 1 || n % 100 == 11
Language: tl
Generated rule: (n == 1) || (n == 2) || (n == 3) || (((n % 10) != 4)) && (((n % 10) != 6)) && (((n % 10) != 9)) ? 1 : 0
Correct rule: n != 1 && n != 2 && n != 3 && (n % 10 == 4 || n % 10 == 6 || n % 10 == 9)
Language: tzm
Generated rule: (n <= 1) || (n >= 11 && n <= 99) ? 1 : 0
Correct rule: n >= 2 && (n < 11 || n > 99)
Maybe some of them are caused by the fact that we don't use the same CLDR version, but the ones I checked are definitely wrong according to the latest rules at https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/language_plural_rules.html
We noticed that some pluralization forms are missing from here: https://github.com/ruby-gettext/gettext/blob/master/lib/gettext/tools/msginit.rb#L322
One of the important missing ones is Arabic.
I have 2 questions:
1. Rules importation
Would you accept a pull request that extends these rules using the official CLDR source, converted to valid GetText rules with this tool?
export-plural-rules prettyjson > file.json
returns a file like this, and the idea would be to add it to this project and load the pluralization rules from it in the plural_forms method:
We could even simplify it to have a file like this:
or even transform it into a Ruby hash. What would be your preference?
2. nplurals > 4
There are currently no more than 4 pluralization forms, and that will change with Arabic, which needs 6 of them.
Do you expect that there will be problems with the rest of the code? Or will just improving the plural_forms method be enough to make it work with msgstr[4] and msgstr[5]?
Thank you for any feedback that you may have on this.
I am volunteering to work on this issue with your guidance.