mashpie / i18n-node

Lightweight simple translation module for node.js / express.js with dynamic json storage. Uses common __('...') syntax in app and templates.
MIT License

Enhanced pluralize #19

Closed alexeyco closed 8 years ago

alexeyco commented 12 years ago

Ok,

// 1 apple
__n('%s apple', '%s apples', 1);

// 3 apples
__n('%s apple', '%s apples', 3);

For the English language it's awesome, but for Russian (as an example) it's wrong. I mean:

{
    // Singular (or 21... 31... 121... 131...)
    "1 apple": "1 яблоко",

    // Plural (2-4, 22-24, 32-34... 122-124...)
    "2 apples": "2 яблока", 

    // Plural (5-20, 25-30, 35-40... 105-120...)
    "5 apples": "5 яблок" 
}

Are there any plans like that?
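
The three ranges listed in the comments above can be written as a small rule function. This is only a sketch: `russianFormIndex` is a hypothetical helper, not part of i18n-node, and it assumes non-negative integer counts. It also assumes that 11-14 take the third form, which the ranges above imply (5-20) but do not spell out.

```javascript
// Hypothetical helper, not part of i18n-node: maps a count to the index of
// the Russian plural form (0 = "яблоко", 1 = "яблока", 2 = "яблок").
function russianFormIndex(count) {
  const n = Math.abs(Math.trunc(count));
  const mod10 = n % 10;
  const mod100 = n % 100;
  // 11-14 take the third form even though they end in 1-4.
  if (mod100 >= 11 && mod100 <= 14) return 2;
  if (mod10 === 1) return 0;
  if (mod10 >= 2 && mod10 <= 4) return 1;
  return 2;
}

const forms = ['%s яблоко', '%s яблока', '%s яблок'];
for (const n of [1, 3, 5, 11, 21, 22, 25]) {
  // Naive single-placeholder substitution, used here instead of vsprintf.
  console.log(forms[russianFormIndex(n)].replace('%s', String(n)));
}
```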

mashpie commented 12 years ago

whow!

wasn't aware of that - looks like we need some more logic, maybe via callback-methods that decide whether or not to output that plural or another

var pluralfor = function(n){
    var whichplural = 1;
    [...]
    return whichplural;
}

__nn('%s яблоко', '%s яблока', '%s яблок', pluralfor(n));

That would enable any foreign logic and as many plural forms as I can't even imagine... I have to think about it. What do you think?

alexeyco commented 12 years ago

I don't know the best way to solve this problem, but as a variant (see i18n.js:92):

    if (!msg.few) {
        if (parseInt(count) > 1) {
            msg = vsprintf(msg.other, [count]);
        } else {
            msg = vsprintf(msg.one, [count]);
        }
    } else {
        // Complicated logic, taking to few-key
    }

In this case, it is enough to add the "few" key in the language file.

{
    "%s cat": {
        "one": "%s кошка", // 1, 21, 31,.. 101, 121,...
        "few": "%s кошки", // 2-4, 22-24, 32-34... 122-124
        "other": "%s кошек" // 5-20, 25-30, 35-40... 105-120...
    },
}

What do you think?
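
A minimal sketch of how the "few" key could work alongside the existing one/other entries. The names `catalog`, `threeFormKey` and `translatePlural` are made up for illustration, and a plain `replace` stands in for `vsprintf`. Note that the sketch hard-codes a Russian-style rule for any entry that has a "few" key, so it does not address how other languages with a "few" form would be handled.

```javascript
// Hypothetical catalog in the shape proposed above.
const catalog = {
  '%s cat': { one: '%s кошка', few: '%s кошки', other: '%s кошек' },
  '%s dog': { one: '%s dog', other: '%s dogs' } // existing two-form entry
};

// Assumed Russian-style rule, applied only to entries that define "few".
function threeFormKey(n) {
  const mod10 = n % 10;
  const mod100 = n % 100;
  if (mod10 === 1 && mod100 !== 11) return 'one';
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 'few';
  return 'other';
}

function translatePlural(key, count) {
  const msg = catalog[key];
  const n = parseInt(count, 10);
  // Entries without "few" keep the one/other check from the snippet above.
  const form = msg.few ? threeFormKey(n) : (n > 1 ? 'other' : 'one');
  return msg[form].replace('%s', String(n));
}

console.log(translatePlural('%s cat', 3));
console.log(translatePlural('%s dog', 2));
```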

mashpie commented 12 years ago

Feasible, but needs review. I am about to refactor a bit and finally push the next version to npm.

net147 commented 12 years ago

Standard names for these forms: zero, one, two, few, many and other (http://cldr.unicode.org/index/cldr-spec/plural-rules). Plural rules for each language: http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/language_plural_rules.html

mashpie commented 12 years ago

thanks!

Lendar commented 12 years ago

Here is a plugin-based implementation in another library: https://github.com/jamuhl/i18next/blob/a9ca12236fe2eae351363cf2ff7232e84c5e342c/src/i18next.js#L596

It's similar to the code above. I would take their plugin interface.

Definition:

rules: {
    'sl': function (n) {
        return n % 100 === 1 ? 'one' : n % 100 === 2 ? 'two' : n % 100 === 3 || n % 100 === 4 ? 'few' : 'other';
    },

And logic:

if (pluralExtensions.rules[l]) {
    return pluralExtensions.rules[l](c);
} else {
    return c === 1 ? 'one' : 'other';
}
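
Put together, the definition and the lookup logic above might look roughly like the following. This is a sketch only: `registerPluralRule` and `pluralCategory` are hypothetical names, not the i18next or i18n-node API. The Slovenian rule is copied from the snippet above.

```javascript
// Hypothetical registry: locale code -> function returning a category name.
const pluralRules = new Map();

function registerPluralRule(locale, rule) {
  pluralRules.set(locale, rule);
}

// Fallback for locales without a registered rule (same as the logic above).
const defaultRule = (n) => (n === 1 ? 'one' : 'other');

function pluralCategory(locale, count) {
  const rule = pluralRules.get(locale) || defaultRule;
  return rule(count);
}

// Slovenian rule as quoted above.
registerPluralRule('sl', (n) =>
  n % 100 === 1 ? 'one'
    : n % 100 === 2 ? 'two'
    : n % 100 === 3 || n % 100 === 4 ? 'few'
    : 'other');

console.log(pluralCategory('sl', 102)); // rule registered for 'sl'
console.log(pluralCategory('en', 5));   // no rule registered, default applies
```

Keeping the rules in a registry means a project could add or override a rule for its own locale without changes to the core module.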

mashpie commented 11 years ago

I like that plugin approach. It might be worth adding a 'none' form, too.

dexcell commented 11 years ago

Mozilla has codified a set of 17 plural rules across all languages. The rules can be found over here: https://developer.mozilla.org/en/Localization_and_Plurals

mashpie commented 11 years ago

ok, looks like there are some good suggestions around:

+1 Plugin-Api: will help keep the API backward compatible, lightweight and flexible
+1 Mozilla's approach to classifying the rules detailed by unicode.org

thanks to all for research so far

dexcell commented 11 years ago

I think the unicode.org plural rules are great. We should use them, and I think we can still maintain backward compatibility, since the current release only supports 'one' and 'other'.

We can add the missing categories such as 'zero', 'two', 'few' and 'many'. Then we can determine which categories belong to which language code from the configure method:

i18n.configure({
    // ['en', 'en'] means en.json will use the 'en' language rules.
    // A plain entry like 'de' means de.json will use the 'de' language rules (backward compatible).
    // If an entry has a region or script code (like en-US, or zh-Hant in this example) and no rules are given explicitly, i18n falls back to the base language rules ('zh') by default.
    locales:[['en', 'en'], 'de', 'zh-Hant']
});

What do you think?

Thank you
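
The fallback described in the configure example could be resolved along these lines. Everything here is an assumption for illustration: `availableRules` is a made-up set of languages that ship with rules, and `resolveRuleLocale` is not an existing i18n-node function.

```javascript
// Hypothetical: languages for which plural rules are available.
const availableRules = new Set(['en', 'de', 'zh', 'ru']);

function resolveRuleLocale(entry) {
  // A [locale, ruleLocale] pair names the rule language explicitly.
  if (Array.isArray(entry)) {
    const [locale, ruleLocale] = entry;
    return { locale, ruleLocale };
  }
  // Exact match first, then fall back to the base language subtag.
  if (availableRules.has(entry)) {
    return { locale: entry, ruleLocale: entry };
  }
  const language = entry.split('-')[0];
  return {
    locale: entry,
    ruleLocale: availableRules.has(language) ? language : null
  };
}

const resolved = [['en', 'en'], 'de', 'zh-Hant'].map(resolveRuleLocale);
console.log(resolved);
```

A `null` rule locale leaves it to the caller to decide on a default, for example the existing one/other behaviour.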

mashpie commented 11 years ago

OK so far. Configuration won't be an issue; implementing it as one plugin per locale makes this maintainable, testable and customizable. It should be pretty easy to add the unicode rules for each supported language as defaults while maintaining an API for people who want to add some other weird logic... hope to start on this soon.

elmigranto commented 11 years ago

@mashpie I suggest you take a look at existing implementations, this one for example. (It's compact, well commented and does not calculate all this N % 10 every time.)

mashpie commented 11 years ago

much appreciated. thank you!

royaltm commented 10 years ago

What would it take to actually add this?

mashpie commented 10 years ago

a schedule.

There are several different approaches to implementing pluralization: some easy ones, some following standards. Review will take some extra time. Plus, I think it's time for a refactoring to gain better readability and testability before starting on extra features.

There are libraries out there that do a good job on pluralization but have grown big. i18n aims to stay light and without too many dependencies, so it's easy to add to existing projects. I remember having seen a very nice implementation of pluralization in Airbnb's i18n module. i18next has grown into more of a complete framework than a module and features pluralization amongst other high-level features.

In the end I'd really like to put more effort on this than I am currently able to. So for now I just focus on stability and smaller features with a tiny scope and less impact on code until I can take some time off for refactoring...

Feel free to add your thoughts, this is still a strongly wanted feature :)

max-m commented 10 years ago

Hi,

I had a look into the code and came across these lines in i18n.js. Shouldn't it be like the following code, as the rule says: “Everything but 1 is plural”?

  // parse translation and replace all digets '%d' by `count`
  // this also replaces extra strings '%%s' to parseble '%s' for next step
  // simplest 2 form implementation of plural, like https://developer.mozilla.org/en/docs/Localization_and_Plurals#Plural_rule_.231_.282_forms.29
  if (count == 1) {
    msg = vsprintf(msg.one, [parseInt(count, 10)]);
  } else {
    msg = vsprintf(msg.other, [parseInt(count, 10)]);
  }

__n("cat", "cats", 0) currently leads to cat
__n("cat", "cats", 1) currently leads to cat
__n("cat", "cats", 2) currently leads to cats

But the rule says “zero cats”.
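
As a quick check of the two-form rule, here is a standalone sketch that does not call i18n-node itself. `twoFormKey` and `render` are hypothetical helpers, and the `%s` placeholder in the messages is an assumption added so the count shows up in the output.

```javascript
// Sketch of the two-form rule: only a count of exactly 1 is singular.
function twoFormKey(count) {
  return parseInt(count, 10) === 1 ? 'one' : 'other';
}

const msg = { one: '%s cat', other: '%s cats' };
const render = (count) => msg[twoFormKey(count)].replace('%s', String(count));

console.log(render(0)); // 0 cats
console.log(render(1)); // 1 cat
console.log(render(2)); // 2 cats
```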

Anachron commented 9 years ago

I think the best approach is to allow the users to define their own language plural rule/function in case they either use a currently unsupported language or like to change it (like 0 trees will be "no tree" instead). And yes, that would be cascading translations, but who cares right :)

For everything else, you could provide the defaults which are active right now.

jesucarr commented 9 years ago

What about implementing MessageFormat? It was built by ICU to solve these problems. Have a look at this library: https://github.com/SlexAxton/messageformat.js

mashpie commented 8 years ago

messageformat is really nice. I am actually thinking of an i18n.__mf() api that delegates directly to mf. Personally I don't like that ICU Format, but people will probably make use of it and should get that choice.

I'm closing this issue now in favor of #42, as I collected some other proposals there too and... of course because... hey, it's "42"!