gwtproject / gwt

GWT Open Source Project
http://www.gwtproject.org

Fix javadoc for @PluralCount annotation #8704

Open dankurka opened 9 years ago

dankurka commented 9 years ago

Originally reported on Google Code with ID 8739

According to the JavaDoc for @PluralCount [1], it should only be applied to a short
or int parameter.

According to the user guide [2], it can also be applied to a list. However, this seems
not to work reliably and it's difficult to diagnose issues, possibly because of [3].
I've had multiple people ask for help with issues that seem to be related to plural
counts.

It seems like we should decide what @PluralCount does, fix the doc, and make it a compile
error to use it the wrong way.

[1] http://www.gwtproject.org/javadoc/latest/com/google/gwt/i18n/client/Messages.PluralCount.html
[2] http://www.gwtproject.org/doc/latest/DevGuideI18nPluralForms.html
[3] https://code.google.com/p/google-web-toolkit/issues/detail?id=5979

Reported by skybrian@google.com on 2014-05-28 20:03:58

dankurka commented 9 years ago
Can you give an example of why it doesn't work reliably when applied to a list or array?
In either case, it takes the length of the list/array as the value used for choosing
which form to use.

It already is a compile error to use it on an incorrect type:

      if (!isList && !isArray && (primType == null
          || (primType != JPrimitiveType.INT
              && primType != JPrimitiveType.SHORT))) {
        throw error(logger, method.getName()
            + ": PluralCount parameter must be int, short, array, or List");
      }
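
The rule described above (the numeric value itself, or the length of the list/array) can be sketched in plain Java. This is an illustration only, not the generator's code: `PluralCountSketch` and `countOf` are hypothetical names, and the real check runs at compile time on the declared parameter type rather than on a runtime value.

```java
import java.lang.reflect.Array;
import java.util.List;

// Hypothetical helper, not part of GWT: mirrors the rule that a plural count
// may come from an int, a short, an array, or a List.
final class PluralCountSketch {
  static int countOf(Object arg) {
    if (arg instanceof Integer) {
      return ((Integer) arg).intValue();
    }
    if (arg instanceof Short) {
      return ((Short) arg).intValue();
    }
    if (arg instanceof List) {
      // For a List, the count is its size.
      return ((List<?>) arg).size();
    }
    if (arg != null && arg.getClass().isArray()) {
      // For an array, the count is its length.
      return Array.getLength(arg);
    }
    throw new IllegalArgumentException(
        "PluralCount parameter must be int, short, array, or List");
  }
}
```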

Reported by jat@jaet.org on 2014-06-05 16:35:09

dankurka commented 9 years ago
Okay, the javadoc on the annotation should be updated then.

Unfortunately I don't have a specific example; just relaying gripes, and we were confused
by the javadoc. It may have something to do with internal tools not working well with
@PluralCount on a list, or confusion due to picking up the wrong translations.

(Do we need any more checking that the translations match the source code? What happens
if they're out of sync?)

Reported by skybrian@google.com on 2014-06-06 18:56:30

dankurka commented 9 years ago
You get warning messages if a plural form is missing or if one is supplied but not used
in the locale.  However, this is an issue internally, since the transconsole approach
is to always use =1, and for languages where that overlaps with the ONE plural form
(e.g., English), you leave out ONE.  You could theoretically detect that the various
=N values completely cover a given plural form and not warn about it, but it isn't
trivial.
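
A rough sketch of what such a coverage check might look like, assuming a plural form can be modelled as a predicate over integer counts. `ExactValueCoverage` and `coveredByExactValues` are hypothetical names, not GWT code. The check only samples counts up to a limit, which is part of why the real problem isn't trivial: a form that matches unboundedly many counts can never be covered by a finite set of =N entries, and fractional counts are ignored here.

```java
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical check, not GWT code: is a plural form fully covered by =N entries?
final class ExactValueCoverage {
  // Tests counts 0..limit only. If the form matches any count in that range
  // that has no =N entry, the form is reported as not covered.
  static boolean coveredByExactValues(
      IntPredicate form, Set<Integer> exactValues, int limit) {
    for (int n = 0; n <= limit; n++) {
      if (form.test(n) && !exactValues.contains(n)) {
        return false;
      }
    }
    return true;
  }
}
```

For English, where ONE matches only the count 1, a message with =1 and OTHER would pass this check, so the "missing ONE" warning could be suppressed.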

Reported by jat@jaet.org on 2014-06-06 19:39:49

dankurka commented 9 years ago
I didn't quite catch that. How are =1 and ONE related?

Reported by skybrian@google.com on 2014-06-06 20:19:36

dankurka commented 9 years ago
Inside Google, translations are supposed to always use =1 (at least when I was there),
which means when the count is exactly 1.  In some languages, such as English, the plural
form ONE corresponds to exactly the same case, so it is redundant.

Thus, in Google translations you will have =1 and OTHER for English and similar locales,
and GWT complains that plural form ONE was missing. 

I don't recall the rationale for requiring =1 - ask Mark Davis if you want the details.

Reported by jat@jaet.org on 2014-06-06 21:16:10

dankurka commented 9 years ago
"one" is a plural form, and "1" is the value 1.

http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Important-Notes says:

This is worth emphasizing: A common mistake is to think that "one" is only for
the number 1. Instead, "one" is a category for any number that behaves like 1. So in
some languages, for example, one → numbers that end in "1" (like 1, 21, 151) but that
don't end in 11 (like 11, 111, 10311).

Reported by tek@google.com on 2014-09-05 23:22:27

dankurka commented 9 years ago
@tek - is this in response to my #5 comment?

Reported by jat@jaet.org on 2014-09-05 23:34:01

dankurka commented 9 years ago
sorry, typo above, should be: "one" is a plural form, and "=1" is the value 1.

One rationale for requiring "=1" and "other" for all messages is:
* English has 2 plural forms: "one" and "other".
* English messages in code should have 2 plural forms: "a book" and "N books".
* {one{a book} other{N books}} is wrong in languages where "one" is not just the value 1.
* {=1{a book} other{N books}} is correct in all languages.
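
A minimal sketch of the last two bullets, in plain Java. The rule `n % 10 == 1 && n % 100 != 11` is the "ends in 1 but not in 11" behaviour quoted from the CLDR notes earlier in this thread; the class and method names are hypothetical, and this is neither ICU's nor GWT's API.

```java
// Hypothetical illustration of why {one{a book}} is unsafe outside English.
final class OneVersusExactly1 {
  // "one" category in a language where it means "ends in 1 but not in 11".
  static boolean isOneEndsIn1(int n) {
    return n % 10 == 1 && n % 100 != 11;
  }

  // {one{a book} other{N books}}: selects by plural category.
  static String byCategory(int n) {
    return isOneEndsIn1(n) ? "a book" : n + " books";
  }

  // {=1{a book} other{N books}}: selects by exact value.
  static String byExactValue(int n) {
    return n == 1 ? "a book" : n + " books";
  }
}
```

With a count of 21, `byCategory` drops the number and yields "a book", while `byExactValue` keeps it.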

Reported by tek@google.com on 2014-09-05 23:46:22

dankurka commented 9 years ago
@jat - I was responding to the question in  #4 "How are =1 and ONE related?".

Reported by tek@google.com on 2014-09-05 23:52:08

dankurka commented 9 years ago
Now I'm confused by what "other" means.

Reported by skybrian@google.com on 2014-09-05 23:55:40

dankurka commented 9 years ago
All the labels OTHER, ONE, TWO, FEW, etc. are just arbitrary labels.  ICU assigns them
to particular plural forms (usually with some similar meaning, but not always, as
in the case of Welsh).  OTHER is the one that is used when none of the other forms
apply.  In English, there are two forms:  n=1 and OTHER.  In French, ONE is assigned
to mean n in 0,1 and OTHER is everything else.

Totally separate from the plural forms defined like that are rules that match specific
values.  For example, even though it is perfectly correct to say "I have 0 dogs", it
is generally better to say "I have no dogs".  So, a translation might well have a special
case for 0 even if there is no plural form for it, such as in English.  Likewise, you
might say "I have a dog" or "I have no dogs" in French, even though you could use the
0/1 form for both.

If you have a language which has a plural form matching exactly 1, then =1 and ONE
are identical.  In other languages, say French, they are not -- ONE will match either
0 or 1, while =0 and =1 will match only those specific values.  I assume the Google
internal policy of always specifying =1 is for consistency.
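
The difference can be sketched as a lookup, assuming exact =N entries take precedence over plural categories and OTHER is the fallback. This is a simplification for integer counts only; `PluralLookupSketch`, `category`, and `select` are hypothetical names and not GWT's generated code.

```java
import java.util.Map;

// Hypothetical lookup, not GWT's generated code.
final class PluralLookupSketch {
  // Integer-only simplifications of the English and French rules described above.
  static String category(String lang, int n) {
    if (lang.equals("fr")) {
      return (n == 0 || n == 1) ? "one" : "other";
    }
    return n == 1 ? "one" : "other"; // anything else is treated as English
  }

  // Assumed order: exact "=N" entry, then the plural category, then "other".
  static String select(String lang, int n, Map<String, String> forms) {
    String exact = forms.get("=" + n);
    if (exact != null) {
      return exact;
    }
    String byCategory = forms.get(category(lang, n));
    return byCategory != null ? byCategory : forms.get("other");
  }
}
```

Under these assumptions, a French message that supplies only =1 and OTHER falls through to OTHER for a count of 0, whereas one that supplies ONE picks the singular form for both 0 and 1.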

Reported by jat@jaet.org on 2014-09-06 00:44:27

dankurka commented 9 years ago
Here is a CLDR plural rules chart for 193 languages:

http://www.unicode.org/repos/cldr-aux/charts/24/supplemental/language_plural_rules.html

The CLDR uses categories: zero, one, two, few, many, other.

(tl;dr): Nice summary, John. You even used the same example language as the CLDR Plural
Rules Important Notes section (link in #6), which says:

These categories are only mnemonics -- the names don't necessarily imply the exact
contents of the category. For example, for both English and French the number 1 has
the category one (singular). In English, every other number has a plural form, and
is given the category other. French is similar, except that the number 0 also has the
category one and not other or zero, because the form of units qualified by 0 is also
singular.

Reported by tek@google.com on 2014-09-06 01:55:16

dankurka commented 9 years ago
Note, to make it (hopefully) clearer on the example about French: in French, you'd say
"j'ai 0 chien", "j'ai 1 chien", and "j'ai 2 chiens".
Note how "chien" doesn't take a final "s" in the =0 form; this is what the ONE form
matches:

{one="j'ai {0} chien", other="j'ai {0} chiens"}

…though you'd probably use specialized forms for =0 and =1 (and leave out the "one"
form):

{"=0"="je n'ai aucun chien", "=1"="j'ai un chien", other="j'ai {0} chiens"}

Reported by t.broyer on 2014-09-08 14:36:23

dankurka commented 9 years ago
It sounds like in French, using "=1" and "OTHER" would be wrong because there is no
mapping for the =0 case? Either =0 should be added or "ONE" should be used instead
of "=1". Instead, "OTHER" is used when no other message is available, which is actually
incorrect in French.

So it sounds like tags like "ONE" have no language-independent meaning. An application
shouldn't have to worry about these things except in the default (English) messages
supplied in annotations.

Perhaps the main source of confusion is that the i18n tooling reports errors without
specifying a file and line number in the message file, so the programmers tracking
this down don't know where to look for the error. The method name is
not enough to locate the problem in a complicated build.

Reported by skybrian@google.com on 2014-09-08 16:47:44

dankurka commented 9 years ago
A Googler pointed me to an internal bug report from last year:

According to external docs https://developers.google.com/web-toolkit/doc/latest/DevGuideI18nPluralForms#Lists
this code should work:

 @Description("Text explaining that data for certain metrics is only available starting at "
     + "a specific date.")
 @DefaultMessage("Data for {0,list,string} is not available before {1}.")
 @AlternateMessage({
     "=0", "unused plural form",
     "=1", "Data for {0,list,string} is not available before {1}."
 })
 String metricsDataNotAvailableBefore(
     @Example("\"Estimated minutes watched\" and \"Average view duration\"")
     @PluralCount List<String> metricNames,
     @Example("June 6, 2012") String startDate);

but in fact it's converted to the following TC string:
[PLURAL_METRIC_NAMES][=0]unused plural form[=1]Data for # is not available before START_DATE.[ZERO]Data
for # is not available before START_DATE.[ONE]Data for # is not available before START_DATE.[TWO]Data
for # is not available before START_DATE.[FEW]Data for # is not available before START_DATE.[MANY]Data
for # is not available before START_DATE.[OTHER]Data for # is not available before
START_DATE.[END_PLURAL]

And when being converted back to gwt .properties file looks like this:
8908027006650726307[\=0] = unused plural form
8908027006650726307[\=1] = Data for {\#} is not available before {1}.
8908027006650726307 = Data for {\#} is not available before {1}.

Now, {#} and {0,list,string} are quite different. Using {#} instead of {0,list,string}
causes the translated string to look like "Data for 2 is not available before September
1, 2012." instead of "Data for "Average view duration" and "Estimated minutes watched"
is not available before September 1, 2012". It inserts the count of the list instead of
the actual elements.
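
The difference between the two placeholders can be sketched in plain Java. The helpers below are hypothetical and only approximate the behaviour: joining with " and " stands in for the list formatting, which in practice is locale-aware.

```java
import java.util.List;

// Hypothetical formatting helpers, not GWT or transconsole code.
final class CountVersusList {
  // What a "{#}" placeholder produces: the plural count itself.
  static String withCount(List<String> names, String date) {
    return "Data for " + names.size() + " is not available before " + date + ".";
  }

  // Rough stand-in for a list placeholder: the elements themselves, joined.
  static String withList(List<String> names, String date) {
    return "Data for " + String.join(" and ", names)
        + " is not available before " + date + ".";
  }
}
```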

So, apparently GWT supports list arguments, but our internal tools don't really support
them? No wonder there is confusion.

Reported by skybrian@google.com on 2014-09-08 18:33:56

dankurka commented 9 years ago
Correct, transconsole special-cases # for the plural count - to support lists, the gwt->tc
export tool would need to encode the list as a placeholder and back again.  However,
I don't know if tc will handle plural forms without a count in the message.

Reported by jat@jaet.org on 2014-09-08 18:42:31