php-gettext / Gettext

PHP library to collect and manipulate gettext (.po, .mo, .php, .json, etc)
MIT License

The language "kg" is not valid #206

Closed paulo-jay closed 5 years ago

paulo-jay commented 5 years ago

Hi guys,

First of all, thanks for this amazing lib that works like a charm!

I've encountered an issue when using the "kg" language code.

An exception is thrown:

Uncaught InvalidArgumentException: The language "kg_CG" is not valid

file src/Translations.php on line 360

This language code is actually valid: https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kg

I hit the same issue with the "lu" language code (https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lu)

It has something to do with the method \Gettext\Languages\CldrData::getLanguageInfo (missing plural form?)

Maybe you are missing some data?

Thanks in advance

oscarotero commented 5 years ago

Hi. This package depends on gettext/languages to get the available languages and plural forms, and that package is maintained by @mlocati here: https://github.com/mlocati/cldr-to-gettext-plural-rules

mlocati commented 5 years ago

@it-dog This (great) gettext library handles plural forms by using cldr-to-gettext-plural-rules. cldr-to-gettext-plural-rules in turn does not define its own list of plural rules; it uses the Unicode CLDR data set. Unfortunately, CLDR does not define the plural rules for the kg and lu languages (see here for the original XML data, or here for a JSON version of it).

I think the only solution would be:

  1. find a reliable source of the language rules to define plurals for both languages
  2. contact the CLDR team
  3. wait for the next CLDR version
  4. wait for me to release a new version of the cldr-to-gettext-plural-rules library

paulo-jay commented 5 years ago

Thanks for your support (and sorry for my lack of knowledge regarding the internals of your lib).

Still, it seems strange that gettext/languages returns no data at all just because the plural form information is missing. You still have valuable data, but you block any potential use of it.

In the gettext/gettext lib, if my understanding is right, you use this information to set the Translations header Plural-Forms.

I suggest that, when no plural form can be found, the library falls back to a "default" plural form.

The gettext documentation (https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html) says:

Some languages only require one single form. There is no distinction between the singular and plural form. An appropriate header entry would look like this:

Plural-Forms: nplurals=1; plural=0;

Languages with this property include:

Asian family: Japanese, Vietnamese, Korean

Tai-Kadai family: Thai

I could send you a pull request if you think this is a good idea.

mlocati commented 5 years ago

I'd leave this decision to @oscarotero, but IMHO I'd avoid that solution, since you may end up with a .po/.mo file that's not compatible with the actual plural rules of the language, and the problem would stay hidden because the library no longer validates the language.

Furthermore, you may want to assume the nplurals=1; plural=0; plural rule, while someone else may prefer the English plural rule (nplurals=2; plural=n != 1;).
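
To make the difference between those two defaults concrete, here is a minimal sketch that evaluates both plural expressions for a few counts. The two function names are hypothetical helpers written for this illustration; they are not part of gettext/gettext or gettext/languages.

```php
<?php
// Hypothetical helpers for illustration only; not part of gettext/gettext.

// plural=0 (nplurals=1): every count selects the single form at index 0.
function pluralIndexSingleForm(int $n): int
{
    return 0;
}

// plural=n != 1 (nplurals=2): index 0 for exactly one item, index 1 otherwise.
function pluralIndexEnglish(int $n): int
{
    return $n != 1 ? 1 : 0;
}

// Print the selected form index for a few sample counts.
foreach ([0, 1, 2, 5] as $n) {
    printf(
        "n=%d  single-form=%d  english=%d\n",
        $n,
        pluralIndexSingleForm($n),
        pluralIndexEnglish($n)
    );
}
```

The two rules pick different forms for every count except 1, so a silent default could select the wrong translation without any error being raised.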

So, I think that your approach should be explicitly set by developers, that is with some code like this:

try {
    $translations->setLanguage($languageID);
} catch (\InvalidArgumentException $x) {
    // Unknown language: explicitly choose the single-form plural rule
    $translations->setPluralForms(1, '0');
}

oscarotero commented 5 years ago

Yes, I agree with @mlocati. You can define the language and plural forms without using setLanguage, which skips the language validation:

$translations->setHeader($translations::HEADER_LANGUAGE, 'kg');
$translations->setPluralForms(1, '0');
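
If you need this in several places, the two approaches could be combined in a small helper. This is only a sketch: setLanguageWithFallback is a hypothetical name, not something the library provides, and the single-form default is an assumption you should replace with the rule that really fits your language. It uses only the calls already shown in this thread.

```php
<?php
// Hypothetical helper, not part of gettext/gettext. It assumes $translations
// exposes setLanguage(), setHeader(), setPluralForms() and HEADER_LANGUAGE,
// as used in the snippets above.
function setLanguageWithFallback($translations, string $languageId, int $count = 1, string $rule = '0'): bool
{
    try {
        // Normal path: the library knows the language and its plural rule.
        $translations->setLanguage($languageId);
        return true;
    } catch (\InvalidArgumentException $e) {
        // Unknown language: set the header directly and apply the rule
        // the caller chose explicitly (assumed default: one plural form).
        $translations->setHeader($translations::HEADER_LANGUAGE, $languageId);
        $translations->setPluralForms($count, $rule);
        return false;
    }
}
```

The boolean return value lets the caller log or review every language that fell back, so the missing plural data does not go unnoticed.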

paulo-jay commented 5 years ago

Thank you for your support. The workaround fixed my issue.