python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Languages that have !=2 plural forms have wrong "python-format" checks #570

Open ewjoachim opened 6 years ago

ewjoachim commented 6 years ago

Babel enforces a strict python-format check (as far as I'm aware, it's not trivial to disable): every format variable used in a translation must be defined in the corresponding source string. More precisely, every format variable used in the singular translation must be defined in the singular source, and every format variable used in the plural translation must be defined in the plural source.

That works fine for languages whose singular/plural forms map onto English's, but that is not the case for every language.

See the code below.

https://github.com/python-babel/babel/blob/9e1ec18d7aff94295c65254c21356de37116ca14/babel/messages/checkers.py#L46-L59

Because of the izip, the check implicitly expects exactly 2 translations for the 2 source values (singular, plural), whereas there can be just one (Japanese, which has no plural form) or more than 2 (Polish, Russian, and other languages with several plural forms).

The consequence is that when compiling a message like:
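
To make the pairing problem concrete, here is a minimal sketch of what a positional zip does to the message shown further down. It is not Babel's code: `named_placeholders` and `pairwise_check` are hypothetical helpers, and the regex only handles named `%(name)s` placeholders.

```python
import re

# Simplified stand-in for placeholder parsing (assumption): named %(name)s only.
NAMED_PLACEHOLDER = re.compile(r"%\((\w+)\)[a-zA-Z]")


def named_placeholders(text):
    """Return the set of named placeholders found in a %-format string."""
    return set(NAMED_PLACEHOLDER.findall(text))


def pairwise_check(msgids, msgstrs):
    """Pair sources and translations by position, as a zip-based check would.

    Returns a list of (msgid, unknown placeholder names) for every pair where
    the translation uses a name its positional source does not define.
    """
    problems = []
    for msgid, msgstr in zip(msgids, msgstrs):
        unknown = named_placeholders(msgstr) - named_placeholders(msgid)
        if unknown:
            problems.append((msgid, sorted(unknown)))
    return problems


# One plural form (nplurals=1): the only translation is compared with the
# singular source, which has no placeholder, so 'count' is reported.
one_form = pairwise_check(
    ("Hello", "Hello %(count)s"),
    ("こんにちは、%(count)s",),
)

# Three plural forms: zip stops after two pairs, so the third translation
# is never inspected, even though it uses an undefined name.
three_forms = pairwise_check(
    ("%(count)s file", "%(count)s files"),
    ("%(count)s plik", "%(count)s pliki", "%(typo)s plików"),
)
```

In this sketch the one-form case yields a false error and the three-form case silently skips a real mistake; both follow from pairing by position.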

msgid ""
msgstr ""
"Language-Team: Japanese\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.5.3\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Language: ja_JP\n"

#, python-format
msgid "Hello"
msgid_plural "Hello %(count)s"
msgstr[0] "こんにちは、%(count)s"

(excuse my poor Japanese skills ><)

with this command:

pybabel compile -D django -d locale -l ja_JP

we get this error:

error: locale/ja_JP/LC_MESSAGES/django.po:28: unknown named placeholder 'count'

As far as I can tell:

  1. The only workaround today would be to sed-remove every #, python-format flag from blocks that have a msgid_plural (or, more broadly, every python-format flag from the translation files that are known to have issues).
  2. Pybabel should maybe offer a --disable-check flag or something.
  3. I think the right check would be to make sure that the format variables used in all the translations of a given message are a subset of the format variables defined in the msgid and msgid_plural. Anything more than this would be making a strong assumption about the translation, one that could prove wrong.
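
The subset check proposed in point 3 could look roughly like the sketch below. It is an illustration only, not a patch against Babel: `named_placeholders` and `subset_check` are hypothetical names, and only named `%(name)s` placeholders are handled.

```python
import re

# Simplified stand-in for placeholder parsing (assumption): named %(name)s only.
NAMED_PLACEHOLDER = re.compile(r"%\((\w+)\)[a-zA-Z]")


def named_placeholders(text):
    """Return the set of named placeholders found in a %-format string."""
    return set(NAMED_PLACEHOLDER.findall(text))


def subset_check(msgids, msgstrs):
    """Check every translation against the union of all source placeholders.

    Returns a list of (translation index, unknown placeholder names).
    The number of translations does not have to match the number of sources.
    """
    allowed = set()
    for msgid in msgids:
        allowed |= named_placeholders(msgid)

    problems = []
    for index, msgstr in enumerate(msgstrs):
        unknown = named_placeholders(msgstr) - allowed
        if unknown:
            problems.append((index, sorted(unknown)))
    return problems


# One plural form: 'count' is defined in msgid_plural, so it is accepted.
one_form = subset_check(
    ("Hello", "Hello %(count)s"),
    ("こんにちは、%(count)s",),
)

# Three plural forms: every translation is inspected, so the typo is caught.
three_forms = subset_check(
    ("%(count)s file", "%(count)s files"),
    ("%(count)s plik", "%(count)s pliki", "%(typo)s plików"),
)
```

With this approach the number of plural forms no longer matters, and the only thing rejected is a placeholder that appears in neither source string.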

ewjoachim commented 6 years ago

Hmm, it's worse than I thought.

Even when compiling, the python-format flag is added by matching a regex against the strings, not by reading the flags written in the file, so even if I remove the written flags, I still get the failing check. No workaround :/

ewjoachim commented 6 years ago

Workaround: use {} formatting instead of %s formatting. Sadly, this is not applicable to Django templates.
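
A rough illustration of why the workaround sidesteps the check, assuming the python-format detection only looks for %-style placeholders. The regex here is a simplified stand-in written for this example, not Babel's actual pattern, and `looks_percent_formatted` is a hypothetical helper.

```python
import re

# Simplified %-style placeholder detector (assumption, not Babel's regex).
PERCENT_PLACEHOLDER = re.compile(r"%(?:\(\w+\))?[a-zA-Z]")


def looks_percent_formatted(text):
    """Return True if the string contains something like %s or %(name)s."""
    return PERCENT_PLACEHOLDER.search(text) is not None


percent_style = "Hello %(count)s"
brace_style = "Hello {count}"
translated = "こんにちは、{count}"

# The %-style source would be picked up by such a detector; the brace-style
# source and its translation would not, so no python-format check applies.
detected = [looks_percent_formatted(s) for s in (percent_style, brace_style, translated)]

# The brace-style translation is rendered with str.format instead of %.
rendered = translated.format(count=3)
```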

akx commented 6 years ago

This is related to the old-school bug #35.

ewjoachim commented 6 years ago

Related, maybe, but in my case these are perfectly valid Python format strings, with no misdetection. It's just that the check currently done on python-format strings is wrong.