python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

Format strings not marked correctly #759

Open youtux opened 3 years ago

youtux commented 3 years ago

This problem affects the extraction of strings from Python files.

Format strings like "Hello %(name)s" are correctly extracted and marked with the #, python-format flag, but strings like "Hello {name}" are not marked at all. Translators may not realize these are format strings and may change the placeholder, which results in a KeyError when the string is formatted at runtime. xgettext marks them correctly with the #, python-brace-format flag.

How to reproduce

# asd.py
_("old style hello %(name)s")

_("new style hello {name}")

Run pybabel extraction:

$ pybabel extract asd.py -o asd.babel.po
extracting messages from asd.py
writing PO template file to asd.babel.po

The asd.babel.po contains these entries:

...
#: asd.py:1
#, python-format
msgid "old style hello %(name)s"
msgstr ""

#: asd.py:3
msgid "new style hello {name}"
msgstr ""

The last entry is not marked as a format string.

Run the following command to see what xgettext does instead:

$ xgettext -L Python --keyword=_ asd.py -o - > asd.xgettext.po

The asd.xgettext.po contains these entries:

#: asd.py:1
#, python-format
msgid "old style hello %(name)s"
msgstr ""

#: asd.py:3
#, python-brace-format
msgid "new style hello {name}"
msgstr ""

Note that the last string is marked as #, python-brace-format.
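
The difference between the two outputs can be sketched in a few lines of Python. This is only an illustration of the kind of check involved: the helper names `brace_fields` and `guess_flags` are hypothetical, the percent pattern is a simplified assumption, and none of it is taken from Babel's or xgettext's actual implementation.

```python
import re
from string import Formatter

# Simplified printf-style pattern. This is an assumption for illustration,
# not the rule that Babel or xgettext actually applies.
PERCENT_RE = re.compile(
    r"%(?:\((\w*)\))?[-#0 +]?(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?[diouxXeEfFgGcrs]"
)


def brace_fields(text):
    """Return the replacement field names of a str.format-style string."""
    try:
        return [name for _, name, _, _ in Formatter().parse(text) if name is not None]
    except ValueError:
        # Unbalanced braces: not a valid brace-format string.
        return []


def guess_flags(msgid):
    """Guess which gettext format flags a msgid would deserve (hypothetical helper)."""
    flags = []
    if PERCENT_RE.search(msgid):
        flags.append("python-format")
    if brace_fields(msgid):
        flags.append("python-brace-format")
    return flags


for msgid in ("old style hello %(name)s", "new style hello {name}"):
    print(msgid, "->", guess_flags(msgid))
```

With the two strings from asd.py, the first gets only the percent-style flag and the second only the brace-style flag, which matches what xgettext writes.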

Use case

I hit this problem when using podebug to automatically generate eye-catching Unicode pseudo-translations for development purposes. The tool does not see that "hello {name}" is a format string, so it rewrites it as "ħḗŀŀǿ {ƞȧḿḗ}", which causes a KeyError when calling _("hello {name}").format(name="Alessio"). The same applies to tools like Transifex, which stop translators from editing placeholders they shouldn't touch: if the strings are not flagged correctly, the tool can't do much.
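
The failure described above can be reproduced without any translation tooling. The following is a minimal sketch using only the standard library. The helper names `field_names` and `placeholders_match` are hypothetical, and the comparison is a rough stand-in for what a placeholder-aware tool might do once a string is flagged, not a description of podebug or Transifex internals.

```python
from string import Formatter


def field_names(text):
    """Collect the replacement field names used in a brace-format string."""
    return {name for _, name, _, _ in Formatter().parse(text) if name is not None}


def placeholders_match(msgid, msgstr):
    """Return True when both strings use the same set of field names."""
    return field_names(msgid) == field_names(msgstr)


msgid = "hello {name}"
mangled = "ħḗŀŀǿ {ƞȧḿḗ}"   # placeholder rewritten along with the text
safe = "ħḗŀŀǿ {name}"      # placeholder left untouched

error = None
try:
    mangled.format(name="Alessio")
except KeyError as exc:
    # The mangled field name is not among the keyword arguments.
    error = exc
    print("KeyError:", exc)

print(placeholders_match(msgid, mangled))  # the field names differ
print(placeholders_match(msgid, safe))     # the field names agree
print(safe.format(name="Alessio"))
```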

ronanpaixao commented 2 years ago

Also, even when the .po file is flagged manually, pybabel compile doesn't handle #, python-brace-format properly:

(...)
#: somefile.py:1363
#, python-brace-format
msgid "Percentil 95% da Validação 1: {}"
msgstr "95% percentile of Validation 1: {}"

pybabel compile -d . -i translations.po -l en -f --statistics

122 of 122 messages (100%) translated in translations.po
error: translations.po:518: placeholders are incompatible
compiling catalog translations.po to .\en\LC_MESSAGES\translations.mo
1 errors encountered.

Using Anaconda3 64-bit, Python 3.8.13, babel 2.10.1, Windows 10.

While pybabel reports the error, it also says that all messages were translated. I still need to test this properly, though.
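
One possible reading of the error, offered as an assumption and not confirmed anywhere in this thread: in Python's printf-style formatting, the "% d" inside "95% da" is a valid conversion (space flag plus d), while the "% p" inside "95% percentile" is not. A checker that also looks for percent-style placeholders would then see one in the msgid and none in the msgstr. The sketch below only shows that the two strings differ in this respect. The pattern is a simplified illustration and is not taken from Babel's source.

```python
import re

# Simplified printf-style pattern (illustrative assumption, not Babel's own).
PERCENT_RE = re.compile(r"%[-#0 +]?\d*(?:\.\d+)?[hlL]?[diouxXeEfFgGcrs]")

msgid = "Percentil 95% da Validação 1: {}"
msgstr = "95% percentile of Validation 1: {}"

msgid_hits = PERCENT_RE.findall(msgid)    # finds "% d"
msgstr_hits = PERCENT_RE.findall(msgstr)  # finds nothing: "% p" is not a conversion

print(msgid_hits, msgstr_hits)

# Python itself agrees that "% d" is a conversion and "% p" is not.
print("95% da" % 7)
try:
    "95% percentile" % 7
except ValueError as exc:
    print("ValueError:", exc)
```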