Make i18n tools available from `python -m`

80ec8472-4906-4627-bd2e-14667fa8ed0c commented 5 years ago

BPO	36837
Nosy	@warsaw, @abadger, @bbkane

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = []
title = 'Make il8n tools available from `python -m`'
updated_at =
user = 'https://github.com/bbkane'
```

bugs.python.org fields:

```python
activity =
actor = 'a.badger'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'bbkane'
dependencies = []
files = []
hgrepos = []
issue_num = 36837
keywords = []
message_count = 5.0
messages = ['341771', '341876', '341999', '342005', '342198']
nosy_count = 3.0
nosy_names = ['barry', 'a.badger', 'bbkane']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue36837'
versions = []
```

80ec8472-4906-4627-bd2e-14667fa8ed0c commented 5 years ago

Localizing a Python application involves using the gettext standard library module to read .mo files. There are three scripts to assist with this in https://github.com/bbkane/cpython/tree/master/Tools/i18n :

- makelocalealias.py: Convert the X11 locale.alias file into a mapping dictionary suitable for locale.py.
- msgfmt.py: Generate binary message catalog from textual translation description.
- pygettext.py: Generate .pot files identical to what GNU xgettext generates for C and C++ code (these can be translated by msgfmt.py).

I recently wrote a tutorial on localizing a Python script ( https://github.com/bbkane/arcade/blob/bbkane/add_localization_example/doc/examples/text_loc_example.rst ) and I had to tell my users (a student audience) to download these scripts from GitHub. I would have been much happier to ask them to use a built-in Python tool available from the `-m` switch (similar to `python -m json.tool`), so this issue is to add that.

The docs ( https://docs.python.org/3/library/gettext.html#internationalizing-your-programs-and-modules ) mention these scripts, but do not provide any information on how to get them.

Possible solutions:

1. Turn gettext.py into a package and put these scripts into a tool subpackage (similar to json.tool).
2. Add a separate package (i18n perhaps) and put these scripts in there.
3. Add links to these scripts and instructions for using them to the docs.

warsaw commented 5 years ago

One other suggestion: put the bulk of Tools/i18n/pygettext.py into Lib/_pygettext.py, then import its main() in both Lib/gettext.py and Tools/i18n/pygettext.py. Then just call that main().
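
A rough sketch of that layout, assuming a private module named `_pygettext` as suggested above. The argument parser is a placeholder rather than the real pygettext option set, and the extraction logic is elided:

```python
# Hypothetical Lib/_pygettext.py: the shared implementation would live here.
import argparse


def main(argv=None):
    """Entry point shared by every wrapper; returns a process exit status."""
    parser = argparse.ArgumentParser(prog="pygettext")
    # Placeholder options for illustration only.
    parser.add_argument("files", nargs="*", help="source files to scan")
    parser.add_argument("-o", "--output", default="messages.pot")
    args = parser.parse_args(argv)
    # The real string-extraction logic would go here.
    print(f"would extract from {args.files} into {args.output}")
    return 0


# Each wrapper (Tools/i18n/pygettext.py, or whichever module ends up
# being runnable with -m) would then shrink to something like:
#
#     import sys
#     from _pygettext import main
#
#     if __name__ == "__main__":
#         sys.exit(main())
```

Taking `argv` as a parameter keeps `main()` callable from tests and from other modules without touching `sys.argv`.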

526a9556-0b1e-466b-8760-d36ea509066e commented 5 years ago

Note: I've been testing how our gettext module differs from GNU gettext and have run into a few bugs and missing features which make msgfmt unusable and limit pygettext's usefulness.

1. msgfmt doesn't seem to store the charset from the .po file in the .mo file. I think this might have been okay for the lgettext() and gettext() methods under Python 2, as those probably passed the byte strings from the .mo files through verbatim. Under Python 3, however, we have to decode the byte strings to text and we can't do that without knowing the charset. This leads to a UnicodeDecodeError on any .mo file which contains non-ASCII characters (which is going to be the majority of them).
2. So far, I have found that pygettext doesn't understand how to extract strings from ngettext(). This means that your code can't use plural forms if you want to use pygettext to extract the strings.

These deficiencies are probably things that need to be fixed if we're going to continue to promote these tools in the documentation.
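
For context, this is the kind of plural-aware call that the extraction gap affects. It is a minimal sketch using `gettext.NullTranslations`, which simply picks between the two msgids when no catalog is loaded; the `describe` helper is made up for illustration:

```python
import gettext

# NullTranslations performs no lookup, so ngettext() just chooses
# between the two msgids: the first when n == 1, otherwise the second.
translations = gettext.NullTranslations()


def describe(count):
    """Hypothetical helper that formats a plural-aware message."""
    template = translations.ngettext("%d file", "%d files", count)
    return template % count
```

An extractor needs to record both msgids from a call like this in order to emit a `msgid` / `msgid_plural` pair in the .pot file.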

526a9556-0b1e-466b-8760-d36ea509066e commented 5 years ago

A note about the msgfmt problem. It looks like GNU gettext's msgfmt has a similar problem but the msgfmt from pybabel does not. This may mean that we need to change the gettext *Translation objects to be more tolerant of non-ASCII encodings (perhaps defaulting to UTF-8 instead of ASCII).
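
The ASCII fallback can be observed without any of the tools discussed here. The sketch below hand-packs two tiny .mo images in memory (the `build_mo` and `try_load` helpers are hypothetical and are not how msgfmt.py is written) and loads each with `gettext.GNUTranslations`. It assumes the current behaviour in which a catalog with no `charset=` in its `Content-Type` header is decoded as ASCII:

```python
import gettext
import io
import struct


def build_mo(messages):
    """Pack {msgid_bytes: msgstr_bytes} into a little-endian .mo image."""
    keys = sorted(messages)
    count = len(keys)
    ids = b"".join(key + b"\0" for key in keys)
    strs = b"".join(messages[key] + b"\0" for key in keys)
    id_table = 7 * 4                   # header: seven 32-bit fields
    str_table = id_table + 8 * count   # each table entry: (length, offset)
    id_pos = str_table + 8 * count     # msgid bytes follow both tables
    str_pos = id_pos + len(ids)        # msgstr bytes follow the msgids
    id_index, str_index = [], []
    for key in keys:
        id_index += [len(key), id_pos]
        str_index += [len(messages[key]), str_pos]
        id_pos += len(key) + 1
        str_pos += len(messages[key]) + 1
    header = struct.pack("<7I", 0x950412DE, 0, count, id_table, str_table, 0, 0)
    tables = struct.pack(f"<{4 * count}I", *id_index, *str_index)
    return header + tables + ids + strs


def try_load(mo_bytes):
    """Return the translation of "Hello", or None if decoding fails."""
    try:
        catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
    except UnicodeDecodeError:
        return None
    return catalog.gettext("Hello")


greeting = "\u00a1Hola!".encode("utf-8")
# The empty msgid carries the catalog metadata, including the charset.
with_charset = build_mo({
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    b"Hello": greeting,
})
without_charset = build_mo({
    b"": b"Project-Id-Version: demo\n",
    b"Hello": greeting,
})
```

Under that assumption, `try_load(with_charset)` returns the decoded greeting, while `try_load(without_charset)` returns `None` because the UTF-8 bytes cannot be decoded as ASCII.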

526a9556-0b1e-466b-8760-d36ea509066e commented 5 years ago

Scratch what I said in https://bugs.python.org/issue36837#msg342005

GNU msgfmt does extract the charset correctly. (My previous test failed to write any output, so it was using the .mo file I had written out with msgfmt.py. I realized that this morning when I figured out why my C test program wasn't finding any message catalog.)

For reference, the three ways to extract strings with the three tools are:

pygettext.py test.py
pybabel extract -o messages.pot test.py
xgettext -o messages.pot test.py

and the three ways to generate catalogs via the three tools are:

msgfmt3.7.py es_MX/LC_MESSAGES/domain.po
msgfmt es_MX/LC_MESSAGES/testc.po -o es_MX/LC_MESSAGES/testc.mo
pybabel compile -D test -d . [--use-fuzzy]
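
Once a catalog is compiled, loading it from Python might look like the sketch below. It assumes the `domain` name and the `es_MX/LC_MESSAGES/` layout from the msgfmt.py line above; `load_catalog` is a hypothetical helper, and `fallback=True` makes it return untranslated strings when no .mo file is found:

```python
import gettext


def load_catalog(localedir="."):
    """Look for <localedir>/es_MX/LC_MESSAGES/domain.mo."""
    return gettext.translation(
        "domain",
        localedir=localedir,
        languages=["es_MX"],
        fallback=True,  # return NullTranslations instead of raising
    )
```

A script would typically bind `_ = load_catalog().gettext` and wrap its user-facing strings in `_()`.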

Make i18n tools available from `python -m` #81018