Open 80ec8472-4906-4627-bd2e-14667fa8ed0c opened 5 years ago
Localizing a Python application involves using the `gettext` standard library module to read `.mo` files. There are three scripts to assist with this in https://github.com/bbkane/cpython/tree/master/Tools/i18n :
I recently wrote a tutorial on localizing a Python script ( https://github.com/bbkane/arcade/blob/bbkane/add_localization_example/doc/examples/text_loc_example.rst ) and I had to tell my users (a student audience) to download these scripts from GitHub. I would have been much happier to ask them to use a built-in Python tool available from the `-m` switch (similar to `python -m json.tool`), so this issue is to add that.
The docs ( https://docs.python.org/3/library/gettext.html#internationalizing-your-programs-and-modules ) mention these scripts, but do not provide any information on how to get them.
Possible solutions:
One other suggestion: put the bulk of Tools/i18n/pygettext.py into Lib/_pygettext.py, then import its main() in both Lib/gettext.py and Tools/i18n/pygettext.py. Then just call that main().
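A minimal sketch of that shared-`main()` layout, under stated assumptions: the `main()` below is a hypothetical stand-in rather than pygettext's real entry point, and the module name `_pygettext` is taken from the suggestion above. The point is only that both entry points would delegate to one function.

```python
import sys


def main(argv=None):
    """Hypothetical shared entry point; the real pygettext main() handles many more options."""
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        print("usage: pygettext inputfile ...", file=sys.stderr)
        return 2
    for name in argv:
        # Placeholder for the real extraction work.
        print("would extract strings from", name)
    return 0


# Assumption: Lib/gettext.py and Tools/i18n/pygettext.py would each end with roughly
#     from _pygettext import main
#     if __name__ == "__main__":
#         sys.exit(main())
```

With that in place, `python -m gettext some_file.py` and the standalone script would run the same code path.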
Note, I've been doing some tests of how our gettext module differs from GNU gettext and have run into a few bugs and missing features which make msgfmt unusable and limit pygettext's usefulness.
msgfmt doesn't seem to store the charset from the .po file into the .mo file. I think this might have been okay for the lgettext() and gettext() methods under Python 2, as those probably passed the byte strings from the .mo files through verbatim. Under Python 3, however, we have to decode the byte strings to text and we can't do that without knowing the charset. This leads to a UnicodeDecodeError on any .mo file which contains non-ASCII characters (which is going to be the majority of them).
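The dependency on the charset can be reproduced without msgfmt at all. The sketch below hand-builds two tiny .mo catalogs in memory, one with a `Content-Type` header entry and one without, and loads both with `gettext.GNUTranslations`. The helper `build_mo` is hypothetical (written for this illustration, not part of any tool mentioned here) and always encodes strings as UTF-8.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: serialize {msgid: msgstr} into little-endian GNU .mo bytes (UTF-8)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    header_size = 7 * 4          # magic, version, count, two table offsets, hash size/offset
    table_size = n * 8           # one (length, offset) pair per string
    ids = b"".join(k + b"\0" for k, _ in items)
    strs = b"".join(v + b"\0" for _, v in items)
    id_off = header_size + 2 * table_size
    str_off = id_off + len(ids)
    id_table, str_table = [], []
    for k, v in items:
        id_table += [len(k), id_off]
        str_table += [len(v), str_off]
        id_off += len(k) + 1
        str_off += len(v) + 1
    out = struct.pack("<7I", 0x950412DE, 0, n,
                      header_size, header_size + table_size, 0, 0)
    out += struct.pack("<%dI" % (2 * n), *id_table)
    out += struct.pack("<%dI" % (2 * n), *str_table)
    return out + ids + strs


# Catalog whose metadata entry (msgid "") declares the charset.
with_header = build_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",
    "hello": "héllo",
})
decoded = gettext.GNUTranslations(io.BytesIO(with_header)).gettext("hello")

# Same translation, but no metadata entry, so no charset is known.
without_header = build_mo({"hello": "héllo"})
try:
    gettext.GNUTranslations(io.BytesIO(without_header))
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("load failed with", type(exc).__name__)
```

The second load fails while parsing the catalog, because without a declared charset the module tries to decode the bytes as ASCII.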
So far, I have found that pygettext doesn't understand how to extract strings from ngettext(). This means that your code can't use plural forms if you want to use pygettext to extract the strings.
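To make concrete what extracting from ngettext() would involve, here is a small sketch that collects the singular/plural pair from each call. It uses the `ast` module for brevity and is not meant to mirror how pygettext is implemented; `find_plural_pairs` and the sample source are hypothetical.

```python
import ast

# Sample input; it is only parsed, never executed, so the undefined `n` is fine.
SOURCE = '''
from gettext import gettext as _, ngettext
print(_("Hello"))
print(ngettext("%d file", "%d files", n) % n)
'''


def find_plural_pairs(source):
    """Return (singular, plural) pairs from bare ngettext("...", "...", n) calls."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "ngettext"
                and len(node.args) >= 2
                and all(isinstance(arg, ast.Constant) and isinstance(arg.value, str)
                        for arg in node.args[:2])):
            pairs.append((node.args[0].value, node.args[1].value))
    return pairs


pairs = find_plural_pairs(SOURCE)
```

An extractor that handles plurals would then need to write each pair as a `msgid` / `msgid_plural` entry in the .pot file instead of a single `msgid`.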
These deficiencies are probably things that need to be fixed if we're going to continue to promote these tools in the documentation.
A note about the msgfmt problem. It looks like GNU gettext's msgfmt has a similar problem but the msgfmt from pybabel does not. This may mean that we need to change the gettext *Translation objects to be more tolerant of non-ascii encodings (perhaps defaulting to utf-8 instead of ascii).
Scratch what I said in https://bugs.python.org/issue36837?@ok_message=msg%20342005%20created%0Aissue%2036837%20message_count%2C%20messages%20edited%20ok&@template=item#msg342005
GNU msgfmt does extract the charset correctly. (My previous test failed to write any output, so it was using the .mo file I had written out with msgfmt.py. I realized that this morning when I figured out why my C test program wasn't finding any message catalog.)
For reference the three ways to extract strings with the three tools are:
and the three ways to generate catalogs via the three tools are:
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = []
title = 'Make il8n tools available from `python -m`'
updated_at =
user = 'https://github.com/bbkane'
```
bugs.python.org fields:
```python
activity =
actor = 'a.badger'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'bbkane'
dependencies = []
files = []
hgrepos = []
issue_num = 36837
keywords = []
message_count = 5.0
messages = ['341771', '341876', '341999', '342005', '342198']
nosy_count = 3.0
nosy_names = ['barry', 'a.badger', 'bbkane']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue36837'
versions = []
```