python / cpython


Allow lazy loading of translations in gettext. #79809

Open a5f71923-ff5e-465b-b0a2-651b1e1abbf3 opened 5 years ago

a5f71923-ff5e-465b-b0a2-651b1e1abbf3 commented 5 years ago
BPO 35628
Nosy @s-ball

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', 'type-feature', 'library']
title = 'Allow lazy loading of translations in gettext.'
updated_at =
user = 'https://github.com/s-ball'
```

bugs.python.org fields:

```python
activity =
actor = 's-ball'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 's-ball'
dependencies = []
files = []
hgrepos = []
issue_num = 35628
keywords = []
message_count = 1.0
messages = ['332815']
nosy_count = 1.0
nosy_names = ['s-ball']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue35628'
versions = ['Python 3.8']
```

a5f71923-ff5e-465b-b0a2-651b1e1abbf3 commented 5 years ago

When working on i18n, I realized that msgfmt.py does not generate any hash table. Going one step further, I realized that gettext.py would not have used it anyway, because it unconditionally loads the whole translation file and contains the following TODO message:

TODO:

I have studied the code and found that it should not be too complex to implement this in pure Python. I have posted a message on python-ideas about it, and here are my conclusions:

Features:
=========

The gettext module should be able to load catalogs lazily from mo files. This lazy loading should be optional, and it should use the hash table of the mo file when one is present or fall back to a binary search otherwise. The translated strings should be cached for better performance.
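The binary-search fallback mentioned above could look roughly like the sketch below. It relies on the documented mo layout (a 7-word header followed by two tables of length/offset pairs, with the original strings sorted). The names `build_mo` and `lookup` are hypothetical helpers written for this illustration, not part of gettext, and only little-endian files without plural or context entries are handled.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number as read from a little-endian mo file


def build_mo(messages):
    """Build a minimal little-endian mo image (no hash table) for the demo."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in messages.items())
    n = len(items)
    orig_table = 28                  # the header is 7 uint32 values
    trans_table = orig_table + 8 * n
    data_start = trans_table + 8 * n
    blob = b""
    otab, ttab = [], []
    for msgid, _ in items:           # original strings, NUL terminated
        otab.append((len(msgid), data_start + len(blob)))
        blob += msgid + b"\x00"
    for _, msgstr in items:          # translated strings, same order
        ttab.append((len(msgstr), data_start + len(blob)))
        blob += msgstr + b"\x00"
    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    tables = b"".join(struct.pack("<2I", *entry) for entry in otab + ttab)
    return header + tables + blob


def lookup(mo, msgid):
    """Binary-search the sorted originals table; return None if absent."""
    magic, _rev, n, orig_table, trans_table = struct.unpack_from("<5I", mo, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian mo image")
    key = msgid.encode("utf-8")
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        length, offset = struct.unpack_from("<2I", mo, orig_table + 8 * mid)
        candidate = mo[offset:offset + length]
        if candidate == key:
            # Entry number `mid` of the translations table is the answer.
            tlen, toff = struct.unpack_from("<2I", mo, trans_table + 8 * mid)
            return mo[toff:toff + tlen].decode("utf-8")
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return None


mo = build_mo({"hello": "bonjour", "cat": "chat", "dog": "chien"})
print(lookup(mo, "dog"))
```

Only the table entries and strings touched by the search are decoded, which is the point of the lazy approach. A real implementation would read those byte ranges from the file instead of from an in-memory image.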

API changes:
============

Three functions from the gettext module would gain two new optional parameters, named caching and keepopen:

gettext.bindtextdomain(domain, localedir=None) would become gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False)

gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None) would become gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None, caching=None, keepopen=False)

gettext.install(domain, localedir=None, codeset=None, names=None) would become gettext.install(domain, localedir=None, codeset=None, names=None, caching=None, keepopen=False)

The new caching parameter could receive the following values:

- caching=None: revert to the previous eager loading of the full catalog. This would be the default, so that existing applications see no change.
- caching=1: lazy loading with unlimited cache.
- caching=n where n is a non-negative integer (n >= 0): lazy loading with an LRU cache limited to n strings.
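As an illustration of the bounded cache, here is a minimal LRU sketch built on `collections.OrderedDict`. The class name `LRUStringCache` and the use of `maxsize=None` for "unlimited" are assumptions made for this example and do not reflect how the proposal would encode its caching values.

```python
from collections import OrderedDict


class LRUStringCache:
    """Hypothetical LRU cache; maxsize=None means unlimited in this sketch."""

    def __init__(self, maxsize=None):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss."""
        if key in self._data:
            self._data.move_to_end(key)      # mark as most recently used
            return self._data[key]
        value = loader(key)
        if self.maxsize != 0:                # maxsize=0 disables caching
            self._data[key] = value
            if self.maxsize is not None and len(self._data) > self.maxsize:
                self._data.popitem(last=False)   # evict least recently used
        return value

    def __len__(self):
        return len(self._data)


loads = []


def load_from_file(key):
    """Stand-in for reading one translation from the mo file."""
    loads.append(key)
    return key.upper()


cache = LRUStringCache(maxsize=2)
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key, load_from_file)
```

In this run, "b" is evicted when "c" arrives because "a" was used more recently, so the final lookup of "b" goes back to the loader.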

The keepopen parameter would be a boolean:

- keepopen=False (default): the mo file is opened only to load a translation string and is closed immediately afterwards. It is also opened once when the GNUTranslations object is initialized, to load the file description.
- keepopen=True: the mo file is kept open during the lifetime of the GNUTranslations object.

This parameter is ignored if caching is None.
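The two keepopen behaviours can be sketched with a small reader that either reuses one file handle or opens and closes the file around every read. `MoFileReader` and `read_at` are hypothetical names used only for this illustration, and the demo reads from a throwaway file rather than a real mo file.

```python
import os
import tempfile


class MoFileReader:
    """Hypothetical helper: reads byte ranges, honouring a keepopen flag."""

    def __init__(self, path, keepopen=False):
        self._path = path
        # With keepopen=True the handle lives as long as the reader does.
        self._fp = open(path, "rb") if keepopen else None

    def read_at(self, offset, length):
        if self._fp is not None:             # keepopen=True: reuse the handle
            self._fp.seek(offset)
            return self._fp.read(length)
        with open(self._path, "rb") as fp:   # keepopen=False: open, read, close
            fp.seek(offset)
            return fp.read(length)

    def close(self):
        if self._fp is not None:
            self._fp.close()
            self._fp = None


# Demo on a temporary file standing in for a mo file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
path = tmp.name
try:
    transient = MoFileReader(path)                  # keepopen=False
    persistent = MoFileReader(path, keepopen=True)
    chunk_a = transient.read_at(2, 3)
    chunk_b = persistent.read_at(2, 3)
    persistent.close()
finally:
    os.unlink(path)
```

The trade-off is one open file descriptor per translation object against an open/close pair on every cache miss.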

Implementation:
===============

The current GNUTranslations class loads the content of the mo file to build a dictionary where the original strings are the keys and the translated strings are the values. Plural forms get special processing: the key is a 2-tuple (singular original string, order), and the value is the corresponding translated string. order=0 is normally the singular translated string.
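To make the key layout concrete, the sketch below builds a dictionary shaped as described and looks up plain and plural messages. The French strings, the hard-coded plural rule and the helper names `lookup_message` and `lookup_plural` are assumptions for the example; they are not the gettext API.

```python
# A catalog shaped as described above: plain messages are keyed by the
# original string, plural forms by (singular original string, order).
catalog = {
    "Open": "Ouvrir",
    ("%d file", 0): "%d fichier",
    ("%d file", 1): "%d fichiers",
}


def plural_index(n):
    """French rule 'nplurals=2; plural=(n > 1)', hard-coded for the demo."""
    return int(n > 1)


def lookup_message(message):
    """Return the translation, or the original string when none exists."""
    return catalog.get(message, message)


def lookup_plural(singular, plural, n):
    """Pick the translated form for n, falling back to the English rule."""
    try:
        return catalog[(singular, plural_index(n))]
    except KeyError:
        return singular if n == 1 else plural


print(lookup_message("Open"))
print(lookup_plural("%d file", "%d files", 3) % 3)
```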

The proposed implementation would simply replace this dictionary with a special mapping subclass when caching is not None. That subclass would use the same keys as the original dictionary and would:

That should allow the new feature to be implemented with minimal refactoring of the gettext module.
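One possible shape for such a mapping is a `dict` subclass that fills itself on demand through `__missing__`. This is only a sketch under assumptions: the class name `LazyCatalog` and the loader callable are hypothetical, and the loader stands in for whatever would actually read a single entry from the mo file.

```python
class LazyCatalog(dict):
    """Hypothetical dict subclass that loads entries on first access."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader    # callable: key -> translation, or None

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not stored yet.
        value = self._loader(key)
        if value is None:
            raise KeyError(key)
        self[key] = value        # remember it for the next lookup
        return value

    def get(self, key, default=None):
        # dict.get() does not call __missing__, so route it via __getitem__.
        try:
            return self[key]
        except KeyError:
            return default


backing = {"Open": "Ouvrir", ("%d file", 1): "%d fichiers"}
calls = []


def load(key):
    """Stand-in for a single-entry read from the mo file."""
    calls.append(key)
    return backing.get(key)


catalog = LazyCatalog(load)
first = catalog["Open"]                 # triggers one load
second = catalog.get("Open")            # served from the dict itself
plural = catalog[("%d file", 1)]        # tuple keys work unchanged
absent = catalog.get("Close", "Close")  # loader finds nothing
```

Because the keys are unchanged, code that indexes the catalog or calls `get` on it keeps working. Two limits of this sketch: `in` tests only see entries that have already been loaded, and misses are not cached, so repeated lookups of an untranslated string reach the loader each time.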

I also propose to change msgfmt.py so that it builds the hash table. IMHO, that function should live in the standard library, probably as a submodule of gettext, so that various Python projects (pybabel, django) can use it directly instead of developing their own.

I will probably submit a PR in a while, but it will require some time to propose a full implementation with correct test coverage.