pallets / click

Python composable command line interface toolkit
https://click.palletsprojects.com
BSD 3-Clause "New" or "Revised" License

Better i18n with gettext: use class-based API #2706

Open carmenbianca opened 6 months ago

carmenbianca commented 6 months ago

Hi lovely Click maintainers,

Currently, Click implements gettext using the classic GNU gettext API. That looks like this:

from gettext import gettext as _

print(_("Hello, world!"))

This API depends on a global state in the gettext module. By calling gettext.textdomain(), the active translation domain is changed for all Python modules that use the classic GNU gettext API.
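
A minimal sketch of that global state, using only the standard library. Calling gettext.textdomain() with no argument reports the currently active domain, so the change can be observed directly. The variable names are only illustrative.

```python
import gettext

# Domain that plain gettext.gettext() consults before any change.
domain_before = gettext.textdomain()

# A library calling this at import time changes the domain for every
# caller of the classic API in the same process, not just for itself.
gettext.textdomain("your_module")

domain_after = gettext.textdomain()
print(domain_before, "->", domain_after)
```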

This side effect is usually desirable, except when your module is imported by another module as a library. So you usually don't want to call gettext.textdomain() without putting it behind some sort of function call. With argparse, this is easy: put it in your main function before you even create the ArgumentParser object. With Click, I'm not sure this is possible:

So you end up having to call gettext.textdomain() on import of your module containing your Click groups/commands.
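
For comparison, the argparse arrangement mentioned above can be sketched roughly as follows. The program name, option, and strings are placeholders, and "path/to/translations" is assumed to hold the compiled catalogs. The point is only that nothing touches the gettext global state until main() is actually called.

```python
import argparse
import gettext
from gettext import gettext as _


def main(argv=None):
    # Global gettext state is only touched once main() runs, so merely
    # importing this module leaves other users of gettext alone.
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")

    # The parser is built after the domain is active, so these strings
    # are looked up in "your_module".
    parser = argparse.ArgumentParser(
        prog="your_module",
        description=_("Greet someone."),
    )
    parser.add_argument("--name", default="world", help=_("Who to greet."))
    return parser.parse_args(argv)
```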

We can fix that by switching to the class-based API. Because Click will still need to support the old API as well for backwards compatibility, my proposal looks a little as follows. Create a module click.i18n with the following contents (simplified):

import gettext as _gettext_module

TRANSLATIONS = None

def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)

# alias
_ = gettext

Now, elsewhere in Click, you replace all from gettext import gettext as _ with from .i18n import _.

Subsequently, we can create a function install_translations(translations) in i18n.py that replaces the TRANSLATIONS global constant with an instantiated GNUTranslations object. This function would still need to be called before the consumer's main function, but it wouldn't change the gettext global state; it would only change Click's. Which, as far as perfectionism goes, is probably tolerable. It would be better still if there were a pre-hook, but this is fine.
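
A rough, self-contained sketch of how such a function might behave. This is not existing Click API: install_translations follows the proposal, and UppercaseTranslations is a made-up stand-in catalog so the example runs without any .mo files.

```python
import gettext as _gettext_module

TRANSLATIONS = None  # library-local state; None means "use global gettext"


def install_translations(translations):
    """Point this module's lookups at `translations` (proposed helper)."""
    global TRANSLATIONS  # rebind the module-level name, not a local one
    TRANSLATIONS = translations


def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)


class UppercaseTranslations(_gettext_module.NullTranslations):
    """Hypothetical stand-in; a real caller would pass GNUTranslations."""

    def gettext(self, message):
        return message.upper()


domain_before = _gettext_module.textdomain()
install_translations(UppercaseTranslations())
translated = gettext("Show this message and exit.")
domain_after = _gettext_module.textdomain()
```

The lookup now goes through the installed object, while the domain reported by gettext.textdomain() is the same before and after.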

Furthermore, the consumer could use different domains for Click's TRANSLATIONS object and their own, allowing them to separate their own translations from Click's, and hypothetically reuse the Click translations in other projects.

In fact, having done this plumbing, Click could even ship its own translation strings, getting rid of duplication efforts of translating the same Click strings. Click's own translations could then be activated using e.g. install_click_translations() without any arguments.

In summary, the problems solved by this:

I am not aware of other ways to achieve the above that do not require changes to Click. Adding a pre-hook to groups/commands might partially address the problem.

I am willing to make a PR if this issue is validated.


I wrote a blog post here that provides more context on how I use gettext + Click (+ some other components). It has more context than is necessary to understand this issue.

carmenbianca commented 1 week ago

Hi click maintainers, I am still ready to help with this issue.

davidism commented 1 week ago

I'm having some trouble following this, although I think the general idea is "use new gettext local provider instead of global provider"? Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?

I am not aware of other ways to achieve the above that do not require changes to Click.

If we changed Click in some way, would that make the implementation easier or better? I'm open to hearing what changes might be needed.

carmenbianca commented 1 week ago

Hi @davidism ! I will explain the full context. It's a long answer; summary and answers to your questions at the end.

The classic GNU gettext API depends on a global state in the Python gettext library. If you call gettext.gettext("Hello, world!") (equal to _("Hello, world!")), then gettext has no idea where to get a translation for that string. So before you ever run gettext.gettext() in code, you have to register where to find the translations for your string with the library. You do this by running this snippet (slightly simplified, but entirely correct):

import gettext

# The translations located at 'path/to/translations' now have the domain
# (read: alias) 'your_module'.
gettext.bindtextdomain("your_module", "path/to/translations")
# Activate 'your_module' as the currently used domain. Henceforth, when
# `gettext.gettext()` is called, it tries to find the translation in
# this domain. It knows which language to use from the user's ENV.
gettext.textdomain("your_module")

(As an aside: You can have multiple domains sourced from different paths, BUT you have to make very sure to constantly call gettext.textdomain() to switch context at the right times.)
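
To give an idea of what that context switching involves, here is a small sketch of a helper that activates a domain temporarily and restores the previous one afterwards. The name use_domain is hypothetical and not part of gettext or Click.

```python
import contextlib
import gettext


@contextlib.contextmanager
def use_domain(domain):
    """Temporarily make `domain` the active global gettext domain."""
    previous = gettext.textdomain()
    gettext.textdomain(domain)
    try:
        yield
    finally:
        # Restore whatever was active, even if the body raised.
        gettext.textdomain(previous)
```

Every lookup in a different domain would need to be wrapped like this, which is the bookkeeping the class-based API avoids.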

Now for Click in particular, the tricky bit is to call gettext.textdomain() at the right time. Important context is that I have included all Click strings and translations in my 'path/to/translations'.

So let's say I have this code:

import gettext
from gettext import gettext as _

import click

# Can't wrap docstrings in `_()`, so do this here.
_HELP = _("...")

@click.group(name="your_module", help=_HELP)
def main():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")

If I now run your_module --help, three things (don't) happen:

So we are forced to move the gettext.textdomain() call before all of that. This is fine, kind of, but also unfortunate. This now means that importing the module which contains the main function changes the global state of the gettext module. We could imagine a scenario where someone imports your_module after doing their own gettext.textdomain() stuff, but now their gettext global state is all wrong.
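
The ordering problem with the help string can be shown without Click at all. In this sketch, `_` and `main` are stand-ins that only record when they run. Module-level code such as `_HELP = _("...")` executes on import, before any function body gets a chance to set up gettext.

```python
calls = []


def _(message):
    # Stand-in for gettext's `_`; it only records when it was called.
    calls.append(f"translate {message!r}")
    return message


# Module level: runs as soon as the module is imported.
_HELP = _("...")


def main():
    # Stand-in for the bindtextdomain()/textdomain() calls in the group body.
    calls.append("set up gettext")


main()
print(calls)
```

The translation lookup is recorded first and the setup second, which is why the setup has to move to import time under the classic API.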

If we keep the classic API, then the following pseudocode might help to alleviate those problems:

def setup_gettext():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")

@click.group(
    name="your_module",
    # We assume that evaluating this lambda is delayed until AFTER
    # the prehook is run.
    help=lambda: _("..."),
    prehook=setup_gettext,
)
def main():
    pass

Here, prehook is run before everything else in Click. This means that the Click strings will be correctly translated, and if we correctly jig help to allow a callable, our help string will also be correctly translated.

Implementing this is more effort than the alternative, though.

The class-based Python gettext API does not store any global state. Instead, all of the necessary state is placed in a GNUTranslations object. This looks like this:

import gettext
from gettext import GNUTranslations

# Put the state in the object. The "your_module" string is a bit superfluous
# here, but apparently it is needed.
TRANSLATIONS: GNUTranslations = gettext.translation("your_module", "path/to/translations")

# Instead of globally activating "your_module" as the gettext domain, just
# ask the object to translate stuff.
print(TRANSLATIONS.gettext("Hello, world!"))
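
A runnable variant of the same idea, for checking that the global state stays untouched. By default, gettext.translation() raises an error when it finds no catalog, so this sketch passes fallback=True, which returns a NullTranslations object that hands messages back unchanged. The path is a placeholder and is assumed not to contain any catalogs.

```python
import gettext

domain_before = gettext.textdomain()

# fallback=True returns a NullTranslations object instead of raising when
# no catalog is found, which keeps this sketch runnable without .mo files.
translations = gettext.translation(
    "your_module", "path/to/translations", fallback=True
)

# With no catalog available, the message comes back untranslated.
message = translations.gettext("Hello, world!")
domain_after = gettext.textdomain()
```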

Now obviously, this GNUTranslations object needs to be instantiated somewhere. If we instantiate it in the click library itself, then we have a problem: which directory does Click get its translation strings from? There are no translations shipped with Click. And also, users of the Click library already have their own translations of the Click strings that they probably want to use. And also, users already use the gettext.textdomain() call, which wouldn't work if Click switched wholesale to the class-based API.

To keep compatibility, and to offload the need to translate strings downstream, I proposed the following code in click.i18n:

import gettext as _gettext_module

TRANSLATIONS: _gettext_module.GNUTranslations | None = None

def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)

# alias
_ = gettext

If the rest of the Click library then does from .i18n import _ instead of from gettext import gettext as _, the following happens:

So using the prior example, that looks like this:

import gettext

import click

# `click.i18n` is the module proposed above; it does not exist in Click yet.
click.i18n.TRANSLATIONS = gettext.translation("click", "path/to/click/translations")
MY_TRANSLATIONS = gettext.translation("your_module", "path/to/my/translations")

_HELP = MY_TRANSLATIONS.gettext("...")

@click.group(name="your_module", help=_HELP)
def main():
    pass

Click gets its translations from its own object, your_module has its own separate translations, and everything is great and Just Works.

Manually setting an object to click.i18n.TRANSLATIONS isn't super amazing, though, so you could envision creating a convenience function click.i18n.install_translations(obj: GNUTranslations) that does this for you. TRANSLATIONS could then become a private global 'constant'.

In fact, once this is set up, Click could even begin shipping its own translations, to reduce the duplicated efforts downstream. Because the API is class-based, you don't constantly have to call gettext.textdomain() to swap between active domains. This might look a little like this in click.i18n:

import gettext
import os
from gettext import GNUTranslations

_TRANSLATIONS: GNUTranslations | None = None

def install_translations(translations: GNUTranslations | None = None) -> GNUTranslations:
    # Rebind the module-level variable; without `global`, the assignment
    # below would only create a local name.
    global _TRANSLATIONS
    if translations is None:
        translations = gettext.translation(
            "click",
            # resolves to `click/locale`, wherever `click` is installed.
            # There would need to be valid translations in this directory,
            # obviously.
            os.path.join(os.path.dirname(__file__), "locale"),
        )
    _TRANSLATIONS = translations
    return translations

In summary:


To answer your questions precisely:

I think the general idea is "use new gettext local provider instead of global provider"?

Both, for backwards-compatibility reasons. The global provider is the fallback if nothing is done by the user, which matches the status quo.

Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?

No. Common practice is this:

Click is unique here because backwards compatibility is desirable (I think; maybe I'm mistaken), and because downstream may want to ship their own translations.

If we changed Click in some way, would that make the implementation easier or better?

I wrote about prehook above. But the prehook workaround is not needed if the class-based API is used by Click.


I hope this helps! Thanks for your maintainer work.