Open carmenbianca opened 6 months ago
Hi click maintainers, I am still ready to help with this issue.
I'm having some trouble following this, although I think the general idea is "use new gettext local provider instead of global provider"? Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?
I am not aware of other ways to achieve the above that do not require changes to Click.
If we changed Click in some way, would that make the implementation easier or better? I'm open to hearing what changes might be needed.
Hi @davidism ! I will explain the full context. It's a long answer; summary and answers to your questions at the end.
The classic GNU gettext API depends on a global state in the Python gettext library. If you call gettext.gettext("Hello, world!") (equal to _("Hello, world!")), then gettext has no idea where to get a translation for that string. So before you ever run gettext.gettext() in code, you have to register where to find the translations for your string with the library. You do this by running this snippet (slightly simplified, but entirely correct):
# The translations located at 'path/to/translations' now have the domain
# (read: alias) 'your_module'.
gettext.bindtextdomain("your_module", "path/to/translations")
# Activate 'your_module' as the currently used domain. Henceforth, when
# `gettext.gettext()` is called, it tries to find the translation in
# this domain. It knows which language to use from the user's ENV.
gettext.textdomain("your_module")
(As an aside: You can have multiple domains sourced from different paths, BUT you have to make very sure to constantly call gettext.textdomain() to switch context at the right times.)
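To make the global state concrete, here is a minimal sketch. It assumes an empty temporary directory in place of 'path/to/translations', and 'other_module' is a made-up second domain, so every lookup simply returns the original string. Calling gettext.textdomain() with no argument only reports the currently active domain.

```python
import gettext
import tempfile

# Assumption for this sketch: an empty directory stands in for a real
# translations directory, so no catalogue is ever found.
locale_dir = tempfile.mkdtemp()

gettext.bindtextdomain("your_module", locale_dir)
gettext.bindtextdomain("other_module", locale_dir)  # hypothetical second domain

gettext.textdomain("your_module")
first = gettext.textdomain()   # no argument: just report the active domain

gettext.textdomain("other_module")
second = gettext.textdomain()  # the process-wide active domain has changed

# With no catalogue available, the original string comes back unchanged.
greeting = gettext.gettext("Hello, world!")
print(first, second, greeting)
```

There is only one active domain per process, which is why switching has to happen at exactly the right moments.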
Now for Click in particular, the tricky bit is to call gettext.textdomain() at the right time. Important context is that I have included all Click strings and translations in my 'path/to/translations'.
So let's say I have this code:
# Can't wrap docstrings in `_()`, so do this here.
_HELP = _("...")
@click.group(name="your_module", help=_HELP)
def main():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")
If I now run your_module --help, three things (don't) happen:
1. _HELP is not translated, because gettext.gettext() was called BEFORE gettext.textdomain().
2. click's strings, such as "--help  Show this message and exit." in the output, are not translated, because the Click library does its stuff BEFORE running the main function.
3. It is not clear that main is even run here.
So we are forced to move the gettext.textdomain() call before all of that. This is fine, kind of, but also unfortunate. It means that importing the module which contains the main function changes the global state of the gettext module. We could imagine a scenario where someone imports your_module after doing their own gettext.textdomain() stuff, but now their gettext global state is all wrong.
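The import side effect can be reproduced with a throwaway module. This is only a sketch: 'your_module_demo' and 'host_app' are made-up names, and the module is written to a temporary directory so it can be imported like a real library.

```python
import gettext
import importlib
import pathlib
import sys
import tempfile

# Write a tiny stand-in library that calls textdomain() at import time.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "your_module_demo.py").write_text(
    "import gettext\n"
    'gettext.textdomain("your_module")\n'
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

# The importing application has already chosen its own domain.
gettext.textdomain("host_app")
before = gettext.textdomain()

# Merely importing the library silently replaces the active domain.
importlib.import_module("your_module_demo")
after = gettext.textdomain()

print(before, "->", after)
```

From this point on, the host application's own gettext.gettext() calls would look up strings in the wrong domain.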
If we keep the classic API, then the following pseudocode might help to alleviate those problems:
def setup_gettext():
    gettext.bindtextdomain("your_module", "path/to/translations")
    gettext.textdomain("your_module")
@click.group(
    name="your_module",
    # We assume that evaluating this lambda is delayed until AFTER
    # the prehook is run.
    help=lambda: _("..."),
    prehook=setup_gettext,
)
def main():
    pass
Here, prehook is run before everything else in Click. This means that the Click strings will be correctly translated, and if we correctly jig help to allow a callable, our help string will also be correctly translated.
Implementing this is more effort than the alternative, though.
The class-based Python gettext API does not store any global state. Instead, all of the necessary state is placed in a GNUTranslations object. This looks like this:
import gettext
from gettext import GNUTranslations
# Put the state in the object. The "your_module" domain is the base name of
# the .mo files that are looked up under "path/to/translations".
TRANSLATIONS: GNUTranslations = gettext.translation("your_module", "path/to/translations")
# Instead of globally activating "your_module" as the gettext domain, just
# ask the object to translate stuff.
print(TRANSLATIONS.gettext("Hello, world!"))
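One detail worth knowing when trying this out: gettext.translation() raises FileNotFoundError when it finds no catalogue for the user's language, unless fallback=True is passed. A minimal sketch, assuming an empty temporary directory in place of 'path/to/translations':

```python
import gettext
import tempfile

# Assumption for this sketch: an empty directory, so no .mo catalogue exists
# for any language.
locale_dir = tempfile.mkdtemp()

# Without fallback=True, a missing catalogue is an error.
try:
    gettext.translation("your_module", locale_dir)
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# With fallback=True, a NullTranslations object is returned instead; its
# gettext() hands back the original string unchanged.
translations = gettext.translation("your_module", locale_dir, fallback=True)
result = translations.gettext("Hello, world!")
print(missing_raises, result)
```

Nothing here touches gettext.textdomain(), so no global state is involved.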
Now obviously, this GNUTranslations object needs to be instantiated somewhere. If we instantiate it in the click library itself, then we have a problem: which directory does Click get its translation strings from? There are no translations shipped with Click. And also, users of the Click library already have their own translations of the Click strings that they probably want to use. And also, users already use the gettext.textdomain() call, which wouldn't work if Click switched wholesale to the class-based API.
To keep compatibility, and to offload the need to translate strings downstream, I proposed the following code in click.i18n:
import gettext as _gettext_module
TRANSLATIONS: _gettext_module.GNUTranslations | None = None
def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)
# alias
_ = gettext
If the rest of the Click library then does from .i18n import _ instead of from gettext import gettext as _, the following happens:
- If the user does nothing, the classic GNU gettext API is used as before, and the user still has to call gettext.textdomain() to get any use out of it.
- If the user populates click.i18n.TRANSLATIONS with some object, then the classic GNU gettext API is ignored, and all translations are sourced from that object.
So using the prior example, that looks like this:
click.i18n.TRANSLATIONS = gettext.translation("click", "path/to/click/translations")
MY_TRANSLATIONS = gettext.translation("your_module", "path/to/my/translations")
_HELP = MY_TRANSLATIONS.gettext("...")
@click.group(name="your_module", help=_HELP)
def main():
    pass
Click gets its translations from its own object, your_module has its own separate translations, and everything is great and Just Works.
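The two code paths of the proposed wrapper can be exercised without any .mo files. In this sketch, DictTranslations is a hypothetical in-memory stand-in for a real GNUTranslations object, the "<translated>" string is a made-up placeholder, and 'click_demo' is a made-up domain bound to an empty directory so that the classic path behaves predictably.

```python
import gettext as _gettext_module
import tempfile


class DictTranslations(_gettext_module.NullTranslations):
    """Hypothetical in-memory catalogue standing in for GNUTranslations."""

    def __init__(self, catalog):
        super().__init__()
        self._demo_catalog = catalog

    def gettext(self, message):
        return self._demo_catalog.get(message, message)


# Same shape as the proposed click.i18n module.
TRANSLATIONS = None


def gettext(message):
    if TRANSLATIONS is None:
        return _gettext_module.gettext(message)
    return TRANSLATIONS.gettext(message)


# Demo only: point the global state at an empty directory so the classic
# path finds no catalogue and returns the string unchanged.
_gettext_module.bindtextdomain("click_demo", tempfile.mkdtemp())
_gettext_module.textdomain("click_demo")

message = "Show this message and exit."
unset_result = gettext(message)  # classic path, via the gettext global state

TRANSLATIONS = DictTranslations({message: "<translated> " + message})
set_result = gettext(message)    # class-based path, global state is ignored

print(unset_result)
print(set_result)
```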
Manually setting an object to click.i18n.TRANSLATIONS isn't super amazing, though, so you could envision creating a convenience function click.i18n.install_translations(obj: GNUTranslations) that does this for you. TRANSLATIONS could then become a private global 'constant'.
In fact, once this is set up, Click could even begin shipping its own translations, to reduce the duplicated efforts downstream. Because the API is class-based, you don't constantly have to call gettext.textdomain() to swap between active domains. This might look a little like this in click.i18n:
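import gettext as _gettext_module
import os
from gettext import GNUTranslations
_TRANSLATIONS: GNUTranslations | None = None
def install_translations(translations: GNUTranslations | None = None) -> GNUTranslations:
    # Without `global`, the assignment below would only bind a local name.
    global _TRANSLATIONS
    if translations is None:
        # Use the module alias: inside click.i18n, the name `gettext` is the
        # wrapper function, not the standard-library module.
        translations = _gettext_module.translation(
            "click",
            # resolves to `click/locale`, wherever `click` is installed.
            # There would need to be valid translations in this directory,
            # obviously.
            os.path.join(os.path.dirname(__file__), "locale"),
        )
    _TRANSLATIONS = translations
    return translations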
In summary:
To answer your questions precisely:
I think the general idea is "use new gettext local provider instead of global provider"?
Both, for backwards-compatibility reasons. The global provider is the fallback if nothing is done by the user, which matches the status quo.
Is using a "library global" TRANSLATIONS variable and falling back to "gettext global" if it's not set a standard pattern for translations?
No. Common practice is this:
click)
Click is unique here because backwards compatibility is desirable (I think; maybe I'm mistaken), and because downstream may want to ship their own translations.
If we changed Click in some way, would that make the implementation easier or better?
I wrote about prehook above. But the prehook workaround is not needed if the class-based API is used by Click.
I hope this helps! Thanks for your maintainer work.
Hi lovely Click maintainers,
Currently, Click implements gettext using the classic GNU gettext API. That looks like this:
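A minimal sketch of that pattern, assuming nothing beyond the standard library; help_option_text is a made-up function name for illustration, not something taken from Click's source:

```python
from gettext import gettext as _


def help_option_text():
    # The lookup uses whichever domain is globally active at call time.
    return _("Show this message and exit.")


print(help_option_text())
```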
This API depends on a global state in the gettext module. By calling gettext.textdomain(), the active translation domain is changed for all Python modules that use the classic GNU gettext API.
This side effect is usually desirable, except when your module is imported by another module as a library. So you usually don't want to call gettext.textdomain() without putting it behind some sort of function call. With argparse, this is easy: put it in your main function before you even create the ArgumentParser object. With Click, I'm not sure this is possible: Click does its work before the main function runs (for example, to print the output of --help).
So you end up having to call gettext.textdomain() on import of your module containing your Click groups/commands.
We can fix that by switching to the class-based API. Because Click will still need to support the old API as well for backwards compatibility, my proposal looks a little as follows. Create a module click.i18n with the following contents (simplified):
Now, elsewhere in Click, you replace all from gettext import gettext as _ with from .i18n import _.
Subsequently, we can create a function install_translations(translations) in i18n.py that replaces the TRANSLATIONS global constant with an instantiated GNUTranslations object. This function would still need to be called before the consumer's main function, but it wouldn't change the gettext global state; it would only change Click's. Which, as far as perfectionism goes, is probably tolerable. It would be better still if there was a pre-hook, but this is fine.
Furthermore, the consumer could use different domains for Click's TRANSLATIONS object and their own, allowing them to separate their own translations from Click's, and hypothetically reuse the Click translations in other projects.
In fact, having done this plumbing, Click could even ship its own translation strings, getting rid of duplication efforts of translating the same Click strings. Click's own translations could then be activated using e.g. install_click_translations() without any arguments.
In summary, the problems solved by this:
I am not aware of other ways to achieve the above that do not require changes to Click. Adding a pre-hook to groups/commands might partially address the problem.
I am willing to make a PR if this issue is validated.
I wrote a blog post here that provides more context on how I use gettext + Click (+ some other components). It has more context than is necessary to understand this issue.