tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0

Tornado template i18n: extract the text waiting to be translated #2600

Open mywaiting opened 5 years ago

mywaiting commented 5 years ago

Hello:

I use Tornado templates in my website, and I need to add i18n support to it. This is a sample HTML template from my application:

        <div class="page404">
            <h1>{{ _("Page Not Found") }}</h1>
            <div class="markdown">
                <p>{{ _('You requested URL') }} </p>
                <p>{{ _('Thats all we know.') }}</p>
            </div>
        </div>

As we know, _('xxxxx') marks text waiting to be translated. However, if you use Tornado templates, you may NOT find any tool that extracts this text into a message.pot or en_US.csv file for translation.

I found that pybabel can extract text from jinja2 templates, but it does NOT work with Tornado templates. Likewise, GNU gettext does not support extracting text from HTML templates.

After reviewing the pybabel source code, I found that it would be simple to integrate with Tornado templates. All I need is to add a few functions to tornado.locale. It is simple, not much code, and it would enhance Tornado's i18n support.

If anyone likes this and needs the code to extract gettext strings from Tornado template HTML files, I am happy to share how I did it.

Also, if Ben likes this, please feel free to leave a comment and I will make a PR ASAP.

ploxiln commented 5 years ago

related: #622

bdarnell commented 5 years ago

Yes, a PR to extract translatable strings into babel would be welcome.

mywaiting commented 5 years ago

OK, I will do it as soon as possible. The code flow is very simple, but I need to find a non-intrusive way to fit it into Tornado's code structure. This may take a few days.

mywaiting commented 5 years ago

Hello:

I am posting this comment to explain my thinking about extracting translatable strings from Tornado templates. I studied this for some days and found that it is hard to fit into Tornado's code structure non-intrusively.

How can I extract translatable strings from Tornado's template?

It is very simple. As we know, any HTML template used from Python MUST be translated into Python code, no matter which template language it uses.

So the code flow can be as simple as this:

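    import os
    import tornado.template

    base_dir = os.path.dirname(os.path.abspath(__file__))
    loader = tornado.template.Loader(base_dir)
    tmplt = loader.load('test.html')
    # print(dir(tmplt))
    # tmplt.code holds the Python source generated from the template
    print(tmplt.code)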

tmplt.code is actually Python code, so it is easy to extract translatable strings from it using GNU xgettext or simply pybabel.
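To illustrate why generated Python is easy to scan, here is a minimal sketch that uses only the standard library's ast module in place of xgettext or pybabel. The GENERATED string is a hand-written stand-in that only resembles what tmplt.code contains, and find_messages is a hypothetical helper, not part of Tornado or Babel.

```python
import ast

# Hand-written stand-in for the kind of Python found in tmplt.code;
# the real generated code differs in detail.
GENERATED = '''
def _tt_execute():
    _tt_buffer = []
    _tt_append = _tt_buffer.append
    _tt_tmp = _("Page Not Found")
    _tt_append(_tt_tmp)
    _tt_tmp = _('You requested URL')
    _tt_append(_tt_tmp)
    return _tt_buffer
'''


def find_messages(source, funcname="_"):
    """Return sorted (lineno, message) pairs for funcname("literal") calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == funcname
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


for lineno, message in find_messages(GENERATED):
    print(lineno, message)
```

The line numbers reported this way refer to the generated code, not to the HTML template, which is one reason a real extractor may prefer to work on the template source.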

Actually, I have finished the string-extraction part for Tornado, but I have some questions about fitting it into Tornado's code structure non-intrusively.

Problems I need some advice on

I am asking for advice to help finish this. Here are the difficulties I have with extracting translatable strings from Tornado HTML templates:

  1. Tornado does not have a command-line interface like Django's manage.py, so I cannot run the extraction from a Tornado command-line interface.
  2. I can extract translatable strings, but the user must define template_path, which is set in tornado.web.Application. It is hard to define it separately.
  3. I want it to work like this:

    # tornado.translations.translate(translations_dir, 'translate.csv')
    tornado.translations.translate(translations_dir, 'translate.pot')

tornado.translations.translate() walks through all files in translations_dir, specifically the .py, .js, and .html files containing _("translatable strings"), and extracts those strings into a translate.csv or translate.pot file. Note that tornado.translations.translate() needs the GNU gettext utilities. Please make sure they are installed before running the extraction.

These are all my thoughts.

Does anyone have any advice about this? @bdarnell @ploxiln. Thanks.

bdarnell commented 5 years ago

Tornado does not have a command-line interface like Django's manage.py, so I cannot run the extraction from a Tornado command-line interface.

Adding a __main__ block to template.py to support a CLI command like python -m tornado.template generate templates/foo.html seems like a good idea.
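A rough sketch of what such a command could look like, written as a standalone script rather than as a change to template.py. The subcommand name follows the example above; build_parser, generate_code, and main are hypothetical names, and the sketch assumes the template compiles without a loader.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="python -m tornado.template")
    subparsers = parser.add_subparsers(dest="command", required=True)
    generate = subparsers.add_parser(
        "generate", help="print the Python code generated for a template")
    generate.add_argument("path", help="template file, e.g. templates/foo.html")
    return parser


def generate_code(path):
    # Imported lazily so that argument parsing can be exercised on its own.
    import tornado.template
    with open(path, "rb") as f:
        # Template.code holds the generated Python source. No loader is
        # passed, so templates using {% include %} or {% extends %} are
        # expected to fail here.
        return tornado.template.Template(f.read(), name=path).code


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "generate":
        sys.stdout.write(generate_code(args.path))
    return 0
```

Calling main(["generate", "templates/foo.html"]) would print the generated code, which could then be piped to a Python-aware message extractor.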

I can extract translatable strings, but the user must define template_path, which is set in tornado.web.Application. It is hard to define it separately.

Oh, good point. We might be able to work around this, though. We need the template path to be able to process include and extend directives, but I don't think that's ever necessary for translation purposes. We just need to process the templates enough to get something syntactically valid that we can pass to a python-parsing message extractor. I think we can simply ignore include directives and when extend and block are used we generate code for all the blocks.

I want it to work like this: tornado.translations.translate(translations_dir, 'translate.pot')

To be clear, this is just the extraction step, right? If this step walks over all the .py files, it seems to be trying to reimplement tools like babel. I think it's better to do this as a plugin for babel so that you don't have to change your workflow (babel won't know about tornado's csv format, but that's fine).
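For reference, Babel's documented interface for a custom extraction method is a function taking (fileobj, keywords, comment_tags, options) and yielding (lineno, funcname, message, comments) tuples. The sketch below follows that shape using only the standard library. The name extract_tornado and the regex-based scan of {{ ... }} expressions are assumptions made for illustration; it ignores {% ... %} statements and is not the approach discussed above of generating code for all blocks.

```python
import ast
import io
import re

# Matches {{ ... }} expressions; the {{! escape is skipped.
_EXPRESSION = re.compile(r"\{\{(?!!)(.*?)\}\}", re.DOTALL)


def extract_tornado(fileobj, keywords, comment_tags, options):
    """Yield (lineno, funcname, message, comments) for each keyword call
    with a string literal as its first argument."""
    encoding = (options or {}).get("encoding", "utf-8")
    text = fileobj.read().decode(encoding)
    for match in _EXPRESSION.finditer(text):
        lineno = text.count("\n", 0, match.start()) + 1
        try:
            tree = ast.parse(match.group(1).strip(), mode="eval")
        except SyntaxError:
            continue  # not a plain Python expression; skip it
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in keywords
                    and node.args
                    and isinstance(node.args[0], ast.Constant)
                    and isinstance(node.args[0].value, str)):
                yield lineno, node.func.id, node.args[0].value, []


sample = (b'<h1>{{ _("Page Not Found") }}</h1>\n'
          b'<p>{{ _("Thats all we know.") }}</p>\n')
for item in extract_tornado(io.BytesIO(sample), {"_": None}, [], {}):
    print(item)
```

A function of this shape can be named as the extraction method in a Babel mapping configuration, so the usual pybabel extract workflow stays unchanged.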

mywaiting commented 5 years ago

Thanks Ben.

I also think it is not necessary to process template tags like include and extend. But when I reviewed the source of Django's manage.py makemessages, I found that Django actually does process its HTML templates using django.utils.translation.templatize, which performs lexical analysis and tokenization to convert the template into something GNU xgettext can process.

pybabel works the same way. jinja2 provides an extension interface in jinja2.ext.babel_extract, which actually processes the HTML using jinja2.Environment.parse().

The Tornado template module has no Template.Lexer or Template.tokenize; it simply assembles the code together. So the simplest way is to use the xxxxx.generated.py code that the Tornado template produces, which needs no other lexer or tokenizer support and no extra code to lex or tokenize the template HTML.

"It's better to do this as a plugin for babel." That's right. The plugin's core function is to export the Tornado template's xxxxx.generated.py so that pybabel, or just GNU xgettext, can extract the translatable strings. So the core problem is how to hook this into tornado.web.Application and get its template_path.

Also, it is true that babel won't know about Tornado's CSV format, but it is very simple to produce it as follows:

    import numpy as np
    import pandas as pd
    import polib

    # the file produced by pybabel extract
    po = polib.pofile('pybabel-extract-messages.po')

    msgid = []
    msgstr = []
    for entry in po:
        msgid.append(entry.msgid)
        msgstr.append(entry.msgstr)

    data = pd.DataFrame({
        'Chinese': msgid,
        'English': msgstr,
        'Translate Check': np.nan,  # pd.np is not available in recent pandas
    })
    # write an xlsx/csv file for Tornado to use, waiting to be translated
    data.to_excel('tornado-used-csv-i18n-file.xlsx')
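As an alternative without pandas, the sketch below writes rows directly in the string,translation layout that tornado.locale.load_translations reads from CSV files. The pairs list is hypothetical sample data standing in for entries read from a .po file, and write_tornado_csv is an illustrative helper name.

```python
import csv
import io


def write_tornado_csv(pairs, fileobj):
    """Write (msgid, msgstr) pairs as quoted string,translation rows."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    for msgid, msgstr in pairs:
        writer.writerow([msgid, msgstr])


# Hypothetical entries, standing in for what polib reads from a .po file.
pairs = [
    ("Page Not Found", "Seite nicht gefunden"),
    ("You requested URL", "Angeforderte URL"),
]
buffer = io.StringIO()
write_tornado_csv(pairs, buffer)
print(buffer.getvalue())
```

In practice the output would go to a file named after the locale, such as de_DE.csv, inside the directory passed to load_translations.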

I will extend this in tornado.locale, if you like it.

Thanks.