python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License
1.34k stars 448 forks source link

Allow extracting (deeply) nested calls in Python and Javascript #1127

Open dylankiss opened 2 months ago

dylankiss commented 2 months ago

Currently the Python extractor does not support deeply nested gettext calls (deeper than as a direct argument to the top-level gettext call).

e.g.

_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))

The extraction code was refactored quite a bit to simplify the flow and support this use-case.

Currently the Javascript extractor does not support nested gettext calls at all.

The extraction code was refactored a bit to resemble the Python code as much as possible and support this use-case.

Fixes https://github.com/python-babel/babel/issues/1125 (meanwhile also fixes https://github.com/python-babel/babel/issues/1123)

dylankiss commented 2 months ago

During the refactor, the order of extraction was also changed, as you can see in this test: https://github.com/python-babel/babel/pull/1127/files#diff-c74d633b5cd37350f5a10b2697475119ba1db4f541eeccec744f7d79ab99d6c1R437-R452

It is now the same as the extraction order of xgettext. Also the comments extraction was fixed to be the same as xgettext and apply to all gettext calls (also nested ones) on the same line. Nested translator comment with nested gettext calls are also supported now, just like with xgettext.

e.g.

# NOTE: Main Comment
_("Hello %s",
    # NOTE: Nested Comment
    _("Nested Gettext")
)

Both terms would get their right comment extracted.

tomasr8 commented 2 months ago

Not saying this is not worth fixing, but out of curiosity, do nested gettext calls actually come up often? I don't think I've ever come across one..

dylankiss commented 2 months ago

@tomasr8 In our own codebase with lots of developers, people assume it works and it happens from time to time that they add in nested gettext calls. Even the deeply nested ones happen, like this example: https://github.com/odoo/odoo/pull/149921/files#diff-e073b7fa9d45d46ba8d7f011257b0e77e1f87bf47982abc63dd618ff05dddb1aL267-L268 I think it deserves a fix, meanwhile also fixing some other small issues 🤷

dylankiss commented 1 month ago

UPDATE: I added an extra commit to also allow nested calls in the Javascript extractor. If it's better to open a separate PR for that, no problem.

tomasr8 commented 2 weeks ago

I'm not a big fan of the token-based extractor getting even more complex. I'm thinking we might be able to replace the python extractor with a NodeVisitor which would simplify the code and it would also solve all of the issues with nested calls, f-strings etc., once and for all.

tomasr8 commented 2 weeks ago

So I did some investigation and an AST-based extractor cuts down the complexity quite a bit. However, it's about twice as slow compared to the current extractor. @akx Given the slowdown, is this something worth pursuing in your opinion?

dylankiss commented 1 week ago

@tomasr8 @akx Depending on what we want, I can adapt the PR accordingly. I agree it's not the nicest and most robust way of traversing through a code file, but if the performance is degraded that much by using an AST-based extractor it might still be best to continue this way 🤷‍♂️