python-babel / babel

The official repository for Babel, the Python Internationalization Library
http://babel.pocoo.org/
BSD 3-Clause "New" or "Revised" License

JSX/regexp ambiguity in jslexer #640

Open · remram44 opened this issue 5 years ago

remram44 commented 5 years ago

After a regular expression literal such as `/>/`, Babel ignores every message until the next regular expression.

Mapping:

```ini
[javascript: **.js]
encoding = utf-8
extract_messages = gettext
```

Source file:

```js
alert(gettext("First message"));
var expr = />/g;
alert(gettext("Second message"));
var expr2 = /'/g;
alert(gettext("Third message"));
```

Result POT:

```
"Generated-By: Babel 2.6.0\n"

#: test.js:1
msgid "First message"
msgstr ""

#: test.js:5
msgid "Third message"
msgstr ""
```

akx commented 5 years ago

The JavaScript lexer in Babel is far from perfect, unfortunately... If you feel like trying to hack at the lexer to fix this, by all means please do!

It might be a good idea for a future version to use e.g. https://github.com/Kronuz/esprima-python when it is installed (i.e. as an optional dependency) for more robust parsing.

remram44 commented 5 years ago

Thanks, I might take a look, though I'm short on time right now.

For reference, the lexer is in `jslexer.py`.

remram44 commented 5 years ago

It seems that `/>` gets recognized as a JSX tag.
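The effect can be reproduced with a toy lexer. The sketch below is not Babel's code. It is a simplified model that assumes `jslexer.py` tries its fixed rules (including a JSX-tag rule that also accepts a bare `/>`) before it uses the previous token to decide whether a `/` starts a regexp or a division. `RULES`, `may_divide`, `tokenize` and `extract_gettext` are hypothetical names, and the patterns are deliberately simplified.

```python
import re

# Hypothetical toy rules, tried in order, loosely modeled on the ordering
# assumed for babel.messages.jslexer. Patterns are simplified, not Babel's.
RULES = [
    (None, re.compile(r"\s+")),
    ("name", re.compile(r"[A-Za-z_$][\w$]*")),
    # The JSX rule also accepts a bare "/>", so it fires before the
    # regexp/division decision below ever sees the slash.
    ("jsx_tag", re.compile(r"</?[^>\s]+|/>")),
    ("operator", re.compile(r"[=;(),.{}\[\]]")),
    ("string", re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")),
]
DIVISION_RE = re.compile(r"/=?")
# A negated character class also matches newlines, so a regexp can run
# across several lines until the next "/".
REGEX_RE = re.compile(r"/(?:[^/\\]|\\.)*/[A-Za-z]*", re.DOTALL)


def may_divide(prev):
    """Return True if a "/" after token `prev` should be read as division."""
    if prev is None:
        return False
    kind, value = prev
    if kind == "operator":
        return value in (")", "]", "}")
    return kind in ("name", "string", "regexp")


def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, rule in RULES:
            match = rule.match(source, pos)
            if match:
                break
        else:
            # No fixed rule matched: choose division or regexp from context.
            if may_divide(tokens[-1] if tokens else None):
                kind, match = "operator", DIVISION_RE.match(source, pos)
            else:
                kind, match = "regexp", REGEX_RE.match(source, pos)
        if match is None:
            pos += 1  # unrecognized input: skip one character and retry
            continue
        if kind is not None:
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


def extract_gettext(tokens):
    """Collect string literals passed directly to gettext(...)."""
    found = []
    for i in range(len(tokens) - 2):
        if (tokens[i] == ("name", "gettext")
                and tokens[i + 1] == ("operator", "(")
                and tokens[i + 2][0] == "string"):
            found.append(tokens[i + 2][1][1:-1])
    return found


SOURCE = """alert(gettext("First message"));
var expr = />/g;
alert(gettext("Second message"));
var expr2 = /'/g;
alert(gettext("Third message"));
"""

print(extract_gettext(tokenize(SOURCE)))  # prints ['First message', 'Third message']
```

In this model `/>` becomes a `jsx_tag` token, so the following `/` is read as the start of a regexp that runs up to the `/` in `var expr2 = /` and swallows the second `gettext` call. That matches the POT output in the report, although the real lexer's patterns and error recovery may differ in detail.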

remram44 commented 5 years ago

I don't see how to fix this. The tokenizer can't possibly tell whether `/>` is the start of a regexp or the end of a JSX tag: the characters are identical. The JavaScript grammar is too ambiguous for this to be decided at the tokenizer level.

A parser would be able to tell whether it is currently inside a JSX tag and resolve the ambiguity, but Babel does no such parsing. Simply keeping track of JSX open/close tokens wouldn't be enough either, because regexps can appear inside JSX tags (`<hr class={ /\w+/.exec(value)[0] }/>`).

I think I'll have to leave that alone, sorry!

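Wrote a test:

```python
from babel.messages import jslexer


def test_regex():
    # Expected tokens for a regexp literal that starts with ">".
    # Line numbers are 2 because the source string starts with a newline.
    assert list(jslexer.tokenize('''
re = />/g;
''')) == [
        ('name', 're', 2),
        ('operator', '=', 2),
        ('regexp', '/>/', 2),
        ('operator', ';', 2),
    ]
```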