wichert / lingua

Translation toolkit for Python

BabelExtractor is inherently broken if function not in keywords #86

Closed lyrixderaven closed 4 years ago

lyrixderaven commented 7 years ago

We recently upgraded to lingua 4.10 and hit the following stack trace when running bin/pot-create with our custom extractor (which processes a CSV file in our own format):

Traceback (most recent call last):
  File "bin/pot-create", line 9, in <module>
    load_entry_point('lingua==4.10', 'console_scripts', 'pot-create')()
  File "[...]/lib/python2.7/site-packages/lingua/extract.py", line 330, in main
    for message in extractor(real_filename, options):
  File "[...]/lib/python2.7/site-packages/lingua/extractors/babel.py", line 45, in __call__
    check_c_format(msgid, flags)
  File "[...]/lib/python2.7/site-packages/lingua/extractors/__init__.py", line 42, in check_c_format
    formats = list(re.finditer('%(?!%)', buf))
  File "[...]/lib/python2.7/re.py", line 190, in finditer
    return _compile(pattern, flags).finditer(string)
TypeError: expected string or buffer

Essentially, this is the same stack trace as described in the closed issue https://github.com/wichert/lingua/issues/56.

After some investigation, the problem seems to be the following section in babel.py:

if not isinstance(args, (list, tuple)):
    args = [args]
args = [(None, a, lineno) for a in args]
if function in self.keywords:
    (domain, msgctxt, msgid, msgid_plural, c) = parse_keyword(args, self.keywords[function], filename, lineno)
    if c:
        comment.append(c)
else:
    msgid = args[0]
    domain = msgid_plural = None

If no function is set, or the function is not in self.keywords, msgid is set to args[0]. That is always a tuple rather than a string (because of the preceding args = [(None, a, lineno) for a in args] statement), so both check_c_format and check_python_format fail. Even if the else: path weren't broken, the final yield statement would still fail, because msgctxt is never assigned.
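
The failure can be reproduced outside lingua with a few lines. This is only a minimal sketch: check_c_format below is a stand-in that copies the single regex call shown in the traceback, and the lineno value and message string are made up for illustration.

```python
import re


def check_c_format(buf):
    # Stand-in for lingua's check_c_format: only the regex call from the
    # traceback is reproduced here, not the real function body.
    return list(re.finditer('%(?!%)', buf))


lineno = 12            # hypothetical line number
args = ["Hello %s"]    # hypothetical message produced by an extractor

# Same wrapping as in babel.py: every argument becomes a 3-tuple.
args = [(None, a, lineno) for a in args]

# Same assignment as the else: branch: msgid is now a tuple, not a string.
msgid = args[0]

try:
    check_c_format(msgid)
    error = None
except TypeError as exc:
    error = exc

print(type(msgid).__name__, "->", error)
```

Passing msgid[1] (the original string) instead of msgid makes the same call succeed, which suggests the wrapping step is what breaks the else: branch.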

I'm not involved enough with lingua to be able to really tell what you were trying to do here, but it seems to be quite broken. Any suggestions for adaptations to get it working again?
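
One adaptation that might work, sketched here as a standalone function rather than a patch: unwrap the tuple in the fallback path and give msgctxt a default. The name fallback_message is hypothetical and not part of lingua, and I'm assuming that None is an acceptable value for msgctxt in the final yield.

```python
def fallback_message(args, lineno):
    """Hypothetical helper mirroring the else: branch, with two changes:
    msgid is taken from inside the tuple, and msgctxt gets a default."""
    if not isinstance(args, (list, tuple)):
        args = [args]
    # Same wrapping as in babel.py.
    wrapped = [(None, a, lineno) for a in args]
    # Each entry is (None, value, lineno); the message text is the value.
    msgid = wrapped[0][1]
    # Assign every name used later, including msgctxt.
    domain = msgctxt = msgid_plural = None
    return domain, msgctxt, msgid, msgid_plural


# A bare string and a one-element list both yield the plain message text.
print(fallback_message("Hello %s", 3))
print(fallback_message(["Hello %s"], 3))
```

This does not handle an empty argument list, and it only looks at the first argument, as the current else: branch does.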